Empirically, I have found that user action sequences are a good way to model user behavior, since a sequence can be examined at several different scales and for specific behaviors. Interest tracking captures what a user generally likes, and the last few actions tell the model what the user is listening to right now. With the full sequence, though, you can model all of these together: what the user is listening to right now, what they have been listening to recently, what they tend to listen to at this time of day, how large a change in genre they would still enjoy, and so on.
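To make the "several scales" idea concrete, here is a minimal sketch that reads a few of these signals off a single time-ordered play history using plain counting. A real system would more likely learn them with a sequence model; this is not that. The `Play` record, the `sequence_signals` function, the `recent_n` window, and the sample data are all hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Play:
    """One user action. The fields are hypothetical, for illustration only."""
    when: datetime
    genre: str


def sequence_signals(plays: list[Play], now: datetime, recent_n: int = 3) -> dict:
    """Derive signals at several scales from one time-ordered play sequence."""
    if not plays:
        return {}
    genres = [p.genre for p in plays]
    # Genres played during the same hour of day as `now`.
    same_hour = [p.genre for p in plays if p.when.hour == now.hour]
    # Number of consecutive pairs of plays where the genre changed.
    switches = sum(a != b for a, b in zip(genres, genres[1:]))
    return {
        "current": genres[-1],                                       # right now
        "recent": Counter(genres[-recent_n:]).most_common(1)[0][0],  # last few plays
        "overall": Counter(genres).most_common(1)[0][0],             # general interest
        "this_hour": Counter(same_hour).most_common(1)[0][0] if same_hour else None,
        "switch_rate": switches / max(len(genres) - 1, 1),           # appetite for change
    }


# Made-up history: jazz in the morning, rock and pop in the evening.
plays = [
    Play(datetime(2024, 1, 1, 8, 0), "jazz"),
    Play(datetime(2024, 1, 1, 8, 30), "jazz"),
    Play(datetime(2024, 1, 1, 20, 0), "rock"),
    Play(datetime(2024, 1, 2, 8, 0), "jazz"),
    Play(datetime(2024, 1, 2, 8, 30), "jazz"),
    Play(datetime(2024, 1, 2, 20, 0), "rock"),
    Play(datetime(2024, 1, 2, 20, 10), "rock"),
    Play(datetime(2024, 1, 2, 20, 20), "pop"),
]
signals = sequence_signals(plays, now=datetime(2024, 1, 3, 8, 15))
print(signals)
```

In this toy history the long-run favorite and the usual 8 a.m. choice are both jazz, the last few plays lean rock, and the most recent play is pop. An overall interest profile or a last-few-actions window would each surface only one of these views, while the full sequence gives all of them.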