Nearly 10 years ago, I was at a Spotify recruiting event and they told us how they did embeddings at the time.
They took all user-generated playlists and projected the songs into vectors where songs that appear together on playlists are closer and songs that appear together less often are farther apart.
It’s likely changed a lot since then, but it seemed like a pretty straightforward clustering system at the time.
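For anyone who wants to see the idea concretely, here is a minimal sketch. It is not Spotify's actual system: the playlists are made up, and the names `song_vectors`, `cosine`, and `most_similar` are just illustrative. Each song becomes a sparse binary vector with one dimension per playlist, so cosine similarity is high exactly when two songs tend to share playlists.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy playlists; a real service would have millions of them.
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b"],
    ["song_c", "song_d"],
    ["song_b", "song_c", "song_d"],
]

def song_vectors(playlists):
    """Represent each song as the set of playlist indices it appears on,
    i.e. a sparse binary vector with one dimension per playlist."""
    vectors = defaultdict(set)
    for idx, playlist in enumerate(playlists):
        for song in playlist:
            vectors[song].add(idx)
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity between two binary vectors stored as index sets."""
    if not u or not v:
        return 0.0
    return len(u & v) / sqrt(len(u) * len(v))

def most_similar(song, vectors, k=2):
    """Return the k songs whose playlist vectors are closest to `song`."""
    scores = [
        (cosine(vectors[song], vec), other)
        for other, vec in vectors.items()
        if other != song
    ]
    # Highest similarity first; ties broken by name for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:k]]

vectors = song_vectors(playlists)
print(most_similar("song_a", vectors))
```

A production system would presumably compress these huge sparse vectors into a few hundred dense dimensions (matrix factorization or a word2vec-style model over playlists), but the signal being captured is the same.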
Co-occurrence. It's the real backbone of almost all recommender systems.
This is the same way YT/TikTok do it, btw. Co-occurrence is king in recommender systems in production. It's extremely cheap to calculate and by far the most effective method.
That's just basic collaborative filtering. Drdaeman is talking about using the actual content of the songs in your vector embeddings.
This is not really important if you have a lot of user behavior data and/or playlists for each song. But if you have a niche song that few people have listened to, collaborative-filtering-based recommendations aren't going to be good.
Real semantic embeddings (which can then be part of the input to the recommendation model) can be trained using self-supervision, e.g. an autoencoder or a separate "next audio token" predicting transformer.
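To make the cold-start point concrete, here is a rough sketch of one way a content embedding could back up co-occurrence for niche songs. Everything in it is an assumption for illustration: the playlist memberships and 2-d "content" vectors are made up (a real system would get them from a trained audio model), and the linear blending rule with `min_playlists` is my own simple choice, not something any particular service is known to use.

```python
from math import sqrt

# Hypothetical data: which playlists each song is on (behavior signal)...
playlist_sets = {
    "hit_1": {0, 1, 2, 3, 4, 5},
    "hit_2": {0, 1, 2, 3, 6},
    "niche": {7},  # barely any listening data
}
# ...and made-up content embeddings standing in for a trained audio model.
content_vecs = {
    "hit_1": [0.9, 0.1],
    "hit_2": [0.1, 0.9],
    "niche": [0.8, 0.2],
}

def set_cosine(a, b):
    """Cosine similarity of binary playlist-membership vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def dense_cosine(u, v):
    """Cosine similarity of dense content embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def hybrid_similarity(x, y, min_playlists=5):
    """Blend the two signals, trusting co-occurrence only as far as the
    sparser song has data. The linear ramp is an assumed heuristic."""
    support = min(len(playlist_sets[x]), len(playlist_sets[y]))
    w = min(1.0, support / min_playlists)
    collab = set_cosine(playlist_sets[x], playlist_sets[y])
    content = dense_cosine(content_vecs[x], content_vecs[y])
    return w * collab + (1 - w) * content

for other in ("hit_1", "hit_2"):
    print(other, round(hybrid_similarity("niche", other), 3))
```

With co-occurrence alone, "niche" has zero similarity to everything. With the blend, it lands near "hit_1" because the content vectors are close, while the two hits are still compared purely on behavior data.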
A recommendation from a person you know takes into account not just their knowledge of your preferences, but also how much and in what way they like/care about you, and conversely, your taking of the recommendation is colored by your rapport with the recommender. All that is something a recommender system has no access to.
Or, more bluntly: you aren't going to mate with a For You page, so it doesn't have the same evolutionary cheat code to your preferences as other people have.
Complicated, or worryingly straightforward and effective? It really does seem that over time, this would compress the space of people's preferences and, since listening stats also feed into production and promotion, the space of music produced.