Item2Vec: Neural Item Embedding for Collaborative Filtering (arxiv.org)
103 points by ukz on July 10, 2016 | 37 comments



I don't get the innovation in this paper - are they just running word2vec on groups of items? If so, Spotify has been doing this on playlists for years now: https://erikbern.com/2013/11/02/model-benchmarks/

Also, I know the paper isn't claiming state-of-the-art, but their SVD results are horrendous. Standard CF would create much better artist-artist pairings with even a medium sized dataset.

As an aside, I've run some quantitative and qualitative tests and have found the best recommendations come from a combination of user-item and item-item. I co-presented a talk at the NYC machine learning meetup recently (https://docs.google.com/presentation/d/1S5Cizi9LFQ7l0bMYtY7g...) that shows how this can work, starting at slide 20. The idea is to create a candidate list of matches using item-item, and then reorder using item-user. I've found this produces "sensible" suggestions from the item-item step, but truly personalizes during the re-ordering. You can remove obvious recommendations by filtering out popular matches or matches the user has already interacted with (I consider this a business decision rather than something inherent in the algorithm).
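In code, the candidate-then-rerank step looks roughly like this (a sketch; the function, matrix names, and scoring are invented for illustration, not taken from the talk):

    import numpy as np

    def recommend(user_vec, item_vecs, item_sims, seed_items, seen,
                  top_k=10, n_candidates=200):
        # 1. Candidate generation: nearest neighbours of the user's seed items
        #    under item-item similarity (the "sensible" suggestions).
        candidates = set()
        for i in seed_items:
            candidates.update(np.argsort(-item_sims[i])[:n_candidates])
        # 2. Filter items the user has already interacted with (the "business decision").
        candidates = [c for c in candidates if c not in seen]
        # 3. Rerank by user-item affinity (dot product of latent factors),
        #    which is where the personalization comes in.
        scores = {c: float(user_vec @ item_vecs[c]) for c in candidates}
        return sorted(scores, key=scores.get, reverse=True)[:top_k]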


Spotify got this from Lawrence Berkeley National Lab, which was doing it in 2005: "Word2Vec is based on an approach from Lawrence Berkeley National Lab" (https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...). That's interesting, because the original streaming music site, SeeqPod, which powered Spotify, was based on vectors for songs, like a song2vec.


From the Spotify blog post: "We train a model on subsampled (5%) playlist data using skip-grams and 40 factors."

Any idea what those 40 factors might be?

(The item2vec paper describes using pairs of items that occur in the same set, i.e. just like using n-grams, but without a fixed n, and ignoring ordering.)
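Concretely, every pair of items within a set becomes a (target, context) training example, with no positional information at all; a tiny illustrative sketch (item IDs made up):

    from itertools import permutations

    def item_pairs(item_set):
        # The whole set acts as the "window": all ordered (target, context)
        # pairs are emitted, and ordering within the set carries no meaning.
        return list(permutations(item_set, 2))

    item_pairs(["ps4", "ps4_controller", "fifa16"])
    # [('ps4', 'ps4_controller'), ('ps4', 'fifa16'), ('ps4_controller', 'ps4'), ...]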


That's the dimensionality of the resulting word vectors in word2vec; in the item2vec paper this is the "dimension parameter m".


Yeah, I "invented" this in 2011 or 2012 and it was one of the ideas behind the company that I sold. At the time I thought it was a clever hack, but I didn't see it as especially non-obvious.


Hi, very informative talk, especially the examples for handling cold start and seeding. Any pointers on how multiple entities are incorporated in the interaction matrix? I understand how user/item attributes may be incorporated, but multiple entities is something I'm struggling to understand. Pointers to associated literature would help too.


This paper covers a related but different technique (factorization machines) that handles mixing different types of features: http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf. See Figure 1 for an example of mixing ratings, indicator variables, and time into a single matrix.
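To make the idea concrete, here is a toy sketch of building one row of such a mixed design matrix (the feature layout is invented, just following the spirit of Figure 1):

    import numpy as np

    n_users, n_items = 3, 4

    def fm_row(user_id, item_id, days_since_signup):
        # one-hot user block | one-hot item block | real-valued time feature
        row = np.zeros(n_users + n_items + 1)
        row[user_id] = 1.0                 # indicator variable for the user
        row[n_users + item_id] = 1.0       # indicator variable for the item
        row[-1] = days_since_signup        # time folded into the same matrix
        return row

    X = np.vstack([fm_row(0, 2, 14.0), fm_row(1, 0, 3.0)])
    y = np.array([5.0, 3.0])               # the ratings being predicted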


Thanks a lot :-)


"[before computing SVD], we normalized each entry according to the square root of the product of its row and column sums."

Why didn't they use something that usually works better, like PMI?


This is a normalization that I have used and seen other people use. I don't think it's a foregone conclusion that PMI is better for every task.
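For intuition, a toy numpy comparison of the two normalizations on a small co-occurrence matrix (the counts are made up):

    import numpy as np

    C = np.array([[10., 2., 0.],
                  [ 2., 8., 1.],
                  [ 0., 1., 5.]])          # toy item-item co-occurrence counts
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)

    # Normalization quoted above: divide by sqrt(row_sum * column_sum).
    sqrt_norm = C / np.sqrt(row * col)

    # Positive PMI, the alternative being suggested.
    total = C.sum()
    pmi = np.log(C * total / (row * col) + 1e-12)
    ppmi = np.maximum(pmi, 0.0)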


Fascinating.

The qualitative comparison suggests that item2vec may produce _more_ homogeneous / boring results, which is kind of unfortunate; the interesting question in recommendations is how to find "aspirational" recommendations (things the shopper would not have looked for on their own).

I would really love to see an analysis that did an A/B test using more traditional CF and this, and see what the revenue lift was, because "accuracy" as measured here doesn't necessarily map onto the objective that you care about in the real world.

On the other hand, I played with using collaborative filtering to improve the personalization of language models for speech recognition for shopping, and in that context this approach sounds like it might have been super useful, because it was actually fairly challenging to get broad enough coverage of the full set of items from a small number of purchases for the purposes of language modeling. Having good embeddings would have helped a lot.


"I would really love to see an analysis that did an A/B test using more traditional CF and this, and see what the revenue lift was, because "accuracy" as measured here doesn't necessarily map onto the objective that you care about in the real world."

For another approach to product recommendation with some lift info, try https://research.googleblog.com/2016/06/wide-deep-learning-b... http://arxiv.org/abs/1606.07792


Humans in the loop are great for this (kind of why Mechanical Turk was invented in the first place). I like echen's blog post here: http://blog.echen.me/2014/10/07/moving-beyond-ctr-better-rec...


It may be an urban myth, but somebody told me Amazon tweaked their recommendation algorithm to occasionally provide random items, the thinking being that people might be persuaded to buy something on the mere suggestion that they would like it.


A multi-armed bandit will occasionally provide 'random' items as part of the exploration phase. Perhaps that's what's going on, and not any sort of diabolical self-fulfilling prophecy.
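For example, a bare-bones epsilon-greedy policy does exactly that kind of occasional exploration (a sketch of the idea, not anything Amazon has published):

    import random

    def epsilon_greedy(scored_items, catalog, epsilon=0.1):
        # With probability epsilon, explore a random catalog item;
        # otherwise exploit the highest-scored recommendation.
        if random.random() < epsilon:
            return random.choice(catalog)
        return max(scored_items, key=scored_items.get)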


Thorsten Joachims gave a talk at Amazon Machine Learning Conference 2015, about doing specifically that. That may be what someone was talking about. I've been trying to find the paper related to the work, but am struggling to find it.


I wonder if the item vectors capture semantics and behave in a way analogous to word vectors. So, for example, would a PS4 - a PS4 controller = an XBox - an XBox controller, the same way France - Paris = Greece - Athens? Something along these lines could maybe be used as a way to find relevant addons/upsells to show on the checkout page.
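If the embeddings come out of something like gensim, the analogy query is the same one people run on word vectors; a toy sketch with made-up item IDs and baskets:

    from gensim.models import Word2Vec

    # Each basket/playlist is a "sentence" of item IDs.
    baskets = [["ps4", "ps4_controller", "fifa16"],
               ["xbox_one", "xbox_controller", "halo5"],
               ["ps4", "fifa16"]]

    # vector_size is called "size" in older gensim versions.
    model = Word2Vec(baskets, vector_size=40, window=10, sg=1, min_count=1)

    # PS4 - PS4 controller + Xbox controller ~= Xbox One ?
    model.wv.most_similar(positive=["ps4", "xbox_controller"],
                          negative=["ps4_controller"], topn=3)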


They do. In my current research I've been working on metric embeddings to solve analogy questions of the flavor "Favorite Sushi Restaurant : Current City :: ??? : Foreign City". It takes some work to remove the geographic signal that is overwhelmingly present in fan and check-in data.


I attended a talk by one of the item2vec authors at ICML. He showed a few examples of semantic relations, for example David Guetta - Beyonce = Avicii - Rihanna. They also gave a link to a really cool 2D t-SNE of item2vec on artist data. Too bad they did not include it in their paper. I guess similar semantic relations exist in the item2vec representations for products, but such relations do not appear in the paper.


Does anyone know good resources/research about generating latent vector representations with iterative processes using numerical analysis algorithms and not neural networks?

The black-box nature of word2vec and similar models holds back some applications, like generalizing linguistics methods to bioinformatics.


hmmh... I don't believe word2vec or item2vec would be considered neural network algorithms.

You come up with a model where a numerical vector represents the attributes of the word or item, you predict the likelihood of a match between words/items by multiplying the vectors together, and then you use numerical optimization, i.e. an iterative gradient descent algorithm starting from randomly initialized vectors, to estimate the vectors that work best.
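A toy numpy version of that loop, with invented hyperparameters, just to make it concrete:

    import numpy as np

    rng = np.random.default_rng(0)
    n_items, dim, lr = 1000, 40, 0.05
    vecs = rng.normal(scale=0.1, size=(n_items, dim))   # target vectors, random init
    ctx  = rng.normal(scale=0.1, size=(n_items, dim))   # context vectors, random init

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgd_step(target, context, label):
        # One gradient step on P(match) = sigmoid(v_target . c_context).
        v, c = vecs[target].copy(), ctx[context].copy()
        grad = sigmoid(v @ c) - label               # derivative of the log-loss
        vecs[target] -= lr * grad * c
        ctx[context] -= lr * grad * v

    sgd_step(3, 17, 1.0)                             # observed (co-occurring) pair
    sgd_step(3, int(rng.integers(n_items)), 0.0)     # sampled negative pair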


They're NNs because you learn the representation using RNNs. Everything afterwards is trivial since you're in a Hilbert space. But getting the representations is the hard part.


word2vec does not use RNNs; the network is trained on a simple classification task, "neighborhood" -> "word". Each word in the corpus is an independent example; there's no sequential dependence.


Word2vec doesn't use RNNs. It has only one softmax layer after the embedding.


oh, ok. Do you have to use RNNs? I think I've done them without RNNs.

Would love a good RNN word2vec type example with Tensorflow if anyone knows one.


Or you could use a pre-trained list like the ones from Google [1]. If not, you've probably solved an open problem in the area, and publishing it would help the rest of us not lose time trying to solve it again.

[1] - https://code.google.com/archive/p/word2vec/

Edit: word2vec on tensorflow tutorial https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/...
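For instance, loading those pre-trained Google News vectors with gensim (the filename is the one distributed on that page):

    from gensim.models import KeyedVectors

    # Large (multi-GB uncompressed) download from the archive linked above.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin.gz", binary=True)

    vectors.most_similar("guitar", topn=5)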


Yeah, I implemented something based on the code from the Udacity course on TensorFlow that a Googler (Vincent Vanhoucke) did; basically the same, I think.

their version https://github.com/tensorflow/tensorflow/blob/master/tensorf...

my version https://github.com/druce/streeteye_word2vec/blob/master/word...



That's standard word2vec, not an RNN.


Iterated Least Squares? https://en.wikipedia.org/wiki/Iteratively_reweighted_least_s...

Unless I misunderstood the question...
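If the question is specifically about CF, alternating least squares is the classic purely numerical route to latent vectors; a condensed dense-matrix sketch (a toy version that treats every cell of R as observed):

    import numpy as np

    def als(R, k=20, reg=0.1, iters=15):
        # Alternate closed-form least-squares solves for user and item factors
        # so that R ~= U @ V.T; no gradients, no neural network.
        n_users, n_items = R.shape
        U = np.random.rand(n_users, k)
        V = np.random.rand(n_items, k)
        I = reg * np.eye(k)
        for _ in range(iters):
            U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T   # fix V, solve for U
            V = np.linalg.solve(U.T @ U + I, U.T @ R).T     # fix U, solve for V
        return U, V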


GloVe might be what you're looking for: http://nlp.stanford.edu/projects/glove/



"generating latent vector representations with iterative processes using numerical analysis algorithms"

Sounds like word2vec.


This is a great model. I applied it to online retailer data and to movies and it works amazingly well, much better than SVD++ or SVD. I have found it to perform very well on items with low usage too. I took the authors' advice to change the window size dynamically according to the set size.
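In gensim terms, that dynamic-window trick roughly amounts to shuffling each item set and making the window at least as large as the largest set (a sketch of the idea, not the authors' code):

    import random
    from gensim.models import Word2Vec

    item_sets = [["i1", "i2", "i3"], ["i2", "i4", "i5", "i6"]]   # toy baskets

    for s in item_sets:
        random.shuffle(s)          # order inside a set carries no meaning

    max_len = max(len(s) for s in item_sets)
    model = Word2Vec(item_sets, vector_size=40, window=max_len,
                     sg=1, min_count=1)   # vector_size is "size" in older gensim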


GitHub! This should be on GitHub.





