Item2Vec: Neural Item Embedding for Collaborative Filtering (arxiv.org)
103 points by ukz on July 10, 2016 | 37 comments



I don't get the innovation in this paper - are they just running word2vec on groups of items? If so, Spotify has been doing this on playlists for years now: https://erikbern.com/2013/11/02/model-benchmarks/

Also, I know the paper isn't claiming state-of-the-art, but their SVD results are horrendous. Standard CF would create much better artist-artist pairings with even a medium sized dataset.

As an aside, I've run some quantitative and qualitative tests and have found the best recommendations come from a combination of user-item and item-item. I co-presented a talk at the NYC machine learning meetup recently (https://docs.google.com/presentation/d/1S5Cizi9LFQ7l0bMYtY7g...) that shows how this can work, starting at slide 20. The idea is to create a candidate list of matches using item-item, and then reorder using item-user. I've found this produces "sensible" suggestions from the item-item step, but truly personalizes during the re-ordering. You can remove obvious recommendations by filtering out popular matches or matches the user has already interacted with (I consider this a business decision rather than something inherent in the algorithm).
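In code, the candidate-then-rerank step looks roughly like this (a sketch; the function, matrix names, and scoring are invented for illustration, not taken from the talk):

    import numpy as np

    def recommend(user_vec, item_vecs, item_sims, seed_items, seen,
                  top_k=10, n_candidates=200):
        # 1. Candidate generation: nearest neighbours of the user's seed items
        #    under item-item similarity (the "sensible" suggestions).
        candidates = set()
        for i in seed_items:
            candidates.update(np.argsort(-item_sims[i])[:n_candidates])
        # 2. Filter items the user has already interacted with (the "business decision").
        candidates = [c for c in candidates if c not in seen]
        # 3. Rerank by user-item affinity (dot product of latent factors),
        #    which is where the personalization comes in.
        scores = {c: float(user_vec @ item_vecs[c]) for c in candidates}
        return sorted(scores, key=scores.get, reverse=True)[:top_k]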


Spotify got this from Lawrence Berkeley National Lab, which was doing it in 2005: "Word2Vec is based on an approach from Lawrence Berkeley National Lab" (https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...). That's interesting, because the original streaming music site, SeeqPod, which powered Spotify, was based on vectors for songs, like a song2vec.


From the Spotify blog post: "We train a model on subsampled (5%) playlist data using skip-grams and 40 factors."

Any idea what those 40 factors might be?

(The item2vec paper describes using pairs of items that occur in the same set, i.e. just like using n-grams, but without a fixed n, and ignoring ordering.)
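Concretely, every pair of items within a set becomes a (target, context) training example, with no positional information at all; a tiny illustrative sketch (item IDs made up):

    from itertools import permutations

    def item_pairs(item_set):
        # The whole set acts as the "window": all ordered (target, context)
        # pairs are emitted, and ordering within the set carries no meaning.
        return list(permutations(item_set, 2))

    item_pairs(["ps4", "ps4_controller", "fifa16"])
    # [('ps4', 'ps4_controller'), ('ps4', 'fifa16'), ('ps4_controller', 'ps4'), ...]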


That's the dimensionality of the resulting word vectors in word2vec; in the item2vec paper this is the "dimension parameter m".


Yeah, I "invented" this in 2011 or 2012 and it was one of the ideas behind the company that I sold. At the time I thought it was a clever hack, but I didn't see it as especially non-obvious.


Hi, very informative talk, especially the examples for handling cold start and seeding. Any pointers on how multiple entities are incorporated in the interaction matrix? I understand how user/item attributes may be incorporated, but multiple entities is something I'm struggling to understand. Pointers to associated literature would help too.


This paper covers a related but different technique (factorization machines) that handles mixing different types of features: http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf. See Figure 1 for an example of mixing ratings, indicator variables, and time into a single matrix.
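To make the idea concrete, here is a toy sketch of building one row of such a mixed design matrix (the feature layout is invented, just following the spirit of Figure 1):

    import numpy as np

    n_users, n_items = 3, 4

    def fm_row(user_id, item_id, days_since_signup):
        # one-hot user block | one-hot item block | real-valued time feature
        row = np.zeros(n_users + n_items + 1)
        row[user_id] = 1.0                 # indicator variable for the user
        row[n_users + item_id] = 1.0       # indicator variable for the item
        row[-1] = days_since_signup        # time folded into the same matrix
        return row

    X = np.vstack([fm_row(0, 2, 14.0), fm_row(1, 0, 3.0)])
    y = np.array([5.0, 3.0])               # the ratings being predicted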


Thanks a lot :-)


"[before computing SVD], we normalized each entry according to the square root of the product of its row and column sums."

Why didn't they use something that usually works better, like PMI?


This is a normalization that I have used and seen other people use. I don't think it's a foregone conclusion that PMI is better for every task.
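For intuition, a toy numpy comparison of the two normalizations on a small co-occurrence matrix (the counts are made up):

    import numpy as np

    C = np.array([[10., 2., 0.],
                  [ 2., 8., 1.],
                  [ 0., 1., 5.]])          # toy item-item co-occurrence counts
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)

    # Normalization quoted above: divide by sqrt(row_sum * column_sum).
    sqrt_norm = C / np.sqrt(row * col)

    # Positive PMI, the alternative being suggested.
    total = C.sum()
    pmi = np.log(C * total / (row * col) + 1e-12)
    ppmi = np.maximum(pmi, 0.0)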


Fascinating.

The qualitative comparison suggests that item2vec may produce _more_ homogeneous / boring results, which is kind of unfortunate; the interesting question in recommendations is how to find "aspirational" recommendations (things the shopper would not have looked for on their own).

I would really love to see an analysis that did an A/B test using more traditional CF and this, and see what the revenue lift was, because "accuracy" as measured here doesn't necessarily map onto the objective that you care about in the real world.

On the other hand, I played with using collaborative filtering to improve the personalization of language models for speech recognition for shopping, and in that context this approach sounds like it might have been super useful, because it was actually fairly challenging to get broad enough coverage of the full set of items from a small number of purchases for the purposes of language modeling. Having good embeddings would have helped a lot.


"I would really love to see an analysis that did an A/B test using more traditional CF and this, and see what the revenue lift was, because "accuracy" as measured here doesn't necessarily map onto the objective that you care about in the real world."

For another approach to product recommendation with some lift info, try https://research.googleblog.com/2016/06/wide-deep-learning-b... http://arxiv.org/abs/1606.07792


Humans in the loop are great for this (kind of why Mechanical Turk was invented in the first place). I like echen's blog post here: http://blog.echen.me/2014/10/07/moving-beyond-ctr-better-rec...


It may be an urban myth, but somebody told me Amazon tweaked their recommendation algorithm to occasionally provide random items, the thinking being that people might be persuaded to buy something on the mere suggestion that they would like it.


A multi-armed bandit will occasionally provide 'random' items as part of the exploration phase. Perhaps that's what's going on, and not any sort of diabolical self-fulfilling prophecy.
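For example, a bare-bones epsilon-greedy policy does exactly that kind of occasional exploration (a sketch of the idea, not anything Amazon has published):

    import random

    def epsilon_greedy(scored_items, catalog, epsilon=0.1):
        # With probability epsilon, explore a random catalog item;
        # otherwise exploit the highest-scored recommendation.
        if random.random() < epsilon:
            return random.choice(catalog)
        return max(scored_items, key=scored_items.get)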


Thorsten Joachims gave a talk at Amazon Machine Learning Conference 2015, about doing specifically that. That may be what someone was talking about. I've been trying to find the paper related to the work, but am struggling to find it.


I wonder if the item vectors capture semantics and behave in a way analogous to word vectors. So, for example, would a PS4 - a PS4 controller = an XBox - an XBox controller, the same way France - Paris = Greece - Athens? Something along these lines could maybe be used as a way to find relevant addons/upsells to show on the checkout page.
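If the embeddings come out of something like gensim, the analogy query is the same one people run on word vectors; a toy sketch with made-up item IDs and baskets:

    from gensim.models import Word2Vec

    # Each basket/playlist is a "sentence" of item IDs.
    baskets = [["ps4", "ps4_controller", "fifa16"],
               ["xbox_one", "xbox_controller", "halo5"],
               ["ps4", "fifa16"]]

    # vector_size is called "size" in older gensim versions.
    model = Word2Vec(baskets, vector_size=40, window=10, sg=1, min_count=1)

    # PS4 - PS4 controller + Xbox controller ~= Xbox One ?
    model.wv.most_similar(positive=["ps4", "xbox_controller"],
                          negative=["ps4_controller"], topn=3)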


They do. In my current research I've been working on metric embeddings to solve analogy questions of the flavor "Favorite Sushi Restaurant : Current City :: ??? : Foreign City". It takes some work to remove the geographic signal that is overwhelmingly present in fan and check-in data.


I attended a talk by one of the item2vec authors at ICML. He showed a few examples of semantic relations, for example David Guetta - Beyonce = Avicii - Rihanna. They also gave a link to a really cool 2D t-SNE of item2vec on artist data. Too bad they did not include it in their paper. I guess similar semantic relations exist in the item2vec representations for products, but such relations do not appear in the paper.


Does anyone know good resources/research about generating latent vector representations with iterative processes using numerical analysis algorithms and not neural networks?

The black-box nature of word2vec and similar models holds back some applications, like generalizing linguistics methods to bioinformatics.


hmmh... I don't believe word2vec or item2vec would be considered neural network algorithms.

You come up with a model where a numerical vector represents the attributes of the word or item, you predict the likelihood of a match between words/items by multiplying the vectors together, and then you use numerical optimization, i.e. an iterative gradient descent algorithm starting from randomly initialized vectors, to estimate the vectors that work best.
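A toy numpy version of that loop, with invented hyperparameters, just to make it concrete:

    import numpy as np

    rng = np.random.default_rng(0)
    n_items, dim, lr = 1000, 40, 0.05
    vecs = rng.normal(scale=0.1, size=(n_items, dim))   # target vectors, random init
    ctx  = rng.normal(scale=0.1, size=(n_items, dim))   # context vectors, random init

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgd_step(target, context, label):
        # One gradient step on P(match) = sigmoid(v_target . c_context).
        v, c = vecs[target].copy(), ctx[context].copy()
        grad = sigmoid(v @ c) - label               # derivative of the log-loss
        vecs[target] -= lr * grad * c
        ctx[context] -= lr * grad * v

    sgd_step(3, 17, 1.0)                             # observed (co-occurring) pair
    sgd_step(3, int(rng.integers(n_items)), 0.0)     # sampled negative pair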


They're NNs because you learn the representation using RNNs. Everything afterwards is trivial since you're in a Hilbert space. But getting the representations is the hard part.


word2vec does not use RNNs; the network is trained on a simple classification task, "neighborhood" -> "word". Each word in the corpus is an independent example; there's no sequential dependence.


Word2vec doesn't use RNNs. It has only one softmax layer after the embedding.


oh, ok. Do you have to use RNNs? I think I've done them without RNNs.

Would love a good RNN word2vec type example with Tensorflow if anyone knows one.


Or you could use a pre-trained list like the ones from Google [1]. If not, you've probably solved an open problem in the area, and publishing it would help the rest of us not lose time trying to solve it again.

[1] - https://code.google.com/archive/p/word2vec/

Edit: word2vec on tensorflow tutorial https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/...
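For instance, loading those pre-trained Google News vectors with gensim (the filename is the one distributed on that page):

    from gensim.models import KeyedVectors

    # Large (multi-GB uncompressed) download from the archive linked above.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin.gz", binary=True)

    vectors.most_similar("guitar", topn=5)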


Yeah, I implemented something based on the code from the Udacity course on TensorFlow that a Googler (Vincent Vanhoucke) did; basically the same, I think.

their version https://github.com/tensorflow/tensorflow/blob/master/tensorf...

my version https://github.com/druce/streeteye_word2vec/blob/master/word...



That's standard word2vec, not an RNN.


Iterated Least Squares? https://en.wikipedia.org/wiki/Iteratively_reweighted_least_s...

Unless I misunderstood the question...
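If the question is specifically about CF, alternating least squares is the classic purely numerical route to latent vectors; a condensed dense-matrix sketch (a toy version that treats every cell of R as observed):

    import numpy as np

    def als(R, k=20, reg=0.1, iters=15):
        # Alternate closed-form least-squares solves for user and item factors
        # so that R ~= U @ V.T; no gradients, no neural network.
        n_users, n_items = R.shape
        U = np.random.rand(n_users, k)
        V = np.random.rand(n_items, k)
        I = reg * np.eye(k)
        for _ in range(iters):
            U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T   # fix V, solve for U
            V = np.linalg.solve(U.T @ U + I, U.T @ R).T     # fix U, solve for V
        return U, V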


GloVe might be what you're looking for: http://nlp.stanford.edu/projects/glove/



"generating latent vector representations with iterative processes using numerical analysis algorithms"

Sounds like word2vec.


This is a great model. I applied it to online retailer data and to movies and it works amazingly well, much better than SVD++ or SVD. I have found it to perform very well on items with low usage too. I took the authors' advice to change the window size dynamically according to the set size.
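In gensim terms, that dynamic-window trick roughly amounts to shuffling each item set and making the window at least as large as the largest set (a sketch of the idea, not the authors' code):

    import random
    from gensim.models import Word2Vec

    item_sets = [["i1", "i2", "i3"], ["i2", "i4", "i5", "i6"]]   # toy baskets

    for s in item_sets:
        random.shuffle(s)          # order inside a set carries no meaning

    max_len = max(len(s) for s in item_sets)
    model = Word2Vec(item_sets, vector_size=40, window=max_len,
                     sg=1, min_count=1)   # vector_size is "size" in older gensim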


GitHub! This should be on GitHub.





