
To oversimplify, I think the training set is something like:

Italian restaurant is good.

Chinese restaurant is good.

Chinese government is bad.

Mexican restaurant is good.

Mexican drug dealers are bad.

Mexican illegal immigrants are bad.

And hence the word vectors pick up those associations, and the sentiment result follows.

Update:

To confirm my suspicion, I tried out an online demo that checks the distance between words in a trained word2vec embedding model:

http://bionlp-www.utu.fi/wv_demo/

Here is an example output I got with the Finnish 4B model (probably a poor choice, since it is trained on Finnish rather than English):

italian, bad: 0.18492977

chinese, bad: 0.5144626

mexican, bad: 0.3288326

The same pairs with the Google News model:

italian, bad: 0.09307841

chinese, bad: 0.19638279

mexican, bad: 0.16298543
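For anyone who wants to reproduce this kind of check locally rather than through the demo page: the numbers above are cosine similarities between word vectors. A minimal sketch below computes that metric with NumPy; the three-dimensional vectors are made-up toy values purely for illustration (real embeddings such as the 300-d Google News model would be loaded via gensim's KeyedVectors and queried with its `similarity()` method).

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d vectors, purely illustrative -- NOT real word2vec values.
# With a real model you would instead do something like:
#   from gensim.models import KeyedVectors
#   kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
#   kv.similarity("italian", "bad")
vectors = {
    "italian": np.array([0.9, 0.1, 0.0]),
    "chinese": np.array([0.6, 0.5, 0.1]),
    "bad":     np.array([0.1, 0.9, 0.2]),
}

for word in ("italian", "chinese"):
    print(word, "bad:", cosine_similarity(vectors[word], vectors["bad"]))
```

Note that raw similarity to "bad" is a crude proxy; the comment's point is only that nationality terms sit at measurably different distances from negative words, which is exactly the kind of bias a downstream sentiment model can absorb.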



