Hi all! I coded this up over the holidays and am curious to hear thoughts and suggestions!
To find the "next word" w to say, the program uses a normalized set of word vectors and tries to maximize the product of the cosine similarities between w and each of the two context words. (The idea to maximize this product, rather than finding the word closest to the mean of the context words, comes from the paper "Linguistic Regularities in Sparse and Explicit Word Representations" [1].) Candidate answers are then filtered with some simple heuristics (avoid saying a word that's been said before, or one that's too infrequent or too frequent, etc.).
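For anyone who wants to poke at the scoring idea, here is a minimal sketch of it in Python with NumPy. It is not the actual code from the project: the function name `next_word`, the toy vocabulary, and the 3-d vectors are made up for illustration. It also assumes the cosines are shifted into [0, 1] before multiplying, so that two negative similarities can't multiply into a high score, and it only implements the "already said" filter (the frequency filters are left out).

```python
import numpy as np

def normalize(matrix):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

def next_word(vocab, vectors, a, b, said=()):
    """Return the word maximizing the product of cosine similarities to a and b.

    Hypothetical helper: `vocab` is a list of words, `vectors` the matching rows.
    """
    unit = normalize(np.asarray(vectors, dtype=float))
    index = {word: i for i, word in enumerate(vocab)}

    # Cosine similarity of every vocabulary word to each context word.
    sim_a = unit @ unit[index[a]]
    sim_b = unit @ unit[index[b]]

    # Assumption: shift cosines from [-1, 1] to [0, 1] before multiplying.
    score = ((sim_a + 1) / 2) * ((sim_b + 1) / 2)

    # Simple filter: never answer with a context word or a word already said.
    for word in {a, b, *said}:
        if word in index:
            score[index[word]] = -np.inf

    return vocab[int(np.argmax(score))]

# Toy data, invented for this example only.
vocab = ["van", "agree", "accord", "banana"]
vectors = [
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.1],
    [0.7, 0.7, 0.0],
    [0.0, 0.1, 1.0],
]

print(next_word(vocab, vectors, "van", "agree"))
print(next_word(vocab, vectors, "van", "agree", said=["accord"]))
```

With real embeddings you would load the vocabulary and vectors from a pretrained model instead of the toy lists, and add the frequency cutoffs as extra masks on `score`.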
It's come up with some fun answers (van + agree --> Accord, facebook + sore --> viral, cursive + movie --> script), but it can also be frustrating at times. Any thoughts on how to make it more "human-like" would be appreciated :)
There's also a discussion on Reddit [2].
[1] http://www.aclweb.org/anthology/W14-1618
[2] https://www.reddit.com/r/MachineLearning/comments/7o8q8v/p_c...