I've been thinking about something like this for a while. My idea was to use data from IMDb to build a graph where the distance between two movies reflects how similar they are, based on the people involved, the genre, the setting and any other information you can get from the dataset.
You could say, "I want to watch a movie like The Wolf of Wall Street" and it would find the closest 10 movies in the graph.
It's still something I'd like to play with if I find the time.
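Here's a minimal sketch of the lookup. It assumes each movie has already been reduced to a set of tagged features (people, genres, settings). It swaps the graph distance for plain Jaccard similarity (shared features divided by all distinct features) and ranks every other movie against the query. `MOVIES` is hypothetical toy data, not pulled from IMDb.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each movie is a set of "kind:value" features.
# A real version would build these sets from an IMDb export.
MOVIES: Dict[str, Set[str]] = {
    "The Wolf of Wall Street": {
        "director:Martin Scorsese", "actor:Leonardo DiCaprio",
        "genre:Biography", "genre:Crime", "setting:New York",
    },
    "The Departed": {
        "director:Martin Scorsese", "actor:Leonardo DiCaprio",
        "genre:Crime", "setting:Boston",
    },
    "Catch Me If You Can": {
        "actor:Leonardo DiCaprio", "genre:Biography", "genre:Crime",
    },
    "Wall Street": {"genre:Crime", "setting:New York"},
    "Frozen": {"genre:Animation", "genre:Adventure"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared features divided by all distinct features (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(title: str, movies: Dict[str, Set[str]], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k movies most similar to `title`, best first (ties broken by title)."""
    target = movies[title]
    scores = [(other, jaccard(target, features))
              for other, features in movies.items() if other != title]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    for other, score in most_similar("The Wolf of Wall Street", MOVIES, k=3):
        print(f"{score:.2f}  {other}")
```

With this toy data, the top three are Catch Me If You Can (0.60), The Departed (0.50) and Wall Street (0.40). Every feature counts equally here. Weighting shared directors or actors above shared genres would be the obvious next knob to turn.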
I started on something similar as a side project [1]. I decided to build a dataset from the AFI Top 100 Films list and persist it in Neo4j. The goal was to find interesting questions to answer with this dataset that couldn't easily be googled.
Most of my time so far has been spent gathering the dataset, but I do have a few example Cypher queries answering the following simple questions [2]:
1) What actors have appeared in the most AFI Top 100 films?
2) What are the genres of the top ten films?
3) Have any actors appeared in 2 or more of the top 25 films?
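The Cypher queries themselves aren't reproduced here. As a rough stand-in, this is the same counting logic for questions 1 and 3 in plain Python. It's a minimal sketch over hypothetical placeholder rows (`FILMS`), not real AFI data.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical placeholder rows: (rank, title, cast). Not real AFI data.
FILMS: List[Tuple[int, str, List[str]]] = [
    (1, "Film A", ["Actor X", "Actor Y"]),
    (2, "Film B", ["Actor X", "Actor Z"]),
    (3, "Film C", ["Actor W"]),
    (30, "Film D", ["Actor Y", "Actor X"]),
]


def appearance_counts(films: Iterable[Tuple[int, str, List[str]]],
                      max_rank: Optional[int] = None) -> Counter:
    """Count how many films each actor appears in, optionally only up to max_rank."""
    counts: Counter = Counter()
    for rank, _title, cast in films:
        if max_rank is None or rank <= max_rank:
            counts.update(set(cast))  # set() so a repeated credit counts once per film
    return counts


if __name__ == "__main__":
    # Question 1: which actors appear in the most films?
    print(appearance_counts(FILMS).most_common(1))
    # Question 3: which actors appear in 2 or more of the top 25?
    top25 = appearance_counts(FILMS, max_rank=25)
    print(sorted(actor for actor, n in top25.items() if n >= 2))
```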
I'm working on building a much larger dataset using a combination of Freebase and IMDb so that I have enough data to start exploring much more interesting questions (e.g. graphing the frequencies of genres over the past 60 years; for a given film, finding movies with the greatest overlap in genres, actors, and directors; generalizing the n-degrees-of-Bacon problem to work on any two actors; etc.).
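For the genre-frequency idea, the counting step is small once each film has a release year and a list of genres. Here's a minimal sketch that assumes films arrive as hypothetical `(year, genres)` pairs. The per-decade counters it returns are what you would then hand to a plotting library. Bucketing by decade rather than by single year is just one choice, made to smooth out years with only a few films.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def genre_counts_by_decade(films: Iterable[Tuple[int, List[str]]]) -> Dict[int, Counter]:
    """Map each decade (e.g. 1950) to a Counter of genre -> number of films."""
    by_decade: Dict[int, Counter] = defaultdict(Counter)
    for year, genres in films:
        # set() so a genre listed twice for one film is only counted once
        by_decade[year // 10 * 10].update(set(genres))
    return dict(by_decade)


# Hypothetical placeholder rows: (release year, genres).
SAMPLE = [
    (1954, ["Western", "Drama"]),
    (1958, ["Western"]),
    (1962, ["Musical"]),
    (2008, ["Crime", "Drama"]),
]

if __name__ == "__main__":
    for decade, counts in sorted(genre_counts_by_decade(SAMPLE).items()):
        print(decade, dict(counts))
```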
Very cool. Thanks for posting this. The n-degrees-of-Bacon problem is actually what made me think of this in the first place. It would be great to be able to plug in two actors and have it spit out the shortest path between them.
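The two-actor version is a breadth-first search over a co-star graph, where two actors are adjacent if they share a film. Here's a minimal sketch that assumes casts are already loaded into a dict keyed by film title. `CASTS` below is hypothetical placeholder data. In Neo4j, Cypher's built-in `shortestPath()` function covers the same query.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical placeholder data: film title -> cast list.
CASTS: Dict[str, List[str]] = {
    "Film A": ["Kevin Bacon", "Actor X"],
    "Film B": ["Actor X", "Actor Y"],
    "Film C": ["Actor Y", "Actor Z"],
    "Film D": ["Actor Q"],
}


def build_costar_graph(casts: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Adjacency sets: each actor maps to everyone they have shared a film with."""
    graph: Dict[str, Set[str]] = {}
    for cast in casts.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(a for a in cast if a != actor)
    return graph


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the actor chain from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to start, then reverse.
            path: List[str] = []
            node: Optional[str] = current
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for costar in graph[current]:
            if costar not in previous:
                previous[costar] = current
                queue.append(costar)
    return None


if __name__ == "__main__":
    graph = build_costar_graph(CASTS)
    print(shortest_path(graph, "Kevin Bacon", "Actor Z"))
```

The number of hops (`len(path) - 1`) is the generalized Bacon number. Collapsing films into actor-to-actor edges keeps the search simple, but it loses track of which film links each pair. Keeping a bipartite actor/film graph would let you print the connecting titles too.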
I've contemplated the idea of taking soundtracks from movies and comparing them to a user's favorite tracks.
It wouldn't accurately predict the best movie to watch, but it might surface an indirect, quality pick that would otherwise never have been seen.
That's awesome! What is your algorithm like for determining the similarity? There are some good answers, but The Wolf of Wall Street is apparently pretty close to Frozen. ;)
"Films similar to X" is definitely a good use case. But sometimes you can just go to the Amazon page for a DVD and look at "people who bought this also bought that" ;)
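The "also bought" idea boils down to item-to-item co-occurrence counting. Here's a minimal sketch that assumes hypothetical purchase baskets (sets of DVD titles). It counts how often each pair of titles shows up in the same basket and returns the most frequent partners. This is a generic illustration, not Amazon's actual method.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, List, Set


def co_occurrences(baskets: Iterable[Set[str]]) -> Dict[str, Counter]:
    """For each title, count how often every other title appears in the same basket."""
    counts: Dict[str, Counter] = {}
    for basket in baskets:
        for a, b in permutations(basket, 2):  # ordered pairs, so both directions get counted
            counts.setdefault(a, Counter())[b] += 1
    return counts


def also_bought(counts: Dict[str, Counter], title: str, k: int = 10) -> List[str]:
    """Titles most often bought alongside `title`, most frequent first."""
    return [other for other, _ in counts.get(title, Counter()).most_common(k)]


# Hypothetical placeholder purchase history.
BASKETS = [
    {"DVD A", "DVD B"},
    {"DVD A", "DVD B", "DVD C"},
    {"DVD A", "DVD D"},
]

if __name__ == "__main__":
    print(also_bought(co_occurrences(BASKETS), "DVD A", k=1))
```

Raw counts favor whatever is popular overall, so a real system would typically normalize them (for example by each item's total purchases) before ranking.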