Show HN: QuoDB – Movie quote search engine based on subtitles (quodb.com)
96 points by lusob on Aug 4, 2014 | 30 comments



Ah, some search relevance needs to be worked on :) No movie should rank higher than the Terminators for "I'll be back":

http://www.quodb.com/#search/i'll%20be%20back
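One common way to handle this is to rank exact, contiguous phrase matches above lines that merely contain the query words somewhere. The sketch below assumes a simple two-tier score (2.0 for an exact phrase, otherwise the fraction of query words present). It is not how QuoDB actually ranks; the `Hit`, `score` and `rank` names are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    title: str
    line: str

def tokens(text: str) -> list[str]:
    # Lowercase and keep only letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def score(query: str, line: str) -> float:
    q, words = tokens(query), tokens(line)
    if not q:
        return 0.0
    n = len(q)
    # Tier 1: the query appears as a contiguous phrase in the line.
    if any(words[i:i + n] == q for i in range(len(words) - n + 1)):
        return 2.0
    # Tier 2: fraction of query words that appear anywhere in the line.
    present = set(words)
    return sum(w in present for w in q) / n

def rank(query: str, hits: list[Hit]) -> list[Hit]:
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(hits, key=lambda h: score(query, h.line), reverse=True)

hits = [Hit("Some Other Film", "Back off, I'll be there"),
        Hit("The Terminator", "I'll be back.")]
print(rank("I'll be back", hits)[0].title)  # The Terminator
```

A real engine would also weight by title popularity, which this sketch ignores.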


I typed in a few ARNIE quotes:

  Terminator:
    "nothing clean right" - no results
    "fuck you, asshole" - one result for Terminator, but the phrase occurs twice in the film.
  Predator:
    "If it bleeds, we can kill it" - Predator came up, and a few others (interesting).
  Total Recall:
    "Sue me, dickhead" - it got that one.
  Commando:
    "You're a funny guy Sully, I like you. That's why I'm going to kill you last." - no results.
I'm probably gonna spend all night on this.

EDIT: reformatted.


The query "asta la vista baby" doesn't return Terminator at all!


For that you'd need autocorrection; "hasta la vista baby" gives the correct result.
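A minimal sketch of that kind of autocorrection, assuming a word-frequency vocabulary built from the subtitle corpus: unknown query words are replaced by the closest vocabulary word by Levenshtein distance, with ties broken by frequency. The counts below are made up for illustration, and `correct_word`/`correct_query` are hypothetical names, not part of QuoDB.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance computed with a single rolling row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def correct_word(word: str, vocabulary: dict[str, int], max_dist: int = 1) -> str:
    # Known words are kept; otherwise pick the nearest word, preferring frequent ones.
    if word in vocabulary:
        return word
    dist, _, best = min((edit_distance(word, v), -count, v)
                        for v, count in vocabulary.items())
    return best if dist <= max_dist else word

def correct_query(query: str, vocabulary: dict[str, int]) -> str:
    return " ".join(correct_word(w, vocabulary) for w in query.lower().split())

# Made-up frequencies for illustration only.
vocabulary = {"hasta": 120, "la": 5000, "vista": 130, "baby": 2000, "pasta": 40}
print(correct_query("asta la vista baby", vocabulary))  # hasta la vista baby
```

"asta" is one edit away from both "hasta" and "pasta"; the frequency tie-break is what picks "hasta".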


I think lots of us have had this idea, great to see it implemented.


I've often wondered how a database such as this could be used in other fields of programming, say, a text-to-speech engine[1], where the algorithm could use subtitles to guess the context of the conversation and produce better results.

[1] http://www.slate.com/articles/technology/technology/2009/03/...


I actually worked on this exact problem as an intern at our university. We used a huge corpus of communication (for example, we had access to all the emails ever sent internally at Enron).

We used this as the basis to train a speech-to-text engine by automatically correcting likely-wrong interpretations. "I go loo school" would be corrected to "I go to school", for example. It worked remarkably well.

These subtitles could serve as a basis, but there are far bigger (and better?) collections of data for training these machine learning engines.
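The correction idea described above ("I go loo school" becoming "I go to school") can be sketched with a bigram language model: for each word that has likely mishearings, keep the candidate that most often appears next to its neighbours in the corpus. This is a toy greedy version, not the system the commenter worked on; the corpus and the `confusions` table are hypothetical.

```python
from collections import Counter

def bigram_counts(sentences: list[str]) -> Counter:
    # Count adjacent word pairs; <s> and </s> mark sentence boundaries.
    counts: Counter = Counter()
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

def correct(sentence: str, counts: Counter, confusions: dict[str, list[str]]) -> str:
    words = sentence.lower().split()
    out: list[str] = []
    for i, w in enumerate(words):
        prev = out[-1] if out else "<s>"
        nxt = words[i + 1] if i + 1 < len(words) else "</s>"
        options = confusions.get(w, [w])
        # Keep the option seen most often next to its left and right neighbours.
        out.append(max(options, key=lambda c: counts[(prev, c)] + counts[(c, nxt)]))
    return " ".join(out)

# Toy corpus and hypothetical confusion set, for illustration only.
corpus = ["I go to school every day", "we go to school together", "the loo is down the hall"]
confusions = {"loo": ["loo", "to", "too", "two"]}
counts = bigram_counts(corpus)
print(correct("I go loo school", counts, confusions))  # i go to school
```

A larger corpus (such as the Enron emails, or subtitles) mainly improves the bigram counts; the selection logic stays the same.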


Could you recommend any of these data collections if they are open to the public?


This is very likely the Enron corpus that was used: https://www.cs.cmu.edu/~./enron/


I can confirm that this is the corpus. I can also confirm that, even though the emails are all from mid-to-senior management, the writing style is very sloppy.


The first thing I searched for was "you look like shit" which is a very common remark in movies and shows.

http://www.quodb.com/#search/you%20look%20like%20shit

531 titles. Wow.


I'm partial to "We've got company!"

http://www.quodb.com/#search/we've%20got%20company

"Incoming!" is the moral equivalent (and much more popular), but it's less impressive since it's only one word.

http://www.quodb.com/#search/incoming!



Where are the movie cover thumbnail images from? I'd like a source for an idea I have.


Suddenly Fight Club is in the same league as Ugly Betty...

http://www.quodb.com/#search/i%20want%20you%20to%20hit%20me%...


Very neat. I queried for a line I remembered as "a story Englishmen tell when they're down in the mouth", and it corrected this to "Englishmen tell it [etc.]", identifying the movie as Beat the Devil.


I typed in "finality" as the search term. There's a scene in which Nick Nolte uses this word in a speech to the Hulk. It only came up with results containing "finaLLy" (?)


Nice design; fast and functional too. Kudos!

So while the fuzzy matching is neat, sometimes it's handy to be able to perform an exact search as well.

Typically this is done using "quotation marks" around the search term(s).
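A minimal sketch of quote-based exact search, assuming double-quoted phrases must match verbatim (case-insensitive, on word boundaries) while unquoted words stay available for fuzzy matching. The function names are hypothetical; lookarounds are used instead of `\b` so phrases ending in punctuation, like "incoming!", still match.

```python
import re

def parse_query(query: str) -> tuple[list[str], list[str]]:
    # Phrases inside double quotes must match exactly; the rest is loose.
    phrases = [p.strip().lower() for p in re.findall(r'"([^"]+)"', query)]
    loose = re.sub(r'"[^"]*"', " ", query).lower().split()
    return phrases, loose

def matches_exact(line: str, phrases: list[str]) -> bool:
    # Every quoted phrase must occur as a whole word sequence, case-insensitively.
    text = line.lower()
    return all(re.search(r"(?<!\w)" + re.escape(p) + r"(?!\w)", text)
               for p in phrases)

phrases, loose = parse_query('"finality" hulk')
print(phrases, loose)                                    # ['finality'] ['hulk']
print(matches_exact("That's the finality of it.", phrases))  # True
print(matches_exact("Finally, you're here.", phrases))       # False
```

With this split, "finality" in quotes would no longer pull in lines that only say "finally".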


Interesting case: I searched "screw you" and it also turned up results like "are you screwing with me?", "we would have been screwed...", etc.


Very cool!

What is it using localStorage for? Without dom.storage.enabled, it's just a blank white page with a footer.


Is this legal?


As someone who often hunts old movies for samples of random phrases, this is just so perfect. Thanks.


Excellent. One useful feature would be the ability to sort search results by age, for example.


Congrats! Do you plan to include subtitles in other languages?


Waterboy: "this is some high quality h2o" yields nothing.



This is halfway through what I wanted to do. Just add the movie's clip together with the quote and, boom, gold.


One developer did something like that. She used subtitles to create GIFs of movie quotes and made a utility for it! Will post if I can find the link.
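The subtitle half of that idea is straightforward, assuming standard SRT files: parse each cue's `start --> end` timestamps and return the time range of any cue containing the quote. The sketch below does only that; cutting the actual clip would be a separate step with a video tool (for example ffmpeg's `-ss` and `-t` options). `parse_srt`, `find_quote` and the sample text are illustrative, not taken from any existing utility.

```python
import re

TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
CUE = re.compile(TIME + r"\s*-->\s*" + TIME)

def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text: str) -> list[tuple[float, float, str]]:
    # SRT cues are separated by blank lines: index, timing line, then text lines.
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = CUE.search(line)
            if m:
                g = m.groups()
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]),
                             " ".join(lines[i + 1:])))
                break
    return cues

def find_quote(cues: list[tuple[float, float, str]], quote: str) -> list[tuple[float, float]]:
    # Case-insensitive substring search over each cue's joined text.
    q = quote.lower()
    return [(start, end) for start, end, text in cues if q in text.lower()]

SAMPLE = """1
00:01:02,500 --> 00:01:04,000
I'll be back.

2
00:01:10,000 --> 00:01:12,250
Come with me
if you want to live.
"""
print(find_quote(parse_srt(SAMPLE), "if you want to live"))  # [(70.0, 72.25)]
```

Quotes that span two cues would be missed by this per-cue search.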


http://quotacle.com/

Also posted to HN not too long ago


That should be easy, uploading every movie you can find.



