Terminator:
"nothing clean right" - no results
"fuck you, asshole" - one result for Terminator, but the phrase occurs twice in the film.
Predator:
"If it bleeds, we can kill it" - Predator came up, and a few others (interesting).
Total Recall:
"Sue me, dickhead" - it got that one.
Commando:
"You're a funny guy Sully, I like you. That's why I'm going to kill you last." - no results.
I've often wondered how a database like this could be used in other areas of programming, say a text-to-speech engine[1], where the algorithm could use the subtitles to guess the context of the conversation and produce better results.
I actually worked on this exact problem as an intern at our university. We used a huge corpus of communication (for example, we had access to all the emails ever sent internally at Enron).
We used this as the basis to train a speech-to-text engine by automatically correcting likely-wrong interpretations. "I go loo school" would be corrected to "I go to school", for example. It worked remarkably well.
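To make the idea concrete, here is a rough sketch of context-based correction using bigram counts. It is not the system described above: the toy corpus, the function names, and the edit-distance cutoff are all assumptions made for illustration, and a real engine would train on a far larger corpus with a proper language model.

```python
from collections import Counter

# Toy stand-in for a large email corpus (hypothetical data).
CORPUS = [
    "i go to school",
    "we go to work",
    "they go to school every day",
    "i like to read",
]

def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def train_bigrams(sentences):
    """Count adjacent word pairs and collect the vocabulary."""
    counts, vocab = Counter(), set()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        vocab.update(words[1:-1])
        counts.update(zip(words, words[1:]))
    return counts, vocab

def correct(sentence, counts, vocab, max_dist=2):
    """Replace words that never occur in their context with a
    similar-looking vocabulary word that does (assumed heuristic)."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    for i in range(1, len(words) - 1):
        def score(w):
            # How often w was seen next to its left and right neighbours.
            return counts[(words[i - 1], w)] + counts[(w, words[i + 1])]
        if score(words[i]) > 0:
            continue  # the word already fits its context
        candidates = [w for w in vocab
                      if edit_distance(w, words[i]) <= max_dist]
        if candidates:
            best = max(candidates,
                       key=lambda w: (score(w), -edit_distance(w, words[i]), w))
            if score(best) > 0:
                words[i] = best
    return " ".join(words[1:-1])

counts, vocab = train_bigrams(CORPUS)
print(correct("I go loo school", counts, vocab))  # i go to school
```

Note that the sketch lowercases its input, and it only touches a word when neither of its neighbouring bigrams has been seen, so sentences that already match the corpus pass through unchanged.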
These subtitles could serve as a basis too, but there are far bigger (and better?) collections of data for training these machine learning engines.
I can confirm that this is the corpus. I can also confirm that, even though the emails are all from mid-to-senior management, the writing style is very sloppy.
Very neat. I queried for a line I remembered as "a story Englishmen tell when they're down in the mouth", and it corrected this to "Englishmen tell it [etc.]", identifying the movie as Beat the Devil.
I typed in "finality" as the search term. There's a scene in which Nick Nolte uses this word in a speech to the Hulk. It only came up with results containing "finaLLy"(?)
http://www.quodb.com/#search/i'll%20be%20back