Peter Norvig on Innovation in Search and AI (notjustrandom.com)
53 points by sinamdar on July 15, 2010 | 5 comments



He gave a similar talk at Startup School 2007 (I think), where he showed how a relatively dumb algorithm can build a database of simple facts. It worked very simply: crawl a TON of web-based text and look for patterns like "XXX such as AAA, BBB and CCC". That way you learn that cats, dogs and monkeys are animals.
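
To make the "such as" trick concrete, here is a minimal Python sketch. It is not the code from the talk; the regex, the `extract_facts` name, and the toy corpus are all my own assumptions. It only handles single-word categories and items, and it counts how often each (category, item) pair shows up, since the whole point is that repetition across lots of text is what makes a dumb pattern trustworthy.

```python
import re
from collections import Counter

# Assumed separators between list items: ", ", ", and ", " and ", " or ".
SEP_SRC = r"(?:,\s*(?:and |or )?| and | or )"
SEP = re.compile(SEP_SRC)

# Assumed pattern: "<category> such as <item>, <item> and <item>".
# Only single-word categories and items are matched in this sketch.
PATTERN = re.compile(r"(\w+) such as ((?:\w+" + SEP_SRC + r")*\w+)")


def extract_facts(texts):
    """Count (category, item) pairs found via the 'such as' pattern."""
    counts = Counter()
    for text in texts:
        for match in PATTERN.finditer(text):
            category = match.group(1).lower()
            for item in SEP.split(match.group(2)):
                counts[(category, item.lower())] += 1
    return counts


if __name__ == "__main__":
    corpus = [
        "Animals such as cats, dogs and monkeys are common pets.",
        "She likes animals such as cats and horses.",
    ]
    for (category, item), n in sorted(extract_facts(corpus).items()):
        print(f"{item} -> {category} (seen {n}x)")
```

With a real crawl you would presumably threshold on the counts, so that a pair seen thousands of times is kept and a one-off match is thrown away.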


The most important point is that training on more data yields better results than using a better algorithm. One of the recent pushes in machine learning and natural language processing has been to bootstrap larger training corpora from smaller initial sets.

As an example, one of my friends did her thesis on using simpler sentences (ones you're either confident you have parsed correctly or have gold standard, i.e. correct, training data for) to parse more complex but related sentences (see [1]). This is useful because, even without a huge amount of gold standard training data, the parser is statistically far more likely to get the derivation right for shorter sentences than for longer ones. Those shorter sentences can then help in parsing the longer ones.

That's why Google is so powerful. I spent a summer internship there and they have two really powerful things - data and the tools and techniques to handle it. In one afternoon a single employee could run through more data than entire companies would use for months.

[1] "Mozart was born in Salzburg in 1756" vs "Wolfgang Amadeus Mozart was born on the 27th of January 1756 at 9 Getreidegasse in Salzburg" (the latter is a slightly modified example from Wikipedia)


Be careful listening to this one with headphones at high volume. The sound is pretty bad.


The sound is okay, it's just clipping in the introduction. Skip to 00:02:30


audio is clipping for me



