
Can they not index bigrams or trigrams, then chain together index hits? E.g. "It never rains in December" would hit on "It never rains", "never rains in", and "rains in December". A result that hits on all of the indexed subphrases is not guaranteed to contain the entire phrase, but it would be a good candidate for the top result. The longer the phrase, the more likely it is that a candidate hitting every indexed subphrase matches the exact phrase. This would at least put a limit on how large the indexes need to get.
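
To make the idea concrete, here is a rough sketch of a word-trigram index with chained lookups. Everything in it (the function names, the toy documents, whitespace tokenization) is my own assumption for illustration, and it says nothing about how Google actually indexes anything.

```python
from collections import defaultdict


def word_ngrams(text, n=3):
    """Split text on whitespace and return its overlapping word n-grams."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_index(docs, n=3):
    """Map each word n-gram to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in word_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def candidates(index, phrase, n=3):
    """Return ids of documents that contain every n-gram of the phrase.

    These are only candidates: the n-grams may occur in different places
    in the document, so an exact phrase match is not guaranteed.
    """
    grams = word_ngrams(phrase, n)
    if not grams:
        return set()  # phrase shorter than n words: nothing to look up
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical toy corpus.
docs = {
    1: "It never rains in December",
    2: "It never rains here but never rains in summer and rains in December a lot",
    3: "It always rains in December",
}
index = build_index(docs)
hits = candidates(index, "It never rains in December")
print(hits)
```

Document 2 comes back as a candidate even though it never contains the full phrase, because all three trigrams appear in it at different positions. That is the "not guaranteed" case.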

Further, if they retain copies of the full text in their database, they could do a filtered scan of the documents that hit on all subphrases to guarantee an exact match. I could see that having too much of a performance impact at scale, though.
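
A minimal sketch of that filtered scan, assuming the candidate ids come from some earlier subphrase lookup and that the full text is available by id. The names `contains_phrase` and `verify` and the sample data are hypothetical.

```python
def contains_phrase(text, phrase):
    """True if the words of phrase appear contiguously, in order, in text."""
    words = text.lower().split()
    target = phrase.lower().split()
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def verify(candidate_ids, full_text, phrase):
    """Scan only the candidate documents and keep the exact matches."""
    return {
        doc_id
        for doc_id in candidate_ids
        if contains_phrase(full_text[doc_id], phrase)
    }


# Hypothetical stored documents and candidate set from a subphrase lookup.
full_text = {
    1: "It never rains in December",
    2: "It never rains here but never rains in summer and rains in December a lot",
}
exact = verify({1, 2}, full_text, "It never rains in December")
print(exact)
```

The scan cost grows with the number and length of candidate documents rather than with the whole corpus, which is where the performance worry comes in for very common subphrases.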

In any case, the dumbing down of Google search over the last few years is immensely frustrating to me.


