
On which note, the answer to a list like this isn't necessarily "memorize it and avoid all these problems". The benefit can simply be in making these tradeoffs consciously, so you can judge your model better.

If you're Google, differentiating 'or' as in either from 'OR' as in Oregon is a task you need to take on. But if you're writing a National Park lookup tool, you probably just don't want to worry about that case. In that case it's still worth knowing; you might be able to save users some time by at least showing clearly how you reinterpreted their input.
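To make the "show how you reinterpreted their input" idea concrete, here is a minimal sketch of what such a lookup tool might do. Everything in it is made up for illustration: the `STATE_CODES` table is a tiny hypothetical sample, and `interpret` is not taken from any real tool. It simply assumes that, in a park lookup, a token matching a state code means the state.

```python
# Hypothetical sample table; a real tool would carry the full list.
STATE_CODES = {"OR": "Oregon", "WA": "Washington", "CA": "California"}


def interpret(query):
    """Return (terms, notes): the terms actually searched for, plus a
    human-readable note for every token that was reinterpreted."""
    terms, notes = [], []
    for token in query.split():
        state = STATE_CODES.get(token.upper())
        if state is not None:
            # Assumption: in this domain a state code always means the state.
            terms.append(state)
            notes.append(f'"{token}" read as {state}')
        else:
            terms.append(token)
    return terms, notes


terms, notes = interpret("crater lake or")
print("Searching for:", " ".join(terms))
for note in notes:
    print("Note:", note)
```

The tool still makes the "wrong" call for someone who meant the conjunction, but the note tells them immediately what happened.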



>The benefit can simply be in making these tradeoffs consciously, so you can judge your model better.

Very much so; engineering is all about choosing trade-offs and, hopefully, improving them over time. The list also helps with the unknown-unknowns problem of what customers may expect, including whole new domains of expectations (such as immediacy of updates, or handling of accented/non-English characters).

Side note:

As far as I can tell, Google got rid of the special-cased "OR" in the general search - right now it's a word, not a predefined/reserved symbol.

They were able to do so by adding an implicit "OR-like" operator between all the words in the query. Not quite an implicit OR, not quite an implicit AND; something a bit more complex in between.

The words of the query get weighted against matches on their own, as adjacent words (higher weight), and as whole phrases (higher weight still). All in all, the problem got solved by an improved matching and sorting algorithm, not by somehow smartly detecting when "OR" is meant as "OR", or OR, or or.

The problem got solved in the match scoring/sorting domain, rather than in the query parsing domain.
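A toy sketch of that kind of scoring, to show the shape of the idea. This is not Google's actual algorithm: the `score` and `rank` functions, the three weights, and the sample documents are all assumptions picked for illustration.

```python
def score(query, document, word_w=1.0, pair_w=3.0, phrase_w=10.0):
    """Score a document against a query. No token is treated as an operator;
    single words, adjacent pairs, and the whole phrase each add weight."""
    q = query.lower().split()
    d = document.lower().split()
    doc_words = set(d)
    doc_pairs = set(zip(d, d[1:]))

    # Each query word found anywhere in the document.
    total = word_w * sum(1 for w in q if w in doc_words)
    # Each adjacent query pair found adjacent in the document.
    total += pair_w * sum(1 for p in zip(q, q[1:]) if p in doc_pairs)
    # The whole query found as one contiguous run.
    n = len(q)
    if n > 1 and any(d[i:i + n] == q for i in range(len(d) - n + 1)):
        total += phrase_w
    return total


def rank(query, documents):
    """Sort documents by descending score."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)


docs = [
    "either tea or coffee",
    "hotels in portland oregon",
    "portland or hotels",
]
for doc in rank("portland or", docs):
    print(score("portland or", doc), doc)
```

Here "or" is just another word: a document with "portland or" as an adjacent pair floats to the top, while one that merely contains "or" somewhere scores low. The parser never has to guess what the user meant.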



