Ask YC: Regex web search?
5 points by aneesh on Aug 11, 2008 | 4 comments
Do any of the major search engines offer a way to query the search engine using a regular expression? I'm not even looking for complete regex support. Even using wildcards to represent missing letters would be great, so that I can search for "Olymp*" and it will include results for "Olympics". Google doesn't seem to support this. Are the pages indexed in a certain way (i.e., by word) that would make this kind of search prohibitively difficult?


So in your case, doing Olymp* is actually not too difficult (that's just stemming), but doing actual regex matching across the internets would be hard sauce. One of the traditional ways of storing an index like this is word -> {documents}. Searching for a set of words is then not too expensive; for full regex support, however, they would have to look at every word entry. That's just sadpanda to the max.
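
A minimal sketch of that trade-off, assuming a toy in-memory word -> {documents} index. The function names and sample documents are made up for illustration, and this is not how any particular search engine does it. A prefix like Olymp* can jump straight to the right spot in a sorted word list, while a general regex has to be tried against every word entry:

```python
import bisect
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def prefix_search(index, vocab, prefix):
    """Documents containing a word that starts with `prefix`.

    `vocab` is the sorted list of index keys, so a binary search finds the
    first candidate and the scan stops at the first non-matching word.
    """
    prefix = prefix.lower()
    hits = set()
    i = bisect.bisect_left(vocab, prefix)
    while i < len(vocab) and vocab[i].startswith(prefix):
        hits |= index[vocab[i]]
        i += 1
    return hits


def regex_search(index, pattern):
    """Documents containing a word fully matched by `pattern`.

    There is no shortcut here: every word entry in the index is tested.
    """
    compiled = re.compile(pattern)
    hits = set()
    for word, doc_ids in index.items():
        if compiled.fullmatch(word):
            hits |= doc_ids
    return hits


# Hypothetical sample documents.
docs = {
    1: "The Olympics open in Beijing",
    2: "Olympic swimmers set a record",
    3: "Regex search is hard",
}
index = build_index(docs)
vocab = sorted(index)

print(prefix_search(index, vocab, "Olymp"))  # documents 1 and 2
print(regex_search(index, r".*ympi.*"))      # documents 1 and 2, via a full scan
```

With a real web-scale vocabulary, the prefix lookup touches only the matching slice of the word list, whereas the regex loop grows with the total number of distinct words.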



This is probably overkill, but Amazon Web Services offers something called "Grep the Web" which you can use to run offline regex searches of the web: http://www.amazon.com/Alexa-Web-Search/b?ie=UTF8&node=26...


http://www.google.com/codesearch does full regex search, but I'm not aware of any mainstream web search engine that does.



