
Is it feasible to roughly categorize websites (or parts of websites, e.g. subreddits) and then run full-text search only on the relevant ones, or at least prioritize them? For instance, it's extremely unlikely OP would find anything useful about part numbers on *.reddit.com/r/funny.

Does anything like that exist today in usable form?
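Here is a toy sketch of what "restrict or at least prioritize" could mean. It assumes a hand-made table of site categories and uses plain substring matching as the "full-text search". The category names, the `example-parts-db.com` domain, and the `search` function are all hypothetical; real engines use inverted indexes, not a linear scan.

```python
from dataclasses import dataclass

# Hypothetical hand-assigned categories for sites or site sections.
SITE_CATEGORIES = {
    "reddit.com/r/funny": "humor",
    "reddit.com/r/AskElectronics": "electronics",
    "example-parts-db.com": "electronics",  # made-up domain for illustration
}


@dataclass
class Page:
    site: str
    text: str


def search(pages, query, allowed=None, priority=None):
    """Case-insensitive substring search, restricted and/or ranked by site category.

    allowed:  set of categories to search (None means search everything).
    priority: dict of category -> score; higher scores are listed first.
    """
    priority = priority or {}
    q = query.lower()
    hits = []
    for page in pages:
        category = SITE_CATEGORIES.get(page.site, "uncategorized")
        if allowed is not None and category not in allowed:
            continue  # hard restriction: skip sites outside the chosen categories
        if q in page.text.lower():
            hits.append((priority.get(category, 0), page))
    # Python's sort is stable (also with reverse=True), so ties keep input order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [page for _, page in hits]


PAGES = [
    Page("reddit.com/r/funny", "lol my cat sat on the LM317T"),
    Page("example-parts-db.com", "LM317T adjustable regulator datasheet"),
    Page("reddit.com/r/AskElectronics", "Is the LM317T still a good choice?"),
]
```

`search(PAGES, "lm317t", allowed={"electronics"})` drops the r/funny hit entirely, while `search(PAGES, "lm317t", priority={"electronics": 1})` keeps it but ranks it last. The second mode is the "at least establish priorities" option: nothing is hidden, so a miscategorized site can't make a result disappear.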



Sure, you can restrict the search to websites that seem relevant to the query. Maybe you do it by hand at first, then switch to an ML model. You spend a few years optimizing that model, adding more signals and letting it guide other parts of the query to keep costs in check and raise quality (as measured by your preferred metrics)... and now you have reinvented fuzzy search that behaves in unintuitive ways, and people on HN complain that your service was better when it did raw string matching.
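The "by hand at first" stage might look something like this: a small table of hand-written rules that maps query shapes to site categories. The regex, the category names, and `route` are hypothetical illustrations, not how any real engine does it. Every new rule is another hard-coded guess about intent, which is roughly the point at which people reach for a model instead.

```python
import re

# Hypothetical hand-written routing rules: (pattern, categories to search).
ROUTING_RULES = [
    # Looks like a part number: 2-4 letters, 2-5 digits, optional letter suffix,
    # e.g. "LM317T" or "NE555".
    (re.compile(r"\b[a-z]{2,4}\d{2,5}[a-z]*\b", re.IGNORECASE), {"electronics"}),
    (re.compile(r"\b(recipe|bake|roast)\b", re.IGNORECASE), {"cooking"}),
]


def route(query):
    """Return the set of categories to search, or None to search everything."""
    categories = set()
    for pattern, cats in ROUTING_RULES:
        if pattern.search(query):
            categories |= cats
    # No rule fired: fall back to searching all sites rather than none.
    return categories or None
```

`route("LM317T datasheet")` gives `{"electronics"}`, while `route("funny cat pictures")` gives `None` and falls back to searching everything. The resulting set can then restrict which sites the full-text search touches.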


It's interesting you ask that, because the now-paid Kagi Search has a feature like this called Lenses: https://blog.kagi.com/kagi-features#:~:text=and%20Google.-,L...

I've found them somewhat helpful for filtering out the garbage that pops up when I'm searching a specific topic. Often, though, I leave them off, because I want every result that's even slightly related in case it turns out to be useful.


This looks awesome. Unfortunately the pricing is like an order of magnitude higher than what my broke ass can pay for, but I'm truly rooting for them.



