
> Putting "exact phrases" in quotes no longer seems to work. Then they added insult to injury by forcing you to +mandate a term otherwise they might just ignore it. Now that no longer works either.

I don't understand why they got rid of these escape hatches. Sometimes I want the "top" pages containing precisely the text I enter -- no stemming, synonyms, etc. Maybe it shouldn't be the default, but why make it impossible?

In my ideal search world, there would also be an option to eliminate any page with a display ad or affiliate link. Sometimes I only want the pages that aren't trying to make money off of me.



I have a solution: a search engine that uses machine learning to score the "commercialness" of a page. By commercialness, I mean: is it a table of products with prices; does it have buy buttons; does it use a lot of tracking and analytics; does it have a cart; is there a lot of product talk (and is it overly positive); how do the pages within a couple of link-degrees score; ... (and more). Then give users a slider: the right end means no filtering, the left end means basically only return universities, Wikipedia, and PBS-tier results.

This has to track the number of ads and trackers on a page, not just detect product pages. The measure should also fight SEO spam, since tracking and advertising elements would cost SEO spammers rank on the engine (disincentivising an arms race).
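
To make the slider idea concrete, here is a rough sketch with hand-picked weights instead of a trained model. Everything in it is an assumption for illustration: the `PageSignals` fields, the caps, the weights, and the `filter_results` helper are all hypothetical, and the link-neighbourhood signal is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical per-page counts a crawler might extract."""
    price_mentions: int   # price-like strings on the page
    buy_buttons: int      # "buy" / "add to cart" buttons
    tracker_scripts: int  # third-party tracking/analytics scripts
    ad_slots: int         # display ad placements
    has_cart: bool        # page exposes a shopping cart


def commercialness(s: PageSignals) -> float:
    """Return a score in [0, 1]; higher means more commercial.

    The weights and caps are made up for this sketch, not tuned on data.
    """
    def capped(n: int, cap: int) -> float:
        # Saturate each signal so one huge count cannot dominate.
        return min(n, cap) / cap

    score = (
        0.25 * capped(s.price_mentions, 10)
        + 0.20 * capped(s.buy_buttons, 3)
        + 0.25 * capped(s.tracker_scripts, 8)
        + 0.20 * capped(s.ad_slots, 5)
        + 0.10 * (1.0 if s.has_cart else 0.0)
    )
    return min(score, 1.0)  # guard against float rounding above 1.0


def filter_results(results, slider: float):
    """Keep (url, signals) pairs whose score is at or below the slider.

    slider = 1.0 (right end) keeps everything; slider = 0.0 (left end)
    keeps only pages with no commercial signals at all.
    """
    return [(url, s) for url, s in results if commercialness(s) <= slider]
```

Note that ads and trackers carry almost half the weight here, so a page with no products but a wall of ads still scores high, which is the point of the second paragraph above.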

Add in the patently obvious need for the power user's second search bar, which takes set-notation statements and at least one of a few popular, powerful regex languages, and finally add cookie-stored, user-suppliable domain blacklists and whitelists (which can be downloaded as a .txt and re-uploaded later on a new browser profile if needed). To give an immediately graspable example: I never, ever want to see Experts Exchange in my results for any reason. Give users more control and quit automagicking everything behind a conversationally universal idiot-bar!
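
The blacklist/whitelist part is small enough to sketch. The file format below (one domain per line, `#` comments) and the rule that a non-empty whitelist means "only these domains" are my assumptions, and all the function names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_domain_list(text: str) -> set[str]:
    """Parse a .txt list: one domain per line, '#' starts a comment."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if line:
            domains.add(line)
    return domains


def dump_domain_list(domains) -> str:
    """Serialize a list back to .txt so it can be re-uploaded elsewhere."""
    return "\n".join(sorted(domains)) + "\n"


def host_matches(host: str, domains) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    parts = host.lower().rstrip(".").split(".")
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))


def filter_urls(urls, blacklist, whitelist=frozenset()):
    """Drop blacklisted hosts; if a whitelist is given, keep only its hosts."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if whitelist and not host_matches(host, whitelist):
            continue
        if host_matches(host, blacklist):
            continue
        kept.append(url)
    return kept
```

Usage would be something like `filter_urls(result_urls, parse_domain_list(uploaded_text))`, with the lists kept client-side however the engine likes.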


I use uBlacklist to get rid of Expert Sexchange and similar low-quality sites in search results, and it seems to work well enough.

An "advanced mode" supporting literal keywords (with and without stemming) and boolean operators wouldn't cost the search companies anything. I think supporting regexp search would be hard: do you search your index for fixed substrings and expand around them? I'm not a search person...
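
For what it's worth, the "fixed substrings, then expand" idea can be done with a trigram index: find documents containing every 3-character piece of a literal the regex requires, then run the real regex only on those candidates. Below is a toy sketch, not how any particular engine does it. The `TrigramIndex` class is hypothetical, and it assumes the caller supplies the required literal by hand, since extracting literals from an arbitrary regex is the genuinely hard part.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.docs = {}                    # doc_id -> full text
        self.postings = defaultdict(set)  # trigram -> set of doc_ids

    def add(self, doc_id, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, pattern: str, required_literal: str):
        """Return sorted doc_ids whose text matches the regex.

        required_literal is a substring every match must contain
        (an assumption of this sketch; the caller provides it).
        """
        grams = trigrams(required_literal)
        if grams:
            # Candidates must contain every trigram of the literal.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Literal too short to index: fall back to scanning everything.
            candidates = set(self.docs)
        rx = re.compile(pattern)
        # The index only narrows the field; the regex does the real check.
        return sorted(d for d in candidates if rx.search(self.docs[d]))
```

The fallback branch is where the cost hides: a regex with no usable literal degrades to a full scan.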

I don't think you'd need much in the way of machine learning to filter out the spam. There are relatively few third-party display ad servers and affiliate networks, and those are the main lazy ways to make money. There's no need to filter out all commercial content; just getting rid of the "passive income" bros would be enough.
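
As a rough illustration of the no-ML version: collect the hosts a page loads scripts, images, and links from, and check them against a short list of ad and affiliate hosts. The hosts in `AD_HOSTS` are placeholders, not real networks, and `uses_ad_networks` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Placeholder hosts for illustration; a real list would be curated by hand.
AD_HOSTS = {"ads.example", "affiliates.example"}


class _HostCollector(HTMLParser):
    """Collect the hostnames referenced by src/href attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlsplit(value).hostname  # None for relative URLs
                if host:
                    self.hosts.add(host.lower())


def uses_ad_networks(html: str, ad_hosts=AD_HOSTS) -> bool:
    """True if the page references a listed host or any subdomain of one."""
    collector = _HostCollector()
    collector.feed(html)
    return any(
        host == ad or host.endswith("." + ad)
        for host in collector.hosts
        for ad in ad_hosts
    )
```

This only sees static markup, so ads injected by scripts at runtime would slip through.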



