There are few easy-to-obtain signals that will help you differentiate search results in a way that approximates the opinion of an intelligent human.
I believe we have reached the point at which Google's search results are useful for only one reason: speed of information retrieval. If you want high-quality information, you visit sources you trust. These are either (a) content providers that have given you high-quality information before, or, for things you haven't had to search for before, (b) socially curated information from user communities that you trust and are a part of.
Here's an example: http://www.google.co.uk/search?q=mp3+player+reviews
Most of the content rated highly by Google either has nothing to do with reviews or is hideously out of date; note the 2007 'reviews' in the second result down. It's junk all the way down. For a lot of people the Google search bar has been relegated to a fancy URL bar: it takes them somewhere only when they already know where they're going, and browsing for fun is avoided.
So what can be done? An extension that lets you join an invite-only group of Hacker News users who will remove any rubbish search results from Google? Using the +1 as a signal to try to make the search results better? Of the two, only the former strikes me as difficult to game. The latter would go unused except for very popular searches, to the point that a determined spammer with a botnet could probably sway many search results. Better minds might know better ways.
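As a very rough sketch of the extension idea (the blocklist endpoint is made up, and "div.g" is only a guess at Google's ever-changing result markup), a content script could simply hide anything the group has flagged:

```typescript
// Sketch only: hide Google results whose domains appear on a shared,
// invite-only blocklist. The endpoint and the result selector are assumptions.
const BLOCKLIST_URL = "https://example.org/hn-group/blocklist.json"; // hypothetical

async function hideBlockedResults(): Promise<void> {
  const blocked = new Set<string>(await (await fetch(BLOCKLIST_URL)).json());

  document.querySelectorAll<HTMLAnchorElement>("div.g a[href^='http']").forEach(link => {
    const host = new URL(link.href).hostname.replace(/^www\./, "");
    if (blocked.has(host)) {
      const result = link.closest<HTMLElement>("div.g");
      if (result) result.style.display = "none";
    }
  });
}

hideBlockedResults();
```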
Hey, I come here to offer useless kvetching and complaints, not solutions :). Seriously, if I knew how to solve it, I'd be the CTO of Google.
But I'll make a couple of guesses. I think they need a lot more human input. That goes against the grain of everything they do; they want to solve everything algorithmically. But SEO is just too good at abusing algorithms. I think they need to get more humans involved and force the SEO guys to tone it down.
And one tiny specific suggestion: drop the strong signal of a search term being in the domain name. Maybe leave it for proper names, but drop it for all generic words, please. It used to be a good signal. It used to be that when you saw your search echoed in the domain name, you knew you found what you wanted. Now, I cuss and skip it, because I know it's spam. It's been abused beyond belief.
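To make that concrete, here's a toy scoring function in the spirit of the suggestion. The features, weights, and generic-word list are all invented; this is obviously not how Google actually ranks pages:

```typescript
// Toy ranking sketch: keep the "query term appears in the domain" boost for
// proper-name-looking queries, drop it when the query is made of generic words.
interface PageFeatures {
  contentRelevance: number; // 0..1, from text matching
  linkAuthority: number;    // 0..1, from inbound links
  termInDomain: boolean;    // a query term appears in the domain name
}

const GENERIC_TERMS = new Set(["review", "reviews", "best", "cheap", "free", "top"]);

function score(query: string, f: PageFeatures): number {
  const looksGeneric = query.toLowerCase().split(/\s+/).some(t => GENERIC_TERMS.has(t));

  // Only count the exact-match-domain signal when the query doesn't look generic.
  const domainBoost = f.termInDomain && !looksGeneric ? 0.3 : 0.0;

  return 0.5 * f.contentRelevance + 0.2 * f.linkAuthority + domainBoost;
}

// mp3playerreviews.com-style domains get no free ride for "mp3 player reviews".
console.log(score("mp3 player reviews",
  { contentRelevance: 0.4, linkAuthority: 0.1, termInDomain: true }));
```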
EDIT: Yes, lots of people suggest "going social" on the problem. That's a lot harder for most people to manage than just using a search box. And searching for "trusted social group for reviews" is just going to take you to mytrustedsocialgroupforreviews.com. Guess what's there.
We tried doing something similar at XMarks (a bookmark sync service) when I worked there. We figured bookmarks were a good source for mining quality websites and hired some very bright people with a search background to build it. For some searches, it gives pretty good results. Here's the one it gives for VPN providers:
It was an uphill battle for us, and it ultimately never worked out. Early on, we did a usability study to see how people use search engines, and the people we interviewed tended to think that Google was more of an expert than they were: if they were having trouble finding things, it was their problem rather than Google's.
We ended up taking our data and decorating it directly onto the Google search results page to start getting usage, but it was really hard to get any traction or build any momentum. For most searches, Google is good enough, and only a small number of our existing users started using our website for search/discovery.
We ended up spending a significant amount of time decorating the ads within search results, and after a considerable amount of logging and number crunching we found that this not only increased the clickthrough rate of the ads we decorated, but also increased the overall ad clickthrough rate. That's a huge win for a search engine and worth a lot of money, but ultimately Google just did it themselves, and the other search vendors with ads on their pages were basically handcuffed by Google's contract, which wouldn't allow them to use our technology.
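The comparison itself is simple enough; here's a minimal sketch with an invented log-record shape:

```typescript
// Compare clickthrough rate for decorated vs. undecorated ad impressions.
// The AdImpression shape is an assumption for illustration, not the real log format.
interface AdImpression { decorated: boolean; clicked: boolean; }

function ctr(impressions: AdImpression[], decorated: boolean): number {
  const group = impressions.filter(i => i.decorated === decorated);
  if (group.length === 0) return 0;
  return group.filter(i => i.clicked).length / group.length;
}

function relativeLift(impressions: AdImpression[]): number {
  const control = ctr(impressions, false); // undecorated baseline
  const treated = ctr(impressions, true);  // decorated ads
  return control > 0 ? (treated - control) / control : 0;
}
```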
The conflicting interests involved in building a search engine are a fascinating subject. This is probably a bit deep in the comment tree to start a discussion of it, but it's certainly true that it doesn't do you any good to present wonderful, academically vetted search results if the users don't click on the dang ads.
For the particular search here - yes, you do better than Google, but you don't have a million hungry SEO guys trying to game you. So it's not a fair comparison. If you get as big as Google, it will be a different story. Of course, if you get as big as Google, you might not care :)
I think one reason the search product didn't gain traction is that it interfered horribly with the original goal of Foxmarks: a service to sync bookmarks.
Speaking as a former user of Fox/XMarks, I saw the product go from a simple bookmark sync to a bloated extension that tried to do more than what I had downloaded it for.
I do think that the data could have been useful, but it would have been a better idea to create a completely different product than to bundle additional features onto an existing one.
Why not sell this data to Google and other search engines, instead of competing directly with them?
Note that I've used XMarks / delicious in the past to get more relevant results. When a link has been bookmarked by many users, it is more likely to be interesting. It was extremely useful when approaching an unknown subject.
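A minimal sketch of that signal, with a made-up weighting and damping constant, is just blending a log-damped count of distinct bookmarkers into the base relevance score:

```typescript
// Pages bookmarked by many distinct users get a modest, log-damped boost.
// The 80/20 blend and the 10,000-user saturation point are invented numbers.
function bookmarkBoostedScore(baseRelevance: number, distinctBookmarkers: number): number {
  const boost = Math.log1p(distinctBookmarkers) / Math.log1p(10_000); // ~1.0 at 10k users
  return 0.8 * baseRelevance + 0.2 * Math.min(boost, 1);
}

console.log(bookmarkBoostedScore(0.6, 2500)); // a page many people chose to keep
console.log(bookmarkBoostedScore(0.6, 3));    // a barely-bookmarked page
```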
> Drop the strong signal of a search term being in the domain name [...] It's been abused beyond belief.
I wholeheartedly agree, but people still search for "facebook.com login" and similarly trivial queries. If Google stopped returning "facebook.com/login" for those, it would lose a lot of its value for those people.
I can only imagine how many links Facebook's login page has across the web from their widgets. I don't think we have to worry about it falling out of the top spot for that query; in fact, if it did, Google would likely get a visit from the DOJ soon after...
A search term appearing in the domain name should not carry as much weight as it currently does. Along similar lines, I think link anchor text is heavily overweighted: it's fairly easy to game for commercial searches, and the kind of "editorial link" the PageRank model assumed is not at all common there.
As it stands, link buying is rampant: as risky as black-hat techniques are for real businesses, if you're building diverse spam web properties through a scalable model it's not really risky at all. It's not like Google can give you a permaban.
> That's a lot harder for most people to manage than just using a search box.
Hey, sounds like a business opportunity. "Join our bogon-buster network! [fine print: invite only, your activities and reviews will be rated by other members.]"
Of course, after a while you'll need another layer on top of it to make sure that your bogon-buster networks aren't full of spammers themselves.
Yes, you then have the same problem suffered by most social networks.
I actually think a similar problem has already been partially solved by Twitter. Twitter grows, but the content curation is done by the people you follow, so new users don't affect your interactions on the site unless you let them. Of course, there's gardening work the user has to do to get to this point, but it would probably be less work than wading through some of Google's search results.
Additionally, the intelligent thing would be to mine the social interactions of authoritative users to provide public data for users that don't want to engage with the app.
edit: Of course, it's possible to become an authoritative spammer, in which case, as a power user, you will still need to garden the group of users you interact with.
So why not adapt the Twitter model to this? You can choose who to include in your Bogon-net, and maybe set different confidence requirements; e.g., "Show me only results that at least 50% of my Bogon-busters have rated, and only show me results that have a 75% approval rating."
The problems I already see are a) having to compute all those scores, and b) that I'm unlikely to rate more than a handful of sites a day. I think Google tried that with their browser toolbar, too.
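For what it's worth, the rule itself is easy to express; here's a sketch with invented data structures, where each result carries the ratings given by members of your own network. Collecting enough ratings is the real problem:

```typescript
// Keep only results that enough of my network has rated, and that those
// raters mostly approve of. Thresholds match the 50% / 75% example above.
interface MemberRating { member: string; approved: boolean; }
interface RatedResult { url: string; ratings: MemberRating[]; }

function filterByNetwork(
  results: RatedResult[],
  networkSize: number,
  minCoverage = 0.5,  // at least half of my bogon-busters have rated it
  minApproval = 0.75, // at least 75% of those ratings approve
): RatedResult[] {
  return results.filter(r => {
    if (r.ratings.length / networkSize < minCoverage) return false;
    const approvals = r.ratings.filter(x => x.approved).length;
    return approvals / r.ratings.length >= minApproval;
  });
}
```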
We aren't talking about tornadoes or earthquakes. Let's remember that spammers are people.
How would we "solve" people who spread toxic waste on city streets?
That is exactly the way to deal with spammers. They have names, addresses, bank accounts, and, finally, bodies. All of these could be located and dealt with if law enforcement officials felt like it.
Until spammers are dealt with in exactly the same way as other types of miscreants who love to piss in our collective soup, the problem will persist and grow arbitrarily worse.
Please don't get the government involved in this. As much as I dislike eHow.com, I don't think they need to be dragged away in chains for "SEOing one's way to the top of search results, title 256, section D". That's not a good kind of call for the government to make.
I doubt you could ban the type of spam we're talking about. Sure, it's already illegal to send spam emails. But that's because email can reasonably be defined.
Not so for spam sites. What makes a site "spammy"? As reflected in this thread, a site is spammy if it (a) ranks better than it should in search, (b) fails to provide useful content, and (c) is commercial.
Which of these can be turned into an objective test? Just (c), I would venture, and that's clearly unworkable. (a) and (b) are far too subjective and vague to be codified.
Also, I think any effective ban would run far afoul of the 1st Amendment. Even if you could carve out a narrow 1st Amendment exception for certain kinds of well-defined spammy conduct, spammers would inevitably just skirt around the edges of it. They'd adapt just like they always do.
I love this idea, but who decides what "spam" is? There's a lot of spam I get that I don't like but that isn't much different from other marketing efforts. It's not all V1agr4 emails and 419 scams.
IMHO this is akin to the "pornography" issue. What is "pornography"? "I know it when I see it".