Google Removes More Than 11 Million .co.cc Domains From Search Results (digitizor.com)
133 points by dkd903 on July 6, 2011 | 83 comments



The Google search results are still extremely spammy. For anything with commercial potential and especially anything which could lead to an immediate purchase, the noise is just overwhelming. I recently looked into getting a VPN provider; it is simply impossible to research them without being showered in pages that are either outright spam or merely search-engine-optimized out the wazoo (pardon my French). They've learned to imitate user participation, reviews, blogs, anything that you care to name that you used to look for to ferret out the really good pages.

I'll be more specific: about a century ago, in internet years that is, it was a good idea to add "review" to the search terms to see what actual users were saying about a product. This has become an extremely bad idea now, since SEO specialists (I am using neutral language here) noticed this, and have registered 1.5 quintillion domains along the line of best-barbecue-reviews.com and reliable-handbag-reviews.com and what have you; and these domains are the last places to contain actual reliable reviews of anything, being paid advertising in disguise, instead. And Google loves to find search terms in domain names.

And that's just a tiny, tiny part of it.

P.S. Please don't turn this into a thread of recommendations for VPN providers - it will probably get flagged as off-topic.


How could this be solved though?

There are few easy-to-attain signals that will help you differentiate search results in a way that approximates the opinion of an intelligent human.

I believe we have reached the point at which Google's search results are only useful for one thing: speed of information retrieval. If you want high-quality information, you visit sources you trust. These are either (a) content providers that have given you high-quality information before, or, for subjects you haven't had to research before, (b) socially curated information from user communities that you trust and are a part of.

Here's an example: http://www.google.co.uk/search?q=mp3+player+reviews

Most of the content Google rates highly either has nothing to do with reviews or is hideously out of date; note the 2007 'reviews' in the second result down. It's junk all the way down. For a lot of people the Google search bar has been relegated to a fancy URL bar: it takes them places only when they already know where they're going, and they avoid browsing for fun.

So what can be done... An extension that lets you join an invite-only group of Hacker News users who remove any rubbish results from Google searches? Using the +1 as a signal to try to make the results better? Only the former strikes me as difficult to game. The latter would go unused except on very popular searches, to the point that a determined spammer with a botnet could probably sway many search results.

Better minds might know better ways.


Hey, I come here to offer useless kvetching and complaints, not solutions :). Seriously, if I knew how to solve it, I'd be the CTO of Google.

But I'll make a couple of guesses. I think they need to go for a lot more human input. It goes against the grain of everything they do; they want to solve everything algorithmically. But SEO is just too good at abusing algorithms. I think they need to get more humans involved, and force the SEO guys to tone it down.

And one tiny specific suggestion: drop the strong signal of a search term being in the domain name. Maybe leave it for proper names, but drop it for all generic words, please. It used to be a good signal. It used to be that when you saw your search echoed in the domain name, you knew you found what you wanted. Now, I cuss and skip it, because I know it's spam. It's been abused beyond belief.

EDIT: Yes, lots of people suggest "going social" on the problem. That's a lot harder for most people to manage than just using a search box. And searching for "trusted social group for reviews" is just going to take you to mytrustedsocialgroupforreviews.com. Guess what's there.


We tried doing something similar at XMarks (a bookmark sync service) when I worked there. We figured bookmarks were a good source for mining quality websites and hired some very bright people with a search background to build it. For some searches, it gives pretty good results. Here's the one it gives for VPN providers:

http://www.xmarks.com/topic/vpn_providers

It was an uphill battle for us, and it ultimately never worked out. Early on, we ran a usability study to see how people use search engines. The people we interviewed tended to regard Google as more of an expert than themselves: if they were having trouble finding things, it was their problem, not Google's.

We ended up taking our data and decorating it directly onto the Google search results page to start getting usage, but it was really hard to get any traction or build any momentum. For most searches Google is good enough, and only a small number of our existing users started using our website for search/discovery.

We then spent a significant amount of time decorating the ads within the search results, and after a considerable amount of logging and number crunching we found that decorating those ads increased not only the clickthrough rate of the ads we decorated, but the overall ad clickthrough rate as well. That's a huge win for a search engine and worth a lot of money, but ultimately Google just did it themselves; and the other search vendors that had ads on their pages were basically handcuffed by Google's contract, which wouldn't allow them to use our technology.

TL;DR - Competing with Google is hard.


The conflicting interests involved in building a search engine are a fascinating subject. This is probably a bit deep in the comment tree to start a discussion of it, but it's certainly true that it doesn't do you any good to present wonderful, academically vetted search results if the users don't click on the dang ads.

For the particular search here - yes, you do better than Google, but you don't have a million hungry SEO guys trying to game you. So it's not a fair comparison. If you get as big as Google, it will be a different story. Of course, if you get as big as Google, you might not care :)


I think one reason the search product didn't gain traction is that it interfered horribly with the original goal of Foxmarks -- a service to sync bookmarks.

Speaking as a former user of Fox/XMarks, I saw the product go from a simple bookmark sync to a bloated extension that tried to do more than what I had downloaded it for.

I do think that the data could have been useful, but it would have been a better idea to create a completely different product than to bundle additional features onto an existing one.


Why not sell this data to Google and other search engines, instead of competing directly with them?

Note that I've used XMarks / delicious in the past to get more relevant results. When a link has been bookmarked by many users, it is more likely to be interesting. It was extremely useful when approaching an unknown subject.


> Drop the strong signal of a search term being in the domain name [...] It's been abused beyond belief.

I wholeheartedly agree, but people still search for "facebook.com login" and similarly trivial queries. If Google stopped returning facebook.com/login for those, it would lose a lot of its value for those people.


I can only imagine how many links the Facebook login has across the web from their widgets. I don't think we have to worry about them falling out of the top spot for that query; in fact, if they did, Google would likely get a visit from the DOJ soon after...


Here's a link to a RWW story about what msbarnett is talking about: http://www.readwriteweb.com/archives/web_illiteracy_how_much...

It surprised me at first, but I really forget what it's like to be tech illiterate.


The Facebook login page did fall out of the top spot last year in favor of a ReadWriteWeb story about changes to the Facebook login page.

The comment thread rapidly filled with deeply confused people complaining that they couldn't figure out how to log in on this new page.


A search term in the domain name should not carry as much weight as it currently does. Along similar lines, I think link anchor text is way overweighted: it's fairly easy to game for commercial searches, and it's not at all common in the kind of "editorial link" model PageRank assumed.

As it stands, link buying is rampant: as risky as black-hat techniques are for real businesses, if you're building diverse spam web properties through a scalable model, it's not really risky at all. It's not like Google can give you a permaban.


> That's a lot harder for most people to manage than just using a search box.

Hey, sounds like a business opportunity. "Join our bogon-buster network! [fine print: invite only, your activities and reviews will be rated by other members.]"

Of course, after a while you'll need another layer on top of it to make sure that your bogon-buster networks aren't full of spammers themselves.


Yes, you then have the same problem suffered by most social networks.

I actually think a similar problem has been partially solved by Twitter already. Twitter grows, but the content curation is done by the people you follow, so new users don't affect your interactions on the site unless you let them. Of course, there's gardening work the user has to do to get to this point, but it would probably be less work than wading through some of Google's search results.

Additionally, the intelligent thing would be to mine the social interactions of authoritative users to provide public data for users that don't want to engage with the app.

edit: Of course, it's possible to become an authoritative spammer, in which case, as a power user, you will still need to garden the group of users you interact with.


So why not adapt the Twitter model to this? You can choose who to include in your Bogon-net, and maybe set different confidence requirements; e.g., "Show me only results that at least 50% of my Bogon-busters have rated, and only show me results that have a 75% approval rating."

The problems I already see are a) having to compute all those scores, and b) I'm unlikely to rate more than a handful of sites a day - I think Google tried that with their browser toolbar, too.
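Concretely, the filtering rule itself is trivial; here's a sketch in Python (all names made up, and computing the scores at scale is the real problem):

    # Sketch of the proposed filter: keep a result only if enough of my
    # bogon-busters have rated it, and enough of those ratings are positive.
    def filter_results(results, ratings, n_raters,
                       min_coverage=0.50, min_approval=0.75):
        # results: list of URLs
        # ratings: {url: list of True/False votes from my network}
        # n_raters: size of my bogon-buster network (assumed > 0)
        kept = []
        for url in results:
            votes = ratings.get(url, [])
            if len(votes) < min_coverage * n_raters:
                continue  # too few of my raters have seen it at all
            if sum(votes) < min_approval * len(votes):
                continue  # rated, but below the approval threshold
            kept.append(url)
        return kept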


> How could this be solved though?

We aren't talking about tornadoes or earthquakes. Let's remember that spammers are people.

How would we "solve" people who spread toxic waste on city streets?

That is exactly the way to deal with spammers. They have names, addresses, bank accounts and, finally, bodies. All of these could be located and dealt with if law enforcement officials felt like it.

Until spammers are dealt with in exactly the same way as other types of miscreants who love to piss in our collective soup, the problem will persist and grow arbitrarily worse.


Please don't get the government involved in this. As much as I dislike eHow.com, I don't think they need to be dragged away in chains for "SEOing one's way to the top of search results, title 256, section D". That's not a good kind of call for the government to make.


> I don't think they need to be dragged away in chains

Why not?

I would love to see spammers imprisoned or perhaps even publicly flogged. With hangings for repeat offenders.

The social cost of spammers (in terms of the time and money wasted on filtering attempts) may well be comparable to that of burglars.


I doubt you could ban the type of spam we're talking about. Sure, it's already illegal to send spam emails. But that's because email can reasonably be defined.

Not so for spam sites. What makes a site "spammy?" As reflected in this thread, a site is spammy if it a) ranks better than it should in search, b) fails to provide useful content, and c) is commercial.

Which of these can be turned into an objective test? Just (c), I would venture. And that's clearly unworkable. (a) and (b) are far too subjective and vague to be codified.

Also, I think any effective ban would run far afoul of the 1st Amendment. Even if you could carve out a narrow 1st Amendment exception for certain kinds of well-defined spammy conduct, spammers would inevitably just skirt around the edges of it. They'd adapt just like they always do.


> spammers would inevitably just skirt around the edges

Imagine if spam were treated in the same way as child porn, the apparent "root password to the US Constitution."

Or the way drug smuggling is treated in East Asia.


I love this idea, but who decides what "spam" is? There's a lot of spam I get that I don't like but that isn't much different from other marketing efforts. It's not all V1agr4 emails and 419 scams.

IMHO this is akin to the "pornography" issue. What is "pornography"? "I know it when I see it".


Google's search results are extremely... crap. There I said it.

It used to be that I could search for something, say this:

word1 word2

and Google would give me the results for pages which contain both those words.

Then Google decided I didn't mean what I searched for, so I had to prefix my searches with plus signs, like this:

+word1 +word2

And now this week I find out that even that doesn't work any more.

It's crap. Maybe duckduckgo does it right. I'll give it a go.


I agree completely, but because no other engine provides decent results while indexing so much, there's really no viable alternative right now. DDG is good for very mainstream things, but does not do a good job of finding user profile page X on site Z, etc.

All the time I'm having to quote my terms because Google tries to correct them. Same thing with Google disregarding search terms: I always have to prepend those pluses like you do there. Auto-correction should be opt-in, not the default behavior.

The worst is that a search for "some-thing download" or "some-thing pdf" is positively cluttered with phony search results from torrent aggregators, etc. Why is Google indexing these?

The switch to a JS-heavy page from a simple HTML one means that I inadvertently clear my results with a new, unintended query, or navigate several searches back when I intend to just do one. Sometimes search doesn't even work. "Instant" is a failure. They removed the "Instant off" option from the search page, and I'm sure there's something when you're logged in, but I'm not often logged in to Google services.


DuckDuckGo's innovations are pretty cool, especially bang commands (to search on Google Maps, add "!m" to your query).

But the underlying search is Bing.

Perhaps Google personalizes and Bing doesn't, or perhaps DDG doesn't pass a uid token to Bing, but the actual results are just not as good, which matters to me about 10% of the time. (Luckily, the query parameters to google.com/search are the same as to duckduckgo.com/, so a simple massage of the URL works well.)
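For anyone curious, the massage can be as small as this (a sketch; both engines take the query in the "q" parameter):

    # Sketch: rewrite a Google search URL into a DuckDuckGo one.
    from urllib.parse import urlparse, parse_qs, urlencode

    def google_to_ddg(url):
        query = parse_qs(urlparse(url).query).get("q", [""])[0]
        return "https://duckduckgo.com/?" + urlencode({"q": query})

    # google_to_ddg("http://www.google.com/search?q=word1+word2")
    # -> "https://duckduckgo.com/?q=word1+word2"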


I came to the same conclusion after switching from Google to DuckDuckGo for 2 months. I found what I was looking for faster with Google.


"a simple massage of the URL works well"

As does prefixing your search with "!g".


Well, I'll be jiggered.


Maybe I'm just used to Google's crap, but after using DDG for a week, I found that it gave results I didn't want more often.

Then again, maybe it's just the interface; Google's is cleaner IMO, making it easier to scan results.

And it's also likely that I didn't get used to some of DDG's special additions, like tags and whatever else.


Google's results have always been crap; they were just 'less crap' than other search engines' until the past couple of years, in which Google has been overwhelmed by spammers and, frankly, is losing that battle.


So there is Blekko (http://blekko.com), which does use what we (yes, I work there) consider better social signals for ranking search results. Try this and let me know what you think:

Go to blekko.com, enter your two words, and append /monte to the search. That will put up three columns: results from Blekko, Google, and Bing. Pick the column with the best search results and see which service delivered them.
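For example, for a two-word query the URL looks like http://blekko.com/ws/word1+word2+/monte (the same pattern as the /monte link quoted elsewhere in this thread).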

It is a great way to see what is what.


Interesting.

Blekko was a distant third for my go-to search-engine confuser "ruby god" in the three-engine monte, and I was going to write it off for my purposes when it occurred to me to try "god /ruby", which, to your credit, is much closer to being competitive with Google, and drastically outperforms Bing.


Never had any issue. Give some examples of what doesn't seem to work for you.


Your results are likely to be different because of Google's practice of "personalizing" results based upon the consumer's profile. Any discussion of the quality of Google's results should be preceded with "For me."


That's a cop-out. Provide examples on a clean browser/incognito/no cookies/etc.


It does look like the public sources aren't cutting it anymore. Maybe there is a niche to be filled: I mean something that would use recommendations (maybe not limited to explicit ones) from one's actual friends to get valuable info on products and services.

After all, that's what most people successfully used, and there might be some potential in using the internet to boost this process. I'm not saying which Google product fits here, because HNers already grumble about it getting too much coverage :)


Look, you and I and just about all the readers here can figure out where to find this type of information. But for a lot of internet users - maybe most of them - the internet is viewed through the slit of Google Search, so to speak. And this used to be fine, since Google would helpfully point you on your way to where you want to go.

We used to laugh at people who came to some forum or newsgroup and asked "how much should this cost" or whatever. Can't you google that, buddy? Well, you really can't anymore.

Oh, and for the category of products you hinted at: spam-flavored pseudo-viral marketing is even more annoying than search engine optimization. But yes, I do get the idea of going directly to trusted sources and skipping the search engine; the problem is that search engines were invented because they are so convenient, and it would be a shame to lose that.


> I recently looked into getting a VPN provider; it is simply impossible to research them without being showered in pages that are either outright spam or merely search-engine-optimized out the wazoo (pardon my French)

I recently did the same thing and didn't have a hard time finding a provider. Both StrongVPN and WiTopia showed up at the top of the SERPs (and StrongVPN also in AdWords). I went with WiTopia after searching for reviews of both.


And yet in a category where the natural results have become useless, the AdWords above and to the right can still be valuable. Ability-to-pay is a useful filter, and Google often looks more closely at the content and people behind those paid-advertiser landing pages.

As long as people aren't leaving en masse for another search engine, there is a margin at which Google makes more money when the AdWords remain useful, but the natural results are not.

Of course, Google still has to keep the natural results about as good as the external competition's. But those other search engines have to fight the same SEO trickery that Google does. Competitors don't have the same ad revenues to finance the fight. Competitors don't have the same ad inventory to serve as a backup reservoir of market-filtered results. And competitors don't have the benefit of many years of user habit to try Google first. So Google's scale advantage keeps growing, even if natural-search quality is middling.


There is some innovation in this space. For example, did you try Blekko, maybe with some relevant hashtags? For comparison:

http://blekko.com/ws/buy+vpn+/monte

Or DuckDuckGo? It will be interesting to see whether G+ social signals help Google improve this going forward.


The issue is, it's no longer Web 1.0. You don't have a few thousand geeks with domains; you have about a billion users with blogs, tweet streams, and Facebook pages. And so PageRank, which implicitly works off domains, is busted.


PageRank has nothing to do with domains. It assigns a score to individual pages.


It does involve domains as well. That's why Demand Media and the like used to have higher rankings. Any single eHow article isn't linked to by many, but taken in aggregate they are. I've always felt that they should rank individual pages and completely disregard the domain. We'd see a lot more expert content that way.


What you're describing is not PageRank, but the extra information that Google's current and secret algorithm uses. The PageRank algorithm doesn't involve domains, and no amount of downvoting is going to change that -- check your facts.


I'm not sure that there is an HN downvote, but I upvoted you anyway.


The best approach these days is to compare the results for "<PRODUCT> sucks" with "I love <PRODUCT>" and see who wins.
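If you wanted to automate the comparison, a toy sketch (result_count is hypothetical; you would have to supply your own source of hit counts, since there's no official API for them):

    # Toy sketch of the "sucks vs. love" heuristic.
    def sucks_vs_love(product, result_count):
        # result_count(query) is a hypothetical hit-count source.
        sucks = result_count('"%s sucks"' % product)
        love = result_count('"I love %s"' % product)
        return "avoid" if sucks > love else "consider"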


Great, and now that you let the secret out, the wires to the domain registrars are on fire, bringing us names like "i-love-my-acme-grill.com". :)


Dang ... there goes traffic to the First State Tourism Board's http://www.visit-delaware.co.cc


Wouldn't it make sense for them to be using .us anyway?


The page says Delaware's new slogan is "Delaware: boring on purpose" which seems to suggest it is a parody site.


Seems like a cop-out to me. It kind of says: our technology can't weed these results out, so we are just not going to deal with it. I am sure it is a difficult problem; don't get me wrong. However, I'd rather see Google say that some fancy algorithm determined .co.cc domains are junk and has relegated them appropriately.


I'd rather they fixed the results now and worried about the algorithm later (and you know they're working on the algorithm) than make us suffer unnecessarily until the algorithm was suitably improved.


You'd rather they lie to you?


One thing that irks me is Google News. It's hard to get your site listed on Google News, but when you see some of the sites that are listed, you have to wonder why Google makes it so difficult. Some search terms bring up "news sites" that are essentially devoted to spamming health remedies and playing them off as though they were real news articles.


I don't know, Google News actually doesn't seem very spammy to me. I just searched for "diabetes" - this is usually a good one to check if an index is spammed - and while it brings up some sensationalist garbage (that's life for you), it doesn't seem to find anything which wants my credit card number right now. Even a spam-king search like "weight loss", while it finds nothing interesting, doesn't seem like pure spam.


Do a search for RezVera. I research digestive health news & this has been coming up a lot over the last few weeks. Either copycat sites or PR sites that link back to another site with an iframe containing a short form promo offering the supplement for $60. Blah.


I checked it out. RezVera is obviously a commercial name, not a generic name, and its entire internet presence consists of a few PR sites. Google News finds them all, and puts them on one page. It's not like they are crowding out better RezVera news. There is maybe something to be done here about delisting these sites altogether, but I don't see a huge problem for this particular search. Or is there some legitimate review of it, which gets crowded out?


I am not searching for RezVera. I am searching for "gastritis", "irritable bowel syndrome", etc. RezVera is just a quick way to see the sites that get listed on Google; I assure you those sites come up for legit searches too.

If you do a search for "Irritable Bowel Syndrome" & then sort by date, 3 of the top 4 choices are: "Manuka Honey Capsules Announced", "New Supplement [RezVera] Shows Great Promise for Gastritis", "CSIRO cereal a real superfood". Spammy McSpam Spam.

Sometimes press releases blur the lines between advertising & news, I get that, but these sites aren't really that useful & look almost like they're gaming Google News to advertise. Rarely do they actually have any authoritative links backing up their claims, which is something Google supposedly wants from sites joining Google News.


This is the problem with the customization of Google - for me, these are not in the top 4 results even sorted by date, although 2 of your titles show up further down. But yes, I see your point now, someone spammed Google News as well. Scum. I think it's still less spammy than the Web search, since some kind of human attention is required to get in there.


Take a look at game search terms, like Warcraft. There are plenty of sites that sell in-game currencies or items, along with sites that republish news from other sites, mixed into the results.


Have you thought that's because many people do buy currency and items?


These sites are masquerading as news sites, with poor-English rehashings of old news or news pasted directly from other sites. They aren't quality results that belong in Google News.


I'm looking into listing a site on google news. Can you elaborate on the problems you've encountered when trying to list?


Google News, like a lot of Google's services, is a black box: no one really knows how it works or what criteria they use to pick sites for inclusion.

Generic Google News talking points are:

You need at least 3 or more editors/staff, with some sort of bio page detailing who they are.

You need to push a lot of content through your site. What "a lot" is, no one really knows.

Content needs to meet a certain quality level. What the criteria are for "quality content" is anyone's guess. If you're big enough you won't be held to this as strictly.

Content should offer an independent viewpoint and not be slanted to one side or another. If you're big enough you aren't held as strictly to this.


Domains ending in .co.cc are subdomains, just like domains ending in .example.com or .uk.com. When people buy a subdomain, their purchase is not regulated in the same way as a real domain registration (e.g. one ending in .cc or .com) is.
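This is exactly what the public suffix list is for. A sketch using the third-party tldextract package (assuming .co.cc is on the public suffix list at the time):

    # Sketch: use the public suffix list to treat co.cc as an effective
    # TLD rather than as an ordinary registered domain.
    # Requires the third-party tldextract package (pip install tldextract).
    import tldextract

    ext = tldextract.extract("best-pills.co.cc")
    # If co.cc is on the public suffix list, it comes back as the suffix
    # and "best-pills" as the registrable domain, like "example" under
    # ".co.uk".
    print(ext.domain, ext.suffix)  # best-pills co.cc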


5000 phishing sites cause Google to block 11,000,000 domains? That's less than 0.05%.

First, is that really "a significant fraction?"

Second, when did collective punishment become acceptable in place of innocent until proven guilty?


You must somehow have avoided running into it, but .co.cc was a fever swamp and a nightmare. For any popular search terms you can think of, it contained "popular-search-terms.co.cc" and "my-popular-search-terms.co.cc" and "best-popular-search-terms.co.cc" and so on ad freaking nauseam. And it got spammed all over your screen, because Google likes to find search terms in the domain name.

What do you think was there? Do you think 11 million people woke up one day and decided ".co.cc would be a great domain for my business"?

They should have nuked it from orbit a long time ago.

EDIT: I just realized .co.cc is not actually down, but it's blocked by my provider now and no longer indexed by Google, so it's pretty much gone for me. Good riddance to bad rubbish.


It's an ugly, cheap kludge. Furthermore, it is a kludge that raises the question of how good Google's new all-singing, all-dancing search-results-filtering algorithm really is, if it can't separate the wheat from the chaff at /co.cc/.

<Google directed cynicism>The cynic in me sees the timing of this as the first salvo in an effort toward monetizing the coming proliferation of new commoditized top-level domains. When search results are nothing more than an advertising platform, why should a search engine return your results unless you are paying for the screen space? Perhaps /co.cc/ was not encouraging the use of Google Analytics or promoting AdWords sufficiently.</Google directed cynicism>


You're confusing cynicism with paranoia. A cynic would think that Google has lost faith in its algorithm and is taking to manually blocking spammy sites; a paranoid would think that Google plans to have websites pay to be included in search, and is retaliating against hosts that don't use Google Analytics or Adwords.


Since I don't really have a dog in the hunt for Google page rankings, it's not really paranoia for me. It's not really cynicism either, given that it is a natural extension of Google's basic revenue model, i.e. showing links to pages in exchange for payment. I just added the tag to keep the post consistent with implicit HN style guidelines.

There is nothing sacred about the blue text links, and with "personalization" they don't reflect an objective page ranking, but a subjective ranking based on speculation about what one is most likely to click.

It is only in the minds of consumers of Google's search services that a wall between search results and advertising exists. With personalization, the same algorithms are applied to both.


> can't separate the wheat from the chaff at /co.cc/.

The wheat is a lie.


They blocked 1 domain and the millions of subdomains it was reselling.


I love how Google has the balls to do things like this; it can only be considered a good thing.


And just as they blocked .co.cc, the attackers are now using .co.tv:

http://blog.sucuri.net/2011/07/google-blocks-co-cc-attackers...

Never-ending battle...


This is a really good thing, even though the people using these domains will likely just move to another free provider; we've seen large chunks of spam on WordPress.com from these domains more than once (linkfarming).

Kudos to Google.


Google blocked one domain that was reselling subdomains.


I never really saw many .co.cc domains in the search results anyway. From what I've heard, many of the .co.cc domains were used for distributing malware.

If you provide a free hosting service, you have to be prepared to constantly deal with spammers and phishers.


At Zite we have a domain-level spam filter and

   .co.cc
is the 6th strongest feature for spam indication. The first five are:

   cheap
   forex
   pills
   viagra
   urnitur
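
To illustrate the idea (not Zite's actual model), a domain-level filter along these lines can be as simple as substring features feeding a weighted score; the weights below are invented:

    # Toy sketch of a domain-level spam scorer using substring features.
    # Weights are made up for illustration; a real system would learn them.
    SPAM_FEATURES = {
        "cheap": 2.1, "forex": 2.0, "pills": 1.9,
        "viagra": 1.8, "urnitur": 1.7, ".co.cc": 1.6,
    }

    def spam_score(domain):
        return sum(w for feat, w in SPAM_FEATURES.items() if feat in domain)

    # spam_score("cheap-pills.co.cc") -> 5.6, well past any sane threshold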


This is one of the dumbest things Google has done.

If Google really wants to get rid of 90 percent of spammy sites, all they need to do is block any website registered with GoDaddy.


That doesn't make much sense. GoDaddy is the largest registrar in the world, with the most extensive general public advertising campaign, and manages over 45 million domains. They're the registrar for your neighbor and your neighborhood small business more often than not.


You could probably take it even further just by making it a lot more difficult for the other types of domains to rank:

1. Anything that's not a .com or .org (let's face it: if it's a .net, it was most likely bought for SEO).

2. .co.uk, .de, etc. should only get a bonus in their own countries.

3. Anything longer than 16 letters; chances are it's just a spam domain.

4. Anything with a dash in the domain (the more dashes, the bigger the penalty).

5. Anything with a # in the domain.

This way brands would be fine, but those crappy domains purchased solely for SEO wouldn't be as effective. Sure, exact-match .coms would still be fine, but that's a lot less to worry about. (Sketch of the rules below.)
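
Roughly, as a sketch (rules and thresholds straight from the list above; penalty units purely illustrative):

    # Sketch of the proposed heuristic penalties (rules 1-5 above).
    def domain_penalty(name, tld, searcher_country=None, tld_country=None):
        penalty = 0
        if tld not in (".com", ".org"):                      # rule 1
            penalty += 1
        if tld_country and tld_country != searcher_country:  # rule 2
            penalty += 1
        if len(name) > 16:                                   # rule 3
            penalty += 1
        penalty += name.count("-")                           # rule 4
        if "#" in name:                                      # rule 5
            penalty += 1
        return penalty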


Most ISPs I've used use .net for their domains, and there are plenty of other legitimate sites that use .net (Battle.net, anyone?).


Brands would still be able to rank.

It's just going to get a lot harder for legitimateISPs.net to rank.


> 2. .co.uk, .de, etc. should only get a bonus in their own countries.

A flawed rule. I deal with websites based in other countries and written in other languages. I also access websites from my home country when overseas.



