Black Hat SEO Case Study: How Mahalo Makes Black Look White (seobook.com)
278 points by mocy on Jan 24, 2010 | 68 comments



Here's a weird experience I had: I tweeted about a startup idea of a "woot.com site for travel" http://twitter.com/callmeed/status/1422601143

Later that day, it somehow got turned into a Mahalo question (I didn't submit it). I thought it was interesting that Jason himself commented on it, but it still seemed strange.

Now, when you google "Woot.com for travel" or "Woot for travel", that Mahalo page comes up at position 1 or 3.


Air New Zealand has a 24 hour limited deal website www.grabaseat.co.nz. It focuses on the New Zealand domestic market and has been very successful.


That's the beauty of authority sites: you can rank for stuff just by mentioning it once.


This is the flaw in how Google does things that competitors need to exploit. Authority should be more topic-based than site-wide.


Yeah, I remember reading a blog a few days ago. It was a PR8, and the guy did an experiment: he added a link for something Viagra-related. Just a single link.

And within a week he managed to get on the front page of Google's results.


[deleted]


Page Rank 8


Actually, I am more interested in how that idea is going.


Still just an idea. I've considered applying to YC with it but I can't devote 100% of my time to it.

Plus my contacts in the travel industry are slim. I know some people at 1 boutique hotel chain but that's about it.


http://jetsetter.com by Gilt Groupe?


Jesus, that's about as s(c/p)ammy as you can get. Your business is rooted in theft and trickery and deception, Jason.


I honestly don't understand how this is anything new? Mahalo always has been a sketchy SEO scam that only a shameless self-promoter could pull off...


Jason, any comment?


Maybe he can also comment on masking ads as user generated content:

http://twitpic.com/z6m8b


My guess is "oh shit!", followed by a prayer that Google doesn't do anything.

I don't think he has anything to worry about on that last part. Google is notorious for letting big sites off the hook (remember Target?), and here Google is also making its share of money off AdSense.


And Scribd.com: exact same model. Scrape content regardless of copyright, strap AdSense on it, make money.


Agreed. I think Merlin Mann has the best take yet on Scribd: http://tinyurl.com/yzl3mod


And pg capriciously blacklisted his domain from news.yc in response


Merlin is blacklisted from Hacker News? Link to discussion about this?


I just tested this by submitting that same Scribd rant. My account sees it as having been submitted, but if I'm not logged in the story doesn't appear at all.

Can I add that Paul's method of making things disappear without telling users is shit? It was a lot of fun realizing my account was dead and I wasn't just going crazy. Same with this story. I almost posted a comment defending him until I thought to test the story logged out.


Wait, you had to start a new account because he hell-banned unalone?

That's low, even by pg's standards.


You are free to question PG's motives and cry censorship, but to me, that blog post was malicious, tasteless, angry drivel that I really wouldn't want to see on here, whether it was about a YC company or not.


Even a casual search of Scribd content will reveal the motives for Mr. Mann's post. He has every right to be angry. The malice has been well earned by Scribd's repeated actions with other creators' content for nearly its entire history.

While it's true there's no accounting for taste, I suggest you consult the dictionary on the definition of 'drivel'.

At the end of the day, PG can do, essentially, whatever he likes with the content, routing (or lack thereof), and posting permissions on Hacker News. It's HIS site.

It's a shame that Scribd doesn't seem to play by an equivalent set of ownership principles. Namely, it hosts and makes money off other creators' content without so much as asking their permission, let alone proffering any ad-revenue sharing.


>I suggest you consult the dictionary on the definition of 'drivel'.

I meant drivel. The linked post starts with:

> So, I went with, “fuckyourwhoremotherinheronegoodear.”

Drivel is "childish talk" (mom insults, anyone?) http://dictionary.reference.com/browse/drivel

As to your other points, I think they are good. For me, intelligent debate does not talk about fucking anyone's whore mother in their ear. If that is what I was looking for, I'd read YouTube comments.

edit: admittedly, tptacek does have a point re: the flag button vs. blacklist.


Kudos indeed. I'm glad Tptacek brought up the often-neglected 'flag' feature.

Again, there's no accounting for taste, but offhand I'd say labeling curse words, and their creative use, as "childish" is quite overreaching. Have you read any David Mamet, Frank McCourt, or Larry McMurtry lately? Do you honestly feel a child would have sufficient skill to structure prose in that fashion?


I never said anything about cursing. I said mom insults were childish. So yeah, if I had said that, that would have been overreaching.


That's why we have the "flag" button. It doesn't blacklist sites that are mean to YC companies.


You forgot related search queries, which are basically bogus search queries that produce even more visits and more bogus search queries.


Scribd ceased doing related search queries some time ago. Their CEO described it, to Techcrunch, as "reducing the aggressiveness of our SEO, which reduces total traffic in the near term but increases the relevancy of Scribd links in search engine results."

Scuttlebutt among SEOs whose opinion I respect suggests that it is highly probable they got a backchannel from Google telling them that either they could drastically reduce the footprint of their pages in the SERPs or that Google's search quality team would assist them in doing so.

Anyhow, their traffic went down by about 50% in a month, if you trust Compete et al.

Search for [Scribd "aggressive SEO"] if you want the whole tale.


Directly measured @ quantcast:

http://www.quantcast.com/scribd.com

The big falloff is at the end of June, I assume.


They have 5 adsense blocks on their site, blended into the content. Good luck trying this as well if you are a "small" adsense customer!


When you have more than 5 million uniques, you qualify for AdSense Premium; at that point Google pretty much tailor-fits the ads to your site.


Thanks for the information!


Yeah, 'regular' adsense publishers have a limit of 3 blocks per page.


And Experts Exchange...


At least on Experts Exchange, you can read the question/answers without signing up. It's just well-hidden (and doesn't show up in text-only browsers like w3m or links last I checked). You have to scroll all the way down the page past the looks-like-the-content-but-is-blurred-out-or-otherwise-obfuscated section, and past all of the navigation links. The content is actually there.

I don't necessarily condone their page design and misdirection, but I have found answers on them through Google searches in the past.


It wasn't always that way. They used to show it only to Google; a regular user who came along couldn't see it. Google got a bit mad, and what you see now is their current hack to appease Google and screw users.


This 'hack' was in place years ago. IIRC, I used it back in 2003 or maybe before then. Maybe at some point they removed it, then Google made them put it back?


It used to be Javascript-obfuscated. For all the times I clicked on one of their links only to have my blood pressure raised: I hope Experts Exchange withers away into obscurity.


When it was blurred out with JavaScript, I always was able to find the actual content further down the page. I think that the 'blurred out' version of it was just to make people give up. Either that or their JavaScript was broken for Firefox.


"we don't do any blackhat... Kind of silly." http://twitter.com/Jason/status/8177377715

Note that if he did do black hat shit, and wanted this to blow over, this is the perfect dismissive response: starve the story.


In for the comment as well. I doubt you'll get a true response because it's quite clear Mahalo is just auto-generated SPAM 99% of the time.


This article makes it clear that Mahalo is in many ways quite similar to another questionably ethical startup, Demand Media. Here's the fascinating Wired article:

http://www.wired.com/magazine/2009/10/ff_demandmedia/all/1


One important difference is that Demand Media actually produces their content themselves.


Right. Demand Media has a distributed, virtualized workforce of freelancers. (Read the Wired article on it. That is some of their best reporting. Ever.) Mahalo used to have in-house editors, then moved to mostly outsourced "editors", then realized that editors cost a lot of money and that firing them didn't decrease revenues in the slightest. At the moment their editorial staff is a thin pretense maintained to keep the site from getting bounced out of the index.

Disclaimer: As with most other massive content plays which have large audiences of unsophisticated Internet users, I indirectly subsidize Mahalo through AdSense expenditures. To the tune of probably over a hundred bucks last year, but I don't have my numbers in front of me. Like I mentioned in my blog post earlier today, they send great traffic (i.e. it is cheap and converts well) because my ads are the content on their pages.

That is disquieting to me in some ways. I could ban them and start chopping off heads from the Demand Media hydra in my AdSense account, but that would consume vast amounts of my time and just cost me money.


Right. Mahalo had a call out for 17 or so "interns/volunteers" a few months back... that's who replaced the editorial staff.


Great article!

Just a few days ago I landed on a Mahalo page from a Google search. My exact thought was "where is the content?"


Who would have thought that the guy who said (paraphrased) "want to have a life? then work somewhere else!" would be slimy?

I really don't understand how or why Jason Calacanis has any credibility or notoriety today. Point me to Mahalo and I see an utterly worthless spammy waste of a website that I and all of my peers avoid at all costs which was built with exploitative labor practices.


Great article. "the willingness to lie just to get a bit of media ink" very succinctly captures what I most disliked about my experiences amongst the movers and shakers of California.


This pretty much proves once again that gaming search engines is here to stay. There is still a lot of research to be done to make it harder to get away with this type of website, but luckily there are more and more ways, other than Google, to find the content you are looking for.


You actually don't need all that much authority to get away with ranking scraped content in Google. Despite their FUD, Google's duplicate content detection algorithm seems to be largely non-existent.

For example, check out the Google results for http://hackerne.ws, which is a page-for-page duplicate of news.ycombinator:

http://www.google.com/q=site%3Ahackerne.ws

10,000 pages indexed, not a single word of original content.

Note: I know hackerne.ws is not trying to be spammy, and merely parked the domain improperly. If the owner is reading, all it would take is a simple 301 redirect to fix.


It's a CNAME to news.ycombinator.com:

    lucidity% nslookup hackerne.ws
    Server: 192.168.0.1
    Address: 192.168.0.1#53

    Non-authoritative answer:
    Name: hackerne.ws
    Address: 174.132.225.106

    lucidity% nslookup news.ycombinator.com
    Server: 192.168.0.1
    Address: 192.168.0.1#53

    Non-authoritative answer:
    Name: news.ycombinator.com
    Address: 174.132.225.106

    lucidity%
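
The comparison in the lookups above can be sketched in Python. This is only an illustration: the helper name `resolves_to_same_address` and its injectable `resolve` parameter are assumptions, not anything from the thread. Note that matching addresses are consistent with a CNAME but would also result from two separate A records pointing at the same IP, so this check alone does not tell the two cases apart.

```python
import socket
from typing import Callable


def resolves_to_same_address(
    name_a: str,
    name_b: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if both host names resolve to the same IPv4 address.

    `resolve` defaults to a live DNS lookup; pass a stub to run offline.
    """
    return resolve(name_a) == resolve(name_b)


# Offline usage with the addresses quoted in the nslookup output above.
quoted = {
    "hackerne.ws": "174.132.225.106",
    "news.ycombinator.com": "174.132.225.106",
}
same = resolves_to_same_address(
    "hackerne.ws", "news.ycombinator.com", resolve=quoted.__getitem__
)
```

With the quoted addresses, `same` is true; calling the function without `resolve` performs a real lookup and may give a different answer today.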



The fact that the duplicated content is indexed doesn't mean that it ranks better than the original.


Definitely true, but I became aware of this domain using Google to try to surface old threads. You get enough pages in the index, and it becomes a crapshoot on longtail searches.

Here's an example:

http://www.google.com/search?q=%22How+to+compensate+sweat+eq...

In this case not only does the hackerne.ws page outrank the news.yc page, the news.yc page has been pushed into the supplemental index.


All it would take to fix is for pg to make news.arc less shitty: actually check the HTTP/1.1 Host header, and respond with your own 301.
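
A minimal sketch of that Host-header check, written in Python rather than Arc and not taken from the news.arc source. The helper name `canonical_redirect` and the choice of `news.ycombinator.com` as the canonical host are assumptions for illustration.

```python
from typing import List, Optional, Tuple

CANONICAL_HOST = "news.ycombinator.com"  # assumed canonical host name

Response = Tuple[str, List[Tuple[str, str]]]


def canonical_redirect(
    host_header: str,
    path: str,
    canonical_host: str = CANONICAL_HOST,
) -> Optional[Response]:
    """Return a 301 (status, headers) pair when the request arrived under a
    non-canonical Host header, or None when it should be served normally."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    host = host_header.split(":", 1)[0].strip().lower()
    if host == canonical_host:
        return None
    # Keep the requested path and query string so deep links still work.
    location = "http://" + canonical_host + path
    return "301 Moved Permanently", [("Location", location)]
```

A request for `/item?id=1` with `Host: hackerne.ws` would get a redirect to the same path on the canonical host, while a request that already uses the canonical host returns `None` and is handled as usual. The same check would also catch requests made directly to the bare IP address.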


Or include a rel=canonical link in the head.


rel=canonical won't work if the domain Google sees you on is different from the one specified as canonical. This is to prevent people capable of content injection from hijacking entire websites in a subtle manner.


Google says otherwise: http://googlewebmastercentral.blogspot.com/2009/12/handling-...

No doubt they must use other indicators to ensure the authoritative source.
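
For reference, the rel=canonical suggestion amounts to emitting one tag in each page's head. A hedged Python sketch follows; the helper name `canonical_link_tag` and the canonical host are assumptions, and whether a search engine honours the hint across domains is exactly what is being debated in this subthread.

```python
from html import escape


def canonical_link_tag(path: str, canonical_host: str = "news.ycombinator.com") -> str:
    """Build a <link rel="canonical"> tag pointing at the assumed canonical host."""
    url = "http://" + canonical_host + path
    # Escape the URL so characters such as & are valid inside the attribute.
    return '<link rel="canonical" href="' + escape(url, quote=True) + '">'


tag = canonical_link_tag("/item?id=1")
```

Unlike a 301, this leaves the duplicate page reachable and only hints at which copy is the preferred one.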


Your link didn't work for me, but this

http://www.google.com/search?hl=en&site=q%3Dsite%3Ahacke...

shows 255,000 results, of which hackerne.ws is the first, news.ycombinator.com is second :-|

[edit: luckily, Google's duplicate content detection algorithm didn't work here...]


You searched for the exact domain name of one site. It doesn't surprise me that it came in first.


Found another one, that's just an IP:

174.132.225.106 [http://www.google.com/search?q=site%3A174.132.225.106]

Another exact duplicate of HN, with 770,000 pages in Google's index.


That's the IP for HN!


Nice catch, that almost amuses me more than if it were deliberate. I wonder how that happened.

FWIW, apps.ycombinator.com has the same problem.


Why isn't it copyright infringement for Mahalo to scrape content as claimed in the article? I don't see how either fair use or DMCA safe harbor would apply (but I'm no lawyer). This seems like a lawsuit just waiting to happen.


They're just scraping page titles and occasionally short excerpts; most people believe that's fair use.


I think it's interesting that Jason, who manages his online reputation exceptionally well (and speedily), has yet to comment.


Interesting article. Mahalo won't be the last site to exploit these methods. There are a number of issues here:

1. My boss said something interesting the other day: as Google has already conquered search and is diversifying its business into different areas, it is not applying as much diligence to its search algorithm to weed out the sites that exploit it. Google makes money from AdSense, so why would it be in a big hurry to take down sites that use dodgy SEO practices?

2. As for scraping content without any backlinks, the media industry seems to have very little protection when it comes to copyright. Existing copyright law is woefully unable to get to grips with digital copying and display. If the content had been music or films, the RIAA would have clamped down so fast that Jason's head would be spinning. But we are talking about the digital publishing industry, where content has very little protection at all.

3. Even if we decide that taking the first paragraph is fair use, not linking back or citing your source is still a copyright issue (not to mention bad Internet etiquette).


Good article, never noticed the scraped content part.



