Black Hat SEO Case Study: How Mahalo Makes Black Look White

callmeed · on Jan 24, 2010

Here's a weird experience I had: I tweeted about a startup idea of a "woot.com site for travel" http://twitter.com/callmeed/status/1422601143

Later that day, it somehow got turned into a Mahalo question (I didn't submit it). I thought it was interesting that Jason himself commented on it, but it still seemed strange.

Now, when you google "Woot.com for travel" or "Woot for travel", that Mahalo page comes up at position 1 or 3.

NZ_Matt · on Jan 25, 2010

Air New Zealand has a 24 hour limited deal website www.grabaseat.co.nz. It focuses on the New Zealand domestic market and has been very successful.

vaksel · on Jan 24, 2010

that's the beauty of authority sites, you can rank for stuff just by mentioning them once.

javery · on Jan 24, 2010

This is the flaw in how Google does things that competitors need to exploit. Authority should be more topic-based than site-wide.

vaksel · on Jan 24, 2010

yeah I remember reading a blog a few days ago, it was a PR8, and the guy did an experiment. Just added a link for something viagra related. Just a single link.

And within a week he managed to get on a front page in Google results.

on Jan 24, 2010

[deleted]

vaksel · on Jan 24, 2010

Page Rank 8

siong1987 · on Jan 24, 2010

Actually, I am more interested in how that idea is going.

callmeed · on Jan 25, 2010

Still just an idea. I've considered applying to YC with it but I can't devote 100% of my time to it.

Plus my contacts in the travel industry are slim. I know some people at 1 boutique hotel chain but that's about it.

jfarmer · on Jan 25, 2010

http://jetsetter.com by Gilt Groupe?

kyro · on Jan 24, 2010

Jesus, that's about as s(c/p)ammy as you can get. Your business is rooted in theft and trickery and deception, Jason.

jfornear · on Jan 25, 2010

I honestly don't understand how this is anything new? Mahalo always has been a sketchy SEO scam that only a shameless self-promoter could pull off...

aristus · on Jan 24, 2010

Jason, any comment?

richcollins · on Jan 25, 2010

Maybe he can also comment on masking ads as user generated content:

http://twitpic.com/z6m8b

vaksel · on Jan 24, 2010

my guess it's "oh shit!", followed by a prayer that Google doesn't do anything.

I don't think he has anything to worry about for that last part. Google is notorious for letting big sites off the hook (remember Target?), and here Google is also making its share of money off AdSense.

vladmato · on Jan 25, 2010

And Scribd.com, exact same model: scrapes content regardless of copyright, straps AdSense on it, makes money.

bonsaitree · on Jan 25, 2010

Agreed. I think Merlin Mann has the best take yet on Scribd: http://tinyurl.com/yzl3mod

blasdel · on Jan 25, 2010

And pg capriciously blacklisted his domain from news.yc in response

tptacek · on Jan 25, 2010

Merlin is blacklisted from Hacker News? Link to discussion about this?

rinich · on Jan 25, 2010

I just tested this by submitting that same Scribd rant. My account sees it as having been submitted, but if I'm not logged in the story doesn't appear at all.

Can I add that Paul's method of making things disappear without telling users is shit? It was a lot of fun realizing my account was dead and I wasn't just going crazy. Same with this story. I almost posted a comment defending him until I thought to test the story logged out.

blasdel · on Jan 26, 2010

Wait, you had to start a new account because he hell-banned unalone?

That's low, even by pg's standards.

ericb · on Jan 25, 2010

You are free to question PG's motives and cry censorship, but to me, that blog post was malicious, tasteless, angry drivel that I really wouldn't want to see on here, whether it was about a YC company or not.

bonsaitree · on Jan 25, 2010

Even a casual search of ScribD content will reveal the motives for Mr. Mann's post. He has every right to be angry. The malice has been well-earned by ScribD's repeated actions with other creators' content for nearly its entire history.

While it's true there's no accounting for taste, I suggest you consult the dictionary on the definition of 'drivel'.

At the end of the day, PG can do, essentially, whatever he likes with the content, routing (or lack thereof), and posting permissions on Hacker News. It's HIS site.

It's a shame that ScribD doesn't seem to play by an equivalent set of ownership principles. Namely, it hosts and makes money off other creators' content without so much as asking their permission, let alone proffering any ad-revenue sharing.

ericb · on Jan 25, 2010

>I suggest you consult the dictionary on the definition of 'drivel'.

I meant drivel. The linked post starts with:

> So, I went with, “fuckyourwhoremotherinheronegoodear.”

Drivel is "childish talk" (mom insults, anyone?) http://dictionary.reference.com/browse/drivel

As to your other points, I think they are good. For me, intelligent debate does not talk about fucking anyone's whore mother in their ear. If that is what I was looking for, I'd read YouTube comments.

edit: admittedly, tptacek does have a point re: the flag button vs. blacklist.

bonsaitree · on Jan 25, 2010

Kudos indeed. I'm glad Tptacek brought up the often-neglected 'flag' feature.

Again, there's no accounting for taste, but offhand I'd say labeling curse words, and their creative use, as "childish" is quite overreaching. Have you read any David Mamet, Frank McCourt, or Larry McMurtry lately? Do you honestly feel a child would have sufficient skill to structure prose in that fashion?

ericb · on Jan 25, 2010

I never said anything about cursing. I said mom insults were childish. So yeah, if I had said that, that would have been overreaching.

tptacek · on Jan 25, 2010

That's why we have the "flag" button. It doesn't blacklist sites that are mean to YC companies.

slig · on Jan 25, 2010

You forgot related search queries, which are basically bogus search queries that produce even more visits and more bogus search queries.

patio11 · on Jan 25, 2010

Scribd ceased doing related search queries some time ago. Their CEO described it, to Techcrunch, as "reducing the aggressiveness of our SEO, which reduces total traffic in the near term but increases the relevancy of Scribd links in search engine results."

Scuttlebutt among SEOs whose opinion I respect suggests that it is highly probable they got a backchannel from Google telling them that either they could drastically reduce the footprint of their pages in the SERPs or that Google's search quality team would assist them in doing so.

Anyhow, their traffic went down by about 50% in a month, if you trust Compete et al.

Search for [Scribd "aggressive SEO"] if you want the whole tale.

axod · on Jan 25, 2010

Directly measured @ quantcast:

http://www.quantcast.com/scribd.com

The big falloff end of June I assume.

anApple · on Jan 24, 2010

They have 5 adsense blocks on their site, blended into the content. Good luck trying this as well if you are a "small" adsense customer!

vaksel · on Jan 25, 2010

when you have more than 5 million uniques, you qualify for Adsense Premium, at that point Google pretty much tailor fits the ads for your site

anApple · on Jan 25, 2010

Thanks for the information!

thiele · on Jan 25, 2010

Yeah, 'regular' adsense publishers have a limit of 3 blocks per page.

ohashi · on Jan 24, 2010

And Experts Exchange...

pyre · on Jan 25, 2010

At least on Experts Exchange, you can read the question/answers without signing up. It's just well-hidden (and doesn't show up in text-only browsers like w3m or links last I checked). You have to scroll all the way down the page past the looks-like-the-content-but-is-blurred-out-or-otherwise-obfuscated section, and past all of the navigation links. The content is actually there.

I don't necessarily condone their page design and misdirection, but I have found answers on them through Google searches in the past.

ohashi · on Jan 25, 2010

It wasn't always that way. They used to show it only to Google, and a regular user who came along couldn't see it. Google got a bit mad, and what you see is their current hack to appease Google and screw users.

pyre · on Jan 25, 2010

This 'hack' was in place years ago. IIRC, I used it back in 2003 or maybe before then. Maybe at some point they removed it, then Google made them put it back?

Luc · on Jan 25, 2010

It used to be Javascript-obfuscated. For all the times I clicked on one of their links only to have my blood pressure raised: I hope Experts Exchange withers away into obscurity.

pyre · on Jan 25, 2010

When it was blurred out with JavaScript, I always was able to find the actual content further down the page. I think that the 'blurred out' version of it was just to make people give up. Either that or their JavaScript was broken for Firefox.

ivankirigin · on Jan 25, 2010

"we don't do any blackhat... Kind of silly." http://twitter.com/Jason/status/8177377715

Note that if he did do black hat shit, and wanted this to blow over, this is the perfect dismissive response: starve the story.

johng · on Jan 25, 2010

In for the comment as well. I doubt you'll get a true response because it's quite clear Mahalo is just auto-generated SPAM 99% of the time.

fuzzmeister · on Jan 24, 2010

This article makes it clear that Mahalo is in many ways quite similar to another questionably ethical startup, Demand Media. Here's the fascinating Wired article:

http://www.wired.com/magazine/2009/10/ff_demandmedia/all/1

atamyrat · on Jan 24, 2010

One important difference is that Demand Media actually produces their content themselves.

patio11 · on Jan 25, 2010

Right. Demand Media has a distributed, virtualized workforce of freelancers. (Read the Wired article on it. That is some of their best reporting. Ever.) Mahalo used to have in-house editors, then moved to mostly outsourced "editors", then realized that editors cost a lot of money and that firing them didn't decrease revenues in the slightest. At the moment their editorial staff is a thin pretense maintained to keep the site from getting bounced out of the index.

Disclaimer: As with most other massive content plays which have large audiences of unsophisticated Internet users, I indirectly subsidize Mahalo through AdSense expenditures. To the tune of probably over a hundred bucks last year, but I don't have my numbers in front of me. Like I mentioned in my blog post earlier today, they send great traffic (i.e. it is cheap and converts well) because my ads are the content on their pages.

That is disquieting to me in some ways. I could ban them and start chopping off heads from the Demand Media hydra in my AdSense account, but that would consume vast amounts of my time and just cost me money.

krtl · on Jan 25, 2010

Right. Mahalo had a call out for 17 or so "interns/volunteers" a few months back... that's who replaced the editorial staff.

NZ_Matt · on Jan 24, 2010

Great article!

Just a few days ago I landed on a Mahalo page from a Google search. My exact thoughts were "where is the content".

jcromartie · on Jan 25, 2010

Who would have thought that the guy who said (paraphrased) "want to have a life? then work somewhere else!" would be slimy?

I really don't understand how or why Jason Calacanis has any credibility or notoriety today. Point me to Mahalo and I see an utterly worthless spammy waste of a website that I and all of my peers avoid at all costs which was built with exploitative labor practices.

ojbyrne · on Jan 24, 2010

Great article. "the willingness to lie just to get a bit of media ink" very succinctly captures what I most disliked about my experiences amongst the movers and shakers of California.

tomh- · on Jan 24, 2010

This pretty much proves once again that gaming search engines is here to stay. There is still a lot of research to be done to make it harder to get away with this type of website, but luckily there are more and more ways, other than Google, to find the content you are looking for.

qeorge · on Jan 24, 2010

You actually don't need all that much authority to get away with ranking scraped content in Google. Despite their FUD, Google's duplicate content detection algorithm seems to be largely non-existent.

For example, check out the Google results for http://hackerne.ws, which is a page-for-page duplicate of news.ycombinator:

http://www.google.com/q=site%3Ahackerne.ws

10,000 pages indexed, not a single word of original content.

Note: I know hackerne.ws is not trying to be spammy, and merely parked the domain improperly. If the owner is reading, all it would take is a simple 301 redirect to fix.
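To make "duplicate content detection" concrete, here is a minimal sketch of one common textbook approach: compare pages by their overlapping word shingles and Jaccard similarity. This is not a description of what Google actually does; the function names, the shingle size `k=3`, and the `0.9` threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_duplicate(page_a, page_b, threshold=0.9):
    """True if the two texts share at least `threshold` of their shingles."""
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold
```

A page-for-page mirror would score 1.0 under this measure, so even a check this naive would flag it; the hard part in practice is deciding which copy is the original.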

aditya · on Jan 24, 2010

It's a CNAME to news.ycombinator.com:

lucidity% nslookup hackerne.ws
Server: 192.168.0.1
Address: 192.168.0.1#53

Non-authoritative answer:
Name: hackerne.ws
Address: 174.132.225.106

lucidity% nslookup news.ycombinator.com
Server: 192.168.0.1
Address: 192.168.0.1#53

Non-authoritative answer:
Name: news.ycombinator.com
Address: 174.132.225.106

lucidity%
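The same comparison can be scripted. This is a rough sketch using Python's standard `socket.gethostbyname_ex`; the helper names and the injectable `resolver` parameter are assumptions made so the logic can be exercised without a network connection.

```python
import socket


def resolved_addresses(hostname, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses the resolver reports for hostname."""
    # gethostbyname_ex returns (canonical_name, alias_list, ip_address_list).
    _canonical, _aliases, addresses = resolver(hostname)
    return set(addresses)


def share_an_address(host_a, host_b, resolver=socket.gethostbyname_ex):
    """True if the two hostnames resolve to at least one common address."""
    return bool(
        resolved_addresses(host_a, resolver) & resolved_addresses(host_b, resolver)
    )
```

Calling `share_an_address("hackerne.ws", "news.ycombinator.com")` performs live lookups, so the result depends on current DNS. Note that matching addresses are consistent with a CNAME but would look the same if both names simply had identical A records.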

aditya · on Jan 25, 2010

It was a gift to HN: http://news.ycombinator.com/item?id=84039

slig · on Jan 24, 2010

The fact that the duplicated content is indexed doesn't mean that it ranks better than the original.

qeorge · on Jan 24, 2010

Definitely true, but I became aware of this domain using Google to try to surface old threads. You get enough pages in the index, and it becomes a crapshoot on longtail searches.

Here's an example:

http://www.google.com/search?q=%22How+to+compensate+sweat+eq...

In this case not only does the hackerne.ws page outrank the news.yc page, the news.yc page has been pushed into the supplemental index.

blasdel · on Jan 25, 2010

All it would take to fix is for pg to make news.arc less shitty: actually check the HTTP/1.1 Host header, and respond with your own 301.
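For illustration, the Host-header check could look roughly like the sketch below. It is written in Python rather than Arc, and it is not how news.arc works; `CANONICAL_HOST`, `redirect_for`, and the plain `http` scheme are assumptions for the example.

```python
CANONICAL_HOST = "news.ycombinator.com"  # assumption: the one host that should be indexed


def redirect_for(host_header, path, canonical_host=CANONICAL_HOST):
    """Return (301, location) if the Host header is not canonical, else None.

    IPv6 literal hosts are not handled; this is only a sketch.
    """
    host = (host_header or "").strip().lower()
    # Drop an optional port ("example.com:8080" -> "example.com").
    if ":" in host:
        host = host.rsplit(":", 1)[0]
    if host == canonical_host:
        return None  # serve the page normally
    # Any other name (a parked domain, a bare IP) gets a permanent redirect.
    return (301, f"http://{canonical_host}{path}")
```

A request arriving with `Host: hackerne.ws` for `/item?id=84039` would be answered with a 301 to the same path on the canonical host, which tells crawlers to consolidate the two URLs.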

dminor · on Jan 25, 2010

Or include a rel=canonical link in the head.
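As a small sketch of that idea, every page would emit a `link` element pointing at its preferred URL. The helper name, default host, and scheme below are assumptions for illustration; only the `rel="canonical"` element itself is the standard part.

```python
from html import escape


def canonical_link(path, canonical_host="news.ycombinator.com", scheme="http"):
    """Build a <link rel="canonical"> tag for the given path on the preferred host."""
    href = f"{scheme}://{canonical_host}{path}"
    # Escape the URL so characters such as & are valid inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'
```

The same tag would be emitted no matter which hostname the request came in on, so a copy served as hackerne.ws still names the news.ycombinator.com URL as the preferred one.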

patio11 · on Jan 25, 2010

rel=canonical won't work if the domain Google sees you on is different from the one specified as canonical. This is to prevent people capable of content injection from hijacking entire websites in a subtle manner.

dminor · on Jan 25, 2010

Google says otherwise: http://googlewebmastercentral.blogspot.com/2009/12/handling-...

No doubt they must use other indicators to ensure the authoritative source.

merraksh · on Jan 24, 2010

Your link didn't work for me, but this

http://www.google.com/search?hl=en&site=q%3Dsite%3Ahacke...

shows 255,000 results, of which hackerne.ws is the first, news.ycombinator.com is second :-|

[edit: luckily, Google's duplicate content detection algorithm didn't work here...]

slig · on Jan 24, 2010

You searched for the exact domain name of one site. It doesn't surprise me that it came in first.

qeorge · on Jan 25, 2010

Found another one, that's just an IP:

174.132.225.106 [http://www.google.com/search?q=site%3A174.132.225.106]

Another exact duplicate of HN, with 770,000 pages in Google's index.

aditya · on Jan 25, 2010

That's the IP for HN!

qeorge · on Jan 25, 2010

Nice catch, that almost amuses me more than if it were deliberate. I wonder how that happened.

FWIW, apps.ycombinator.com has the same problem.

whyenot · on Jan 25, 2010

Why isn't it copyright infringement for Mahalo to scrape content as claimed in the article? I don't see how either fair use or DMCA safe harbor would apply (but I'm no lawyer). This seems like a lawsuit just waiting to happen.

wmf · on Jan 25, 2010

They're just scraping page titles and occasionally short excerpts; most people believe that's fair use.

kareemm · on Jan 25, 2010

i think it's interesting that jason - who manages his online reputation exceptionally well (and speedily) - has yet to comment.

monkeygrinder · on Jan 25, 2010

Interesting article. Mahalo won't be the last site to exploit these methods. There are a number of issues here:

1. My boss said something interesting the other day: as Google has already conquered search and is diversifying its business into different areas, it is not applying as much diligence to its search algorithm to weed out sites that exploit it. Google makes money from AdSense, so why would it be in a big hurry to take down sites that exploit dodgy SEO practices?

2. As for scraping content without any backlinks, the media industry seems to have very little protection when it comes to copyright. Existing copyright law is woefully unable to get to grips with digital copying and display. If the content had been music or films, the RIAA would have clamped down so fast, Jason's head would be spinning. But we are talking about the digital publishing industry, where content has very little protection at all.

3. Even if we decide that taking the first paragraph is fair use, not back linking or citing your source is still a copyright issue (not to mention bad Internet etiquette).

vaksel · on Jan 24, 2010

good article, never noticed the scraped content part