Later that day, it somehow got turned into a Mahalo question (I didn't submit it). I thought it was interesting that Jason himself commented on it, but it still seemed strange.
Now, when you google "Woot.com for travel" or "Woot for travel", that Mahalo page comes up on position 1 or 3.
Yeah, I remember reading a blog a few days ago. It was a PR8, and the guy did an experiment: he just added a link for something Viagra-related. Just a single link.
And within a week he managed to get on a front page in Google results.
My guess is it's "oh shit!", followed by a prayer that Google doesn't do anything.
I don't think he has anything to worry about on that last part: Google is notorious for letting big sites off the hook (remember Target?), and here Google is also making its share of money off AdSense.
I just tested this by submitting that same Scribd rant. My account sees it as having been submitted, but if I'm not logged in the story doesn't appear at all.
Can I add that Paul's method of making things disappear without telling users is shit? It was a lot of fun realizing my account was dead and I wasn't just going crazy. Same with this story. I almost posted a comment defending him until I thought to test the story logged out.
You are free to question PG's motives and cry censorship, but to me, that blog post was malicious, tasteless, angry drivel that I really wouldn't want to see on here, whether it was about a YC company or not.
Even a casual search of Scribd content will reveal the motives for Mr. Mann's post. He has every right to be angry. The malice has been well-earned by Scribd's repeated actions with other creators' content for nearly its entire history.
While it's true there's no accounting for taste, I suggest you consult the dictionary on the definition of 'drivel'.
At the end of the day, PG can do, essentially, whatever he likes with the content, routing (or lack thereof), and posting permissions on Hacker News. It's HIS site.
It's a shame that Scribd doesn't seem to play by an equivalent set of ownership principles: namely, it hosts and makes money off other creators' content without so much as asking their permission, let alone proffering any ad-revenue sharing.
As to your other points, I think they are good. For me, intelligent debate does not talk about fucking anyone's whore mother in their ear. If that is what I was looking for, I'd read YouTube comments.
edit: admittedly, tptacek does have a point re: the flag button vs. blacklist.
Kudos indeed. I'm glad Tptacek brought up the often-neglected 'flag' feature.
Again, there's no accounting for taste, but offhand I'd say labeling curse words, and their creative use, as "childish" is quite overreaching. Have you read any David Mamet, Frank McCourt, or Larry McMurtry lately? Do you honestly feel a child would have sufficient skill to structure prose in that fashion?
Scribd ceased doing related search queries some time ago. Their CEO described it, to TechCrunch, as "reducing the aggressiveness of our SEO, which reduces total traffic in the near term but increases the relevancy of Scribd links in search engine results."
Scuttlebutt among SEOs whose opinion I respect suggests that it is highly probable they got a backchannel from Google telling them that either they could drastically reduce the footprint of their pages in the SERPs or that Google's search quality team would assist them in doing so.
Anyhow, their traffic went down by about 50% in a month, if you trust Compete et al.
Search for [Scribd "aggressive SEO"] if you want the whole tale.
At least on Experts Exchange, you can read the question/answers without signing up. It's just well-hidden (and doesn't show up in text-only browsers like w3m or links last I checked). You have to scroll all the way down the page past the looks-like-the-content-but-is-blurred-out-or-otherwise-obfuscated section, and past all of the navigation links. The content is actually there.
I don't necessarily condone their page design and misdirection, but I have found answers on them through Google searches in the past.
It wasn't always that way. They used to show the content only to Google; a regular user who came along couldn't see it. Google got a bit mad, and what you see now is their current hack to appease Google and screw users.
This 'hack' was in place years ago. IIRC, I used it back in 2003, or maybe even before then. Maybe at some point they removed it, then Google made them put it back?
It used to be Javascript-obfuscated. For all the times I clicked on one of their links only to have my blood pressure raised: I hope Experts Exchange withers away into obscurity.
When it was blurred out with JavaScript, I always was able to find the actual content further down the page. I think that the 'blurred out' version of it was just to make people give up. Either that or their JavaScript was broken for Firefox.
This article makes it clear that Mahalo is in many ways quite similar to another questionably ethical startup, Demand Media. Here's the fascinating Wired article:
Right. Demand Media has a distributed, virtualized workforce of freelancers. (Read the Wired article on it. That is some of their best reporting. Ever.) Mahalo used to have in-house editors, then moved to mostly outsourced "editors", then realized that editors cost a lot of money and that firing them didn't decrease revenues in the slightest. At the moment their editorial staff is a thin pretense maintained to keep the site from getting bounced out of the index.
Disclaimer: As with most other massive content plays which have large audiences of unsophisticated Internet users, I indirectly subsidize Mahalo through AdSense expenditures. To the tune of probably over a hundred bucks last year, but I don't have my numbers in front of me. Like I mentioned in my blog post earlier today, they send great traffic (i.e. it is cheap and converts well) because my ads are the content on their pages.
That is disquieting to me in some ways. I could ban them and start chopping off heads from the Demand Media hydra in my AdSense account, but that would consume vast amounts of my time and just cost me money.
Who would have thought that the guy who said (paraphrased) "want to have a life? then work somewhere else!" would be slimy?
I really don't understand how or why Jason Calacanis has any credibility or notoriety today. Point me to Mahalo and I see an utterly worthless spammy waste of a website that I and all of my peers avoid at all costs which was built with exploitative labor practices.
Great article. "the willingness to lie just to get a bit of media ink" very succinctly captures what I most disliked about my experiences amongst the movers and shakers of California.
This pretty much proves once again that gaming search engines is here to stay. There is still a lot of research to be done to make it harder to get away with this type of website, but luckily there are more and more ways, other than Google, to find the content you are looking for.
You actually don't need all that much authority to get away with ranking scraped content in Google. Despite their FUD, Google's duplicate content detection algorithm seems to be largely non-existent.
For example, check out the Google results for http://hackerne.ws, which is a page-for-page duplicate of news.ycombinator:
10,000 pages indexed, not a single word of original content.
Note: I know hackerne.ws is not trying to be spammy, and merely parked the domain improperly. If the owner is reading, all it would take is a simple 301 redirect to fix.
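For anyone curious what that 301 fix amounts to, here is a minimal sketch using only Python's standard library. It is an illustration, not how hackerne.ws is actually served: the canonical origin, the helper name `redirect_location`, and the port are all assumptions made for the example. In practice this would more likely be a one-line rule in the web server's config.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption for illustration: the origin the duplicate domain should point at.
CANONICAL_ORIGIN = "https://news.ycombinator.com"


def redirect_location(path: str) -> str:
    """Map a request path on the duplicate domain to the same path on the canonical origin."""
    if not path.startswith("/"):
        path = "/" + path
    return CANONICAL_ORIGIN + path


class PermanentRedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a 301 pointing at the canonical URL."""

    def do_GET(self):
        self.send_response(301)  # 301 = Moved Permanently
        self.send_header("Location", redirect_location(self.path))
        self.end_headers()

    do_HEAD = do_GET


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server; call .serve_forever() on the result to run it."""
    return HTTPServer(("", port), PermanentRedirectHandler)
```

Because the path and query string are carried over unchanged, each duplicate page redirects to its own counterpart rather than to the front page.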
Definitely true, but I became aware of this domain using Google to try to surface old threads. You get enough pages in the index, and it becomes a crapshoot on longtail searches.
rel=canonical won't work if the domain Google sees you on is different from the one specified as canonical. This is to prevent people capable of content injection from hijacking entire websites in a subtle manner.
Why isn't it copyright infringement for Mahalo to scrape content like claimed in the article? I don't see how either fair use or DMCA safe harbor would apply (but I'm no lawyer). This seems like a lawsuit just waiting to happen.
Interesting article. Mahalo won't be the last site to exploit these methods. There are a number of issues here:
1. My boss said something interesting the other day: as Google has already conquered search and is diversifying its business into different areas, it is not putting as much diligence into its search algorithm to weed out the sites that exploit it. Google makes money from AdSense, so why would it be in a big hurry to take down sites that rely on dodgy SEO practices?
2. As for scraping content without any backlinks, the media industry seems to have very little protection when it comes to copyright. Existing copyright law is woefully unable to get to grips with digital copying and display. If the content had been music or films, the RIAA would have clamped down so fast that Jason's head would be spinning. But we are talking about the digital publishing industry, where content has very little protection at all.
3. Even if we decide that taking the first paragraph is fair use, not back linking or citing your source is still a copyright issue (not to mention bad Internet etiquette).