Ask HN: What's with all the trivial link spam?
28 points by cperciva on March 7, 2009 | 59 comments
I've noticed an increasing trend over the past month: New accounts are created and immediately used to submit a trivial link -- e.g., "Mozilla Firefox Start Page", "Orkut - home", "Google", "Gmail", "Hacker News | Submit", etc.

Can anyone come up with a plausible explanation for this? It doesn't make sense as traditional spam, since the pages in question aren't selling or promoting anything; nor does it make sense as accidental bookmarklet clicks, since (I assume) the bookmarklet doesn't create an account immediately before submitting the page.

For now I'm just flagging such content-free links, but I'd love to understand what's going on here.




On my blog I'm getting comment spam without any links as well. The usual stuff like "Great work! Plan to come back often!" but there's no link to Joe's Sex Shop or anything -- just a bogus email address and fake name.

I used to think it was just poor comments, but it only happens on certain pages, which aren't at the top of my most popular list.

My theory is that the bots are working in waves -- first identify easy targets, then slowly exploit them. I wouldn't be surprised if the botnet guys are using Mechanical Turk-type operations just to work the CAPTCHAs and get valid logins for later use.

It used to be you could make generalizations about the bot wars. Now, with possibly millions of programmers out there with nothing to do but try to work the system? The complexity and nuance of attacks are several orders of magnitude greater than just five years ago. I'm not sure any kind of sweeping statement catches what's going on, except: there are a lot of entities on the web that are looking for doors -- any kind of doors. Once they find them, it may be months or years before they are ever used.


One of the ways Akismet works is shared-secret-key authentication using your email address. Notice how sites collect your email address when you submit a comment but they don't usually publish it. However, they do send your email address to Akismet. As a result, we can get a fairly good approximation of whether a comment is spam or not simply by checking to see if the email address was used for comments previously flagged as spam or ham. The more non-spam comments--and especially the more ham comments--that the email address has in the corpus, the less likely it is that the new message is spam.
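To make the idea concrete, here is a minimal Python sketch of per-address reputation scoring. It is an illustration only and not the real service's algorithm: the `EmailReputation` class, its method names, and the add-one smoothing prior are all assumptions made for this example.

```python
from collections import defaultdict


class EmailReputation:
    """Toy per-email spam score. Hypothetical; not any real service's algorithm."""

    def __init__(self, prior_spam=1.0, prior_ham=1.0):
        # Pseudo-counts so an unseen address starts at a neutral 0.5 (assumption).
        self.prior_spam = prior_spam
        self.prior_ham = prior_ham
        self.counts = defaultdict(lambda: {"spam": 0, "ham": 0})

    @staticmethod
    def _key(email):
        # Normalise so "A@Example.com" and "a@example.com" share one history.
        return email.strip().lower()

    def record(self, email, label):
        """Store a moderator verdict ('spam' or 'ham') for this address."""
        if label not in ("spam", "ham"):
            raise ValueError("label must be 'spam' or 'ham'")
        self.counts[self._key(email)][label] += 1

    def spam_probability(self, email):
        """Smoothed fraction of this address's past comments flagged as spam."""
        seen = self.counts.get(self._key(email), {"spam": 0, "ham": 0})
        spam = seen["spam"] + self.prior_spam
        ham = seen["ham"] + self.prior_ham
        return spam / (spam + ham)
```

With these priors, three ham verdicts pull an address's estimated spam probability from 0.5 down to 0.2, which is exactly the lever a spammer would try to pull with bland comments.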

Now, imagine I am a spammer. I need to submit some automated comments that don't look like spam so that my spammy comments will later get accepted. There are a variety of ways I can do this. The easiest is to give the user a generic compliment like "Wow, this was a great blog post! Thanks for publishing it." The comment doesn't add to the discussion but the blog owner is often reluctant to delete it because (a) it strokes his ego, (b) there is no way for him to tell if it is automatically generated or not. In fact, if his spam filter preemptively classified it as spam, he might even override the filter and reclassify it as ham; this would be a huge win for the spammer-to-be. Another way to do this is to automatically scrape a comment from another discussion about the page like Reddit, Digg, Twitter, or FriendFeed and submit it as a blog comment on the original page. A third way is to pay somebody a (very) small amount of money to read the blog post and write a comment (possibly using some fill-in-the-blanks response template system like those used in call centers).

It works the other way too. If I want your comments to start getting flagged as spam then I should start submitting spam-like comments in your name. Then, users will start classifying my forged spam comments as spam and the automatic classifiers will start automatically classifying your valid comments as spam.


I forgot to add the corollary. Much like computers will soon be able to solve CAPTCHAs better than people, automated spamming systems will eventually be able to generate comments that are as good as or better than human-submitted comments. In other words, we can expect comments that are generated by spammers to eventually become valuable enough that we want to retain them instead of filtering them away. That isn't the reality today but it could be soon.

You might have a discussion in your blog's comments and then realize that everybody in the conversation is a robot except you. And you will learn something from those robots and/or lose an argument with them, when they get really good.

Eventually, you will be able to post a rough draft of your blog post, and the spam robots will start up a good conversation about it in the comments. Then, based on their feedback, you can revise it into the final draft (if the robots haven't already given you a link to a blog post that refuted your point or made it better than you can).

As a final step, we might notice that not only are the best comments written by spam robots, but the best blogs are spam blogs too. Right now, people are blogging-for-adsense manually; in the future, Google will be able to blog-for-adsense itself, cutting out the authors.

Today, if a site wants its user-generated content (comments in this case) to retain its value, it needs to start filtering out the low-value content, regardless of whether it was generated by a spammer or a well-intentioned user. For example, if I add a comment "Great blog post!" to your blog, you should probably delete it, even if you know I am not a spammer. Unfortunately, if you do that, legitimate users who see their complimentary comments get deleted might react negatively (e.g. stop commenting or complain loudly, generating even worse comments). So, we can see that the spammers who are generating low-value comments already have the ability to drive away your valuable commentators if those commentators occasionally submit low-value comments.


This looks so scary.

we can expect comments that are generated by spammers to eventually become valuable enough that we want to retain them instead of filtering them away

What is the point? Why would spammers submit a valuable comment? Just to be a kind of door for them to enter through later, so nobody will think it's spam?


When you submit a comment to a blog, you usually get to include a link to a website along with the comment. Usually the commentator's name is linked to that website in the resulting comment page. If my spam comment is good, a very small number of people will click my username and go to the site I choose. The more comments I get published, the more clicks I get on my link. If that is a spam link, I get paid for every one of those clicks.

Or, a spammer might generate some good comments just to get the comment classifier to mis-classify a blatantly spammy comment that will be submitted later.


Call me crazy, but if someone wants to give me a quality comment I don't mind having their name link to their (legal and non-offensive) referral link in return.


You are more than a genius!

Do you have any publications or a blog? I am very interested in learning more from you... you look like a scientist!

EDIT: But there is a point here that you did not really mention! You assumed that spammers will be smarter, but spam filters will still be stupid? Don't you think that as spammers build smarter bots, the spam-fighters will build smarter spam filters?


See, this was a very nice demonstration. How am I supposed to tell if thepanister is a robot or not? Based on the content of this message it could go either way.


LOOOL But I edited it to look man-made. :)

Please re-read it.


I have been meaning to make a blog but all my computing time has been spent building some software that I will release soon. Email me (brian@briansmith.org) and I will send you a link when it is ready. Include what kind of phone you have (e.g. Nokia 1100, Android) and I will send you a free copy of the software if it works on your phone.

(This is an open offer to anybody: if you are the first person with your model of phone to email me, I will give you a free license if/when it works on your phone. Also notice how spammy this comment is.)


Can we be sure you're not a robot?

That's kind of a joke, but kind of serious too -- the entire idea of CAPTCHA is going to have to evolve in quite meaningful ways.

I liked your thesis, as speculative as it was. My general impression is that we're talking decades here, not years. Predicting out that far is always tricky. One can imagine receiving phone calls from friends -- the friends being electronically generated voice impressions of our real friends that try to sell us things. Once you start down this path of faking a person, you're going to end up in some very weird places. For instance, I could create "fake mes" that would interact on the web as well, creating blogs, commenting on articles, documenting a presence -- all for the purpose of leaving a bad trail for spambots to follow.


When the spammers write comments that are better than people's comments, why would you want to filter out the spammers' comments? Wouldn't you want to filter out the people's comments instead?

Imagine a time where we have a CAPTCHA that is impossible for people to solve, so that we only allow computers to submit comments.


When the spammers write comments that are better than people's comments, why would you want to filter out the spammers' comments?

Because it's spam!

In fact, the standard that any spam filter should rely on is: Is this spam or not? NOT is this good or not?

So even if the comment is really good and useful, then at least remove the link and keep the comment itself!

It could be a serious mistake if a spam filter relies on "how good this comment is", and this is another problem.


>>> "spammers will be smarter, but spam filters will still be stupid?"

You make money out of being a spammer; you don't make money out of stopping a spammer (generally, or at least nowhere near as much). Technology costs money, hence spammers have a fiscal advantage in the spam wars.


EDIT: In fact corporations are ready to pay to fight spam!

Google bought Postini and paid for it!

But I am sure that some hackers will be interested in fighting it and making smarter spam filters, just the same as some hackers discuss the issue here, and they won't get any money!


It seems to work this way in the gmail spam filter too, but I didn't want to tell anyone for fear of it being exploited. Apparently the cat's out of the bag, however.


On my blog I'm getting comment spam without any links as well. The usual stuff like "Great work! Plan to come back often!"

I don't have a blog myself (shocking, I know), but I have read that some blogs flag the mail addresses of previously accepted comments. The next time someone posts a comment from such an address, it will be accepted automatically.

Could it be that some spammers use this approach to save accounts for future spamming?


I actually left a comment on your blog a while ago.

You submitted an article here which I read and found to be amazingly good. So I wanted to leave a short comment saying just that. I had to retry a few times, and got some pretty weird errors. I tried changing e-mail address, subject, etc. to see if the error would go away. So your problem might be that there are a lot of errors in the submission process, so people try again, often using different names, etc.


Thanks.

Will look into that.

I like your idea about newbies trying out the system. Not as exciting as evil overlord botnets, but perhaps just as plausible. I guess the answers would be in a careful digging through the server logs.

Appreciate your trying. I just upgraded to MT 4.2, so hopefully the comment submission bugs have been worked out.


No worries.

Appreciate your trying - I only did it because the article you wrote was really great :-) It's this one: http://www.whattofix.com/blog/archives/2009/02/technology_is...


At least the default for WordPress is that if a user has had one comment approved then further comments are approved automatically. I suspect that's the motivation for a first wave of pseudo-spam followed by the real influx.
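A rough Python sketch of that policy, and of why it invites a harmless-looking first wave. The class and method names are made up for illustration and this is not WordPress code; it only models the "one approved comment unlocks the rest" rule described above.

```python
class FirstApprovalGate:
    """Hypothetical model: hold comments until the author has one approval."""

    def __init__(self):
        # Authors (keyed by email) with at least one approved comment.
        self.trusted_authors = set()

    def submit(self, email, text):
        """Return 'published' for trusted authors, 'pending' otherwise.

        Note that the text is never inspected once the author is trusted,
        which is the weakness a two-wave spammer would exploit.
        """
        if email.lower() in self.trusted_authors:
            return "published"
        return "pending"

    def approve(self, email):
        """Moderator approves a pending comment; the author becomes trusted."""
        self.trusted_authors.add(email.lower())


gate = FirstApprovalGate()
first = gate.submit("bot@example.com", "Great work! Plan to come back often!")
gate.approve("bot@example.com")  # moderator is charmed by the compliment
second = gate.submit("bot@example.com", "Cheap pills at http://spam.example")
```

Here `first` is held for moderation while `second` goes straight through, because the gate trusts the author and not the content.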


Interesting. Could it be that it's proof of account existence, and that's how someone gets paid?


Why do you think this is spam? I often use a bogus email address and fake name for a service I don't intend to use frequently, like blog comments.


It's not just testing the system. At least one popular blog anti-spam system forces comments into moderation only on the first comment from each person. Subsequent ones go right through.

I was seeing spam like this over a year ago across a network of sites all with the same set of messages (e.g. "Good site! Thanks!")


And I guess moderators are reassured by the nice (though valueless) words... and permit the comment.

If your theory is right, it would suggest that moderators should set a higher standard (e.g., actual content in the comment) for what they permit?


I've been wondering too. My current theory is that it's spammers testing whether submission is unmoderated.


I really do like the way the system works here -- I'd much rather have the community moderate than a vicious automated anti-spam script. Lately, however, I think the latter is beginning to seem more practical. I have a much harder time scanning the "new" section with the recent growing influx of junk.

Is there a solution to this in the works?


Hello, I had some doubts that you might think of it like: "If I fight their spam entries, then they will increase the flow". Am I right?

EDIT: Are you worried about the server? If you make a spam filter algorithm, then this is more likely to increase the load on the server, which increases the latency... and as a result, increases the headache?


Like DanielBMarkham said, the spammers are likely just looking for easy targets. So my answer to you would be "probably not."

But if we assume that they will take HN at any cost, then their course of action will match that of bacteria: attack, mutate, rinse, repeat. Eventually it will be either them or us that survives. And it probably won't be us.


More than anything else this is what will probably drive true artificial intelligence: the need for the internet to survive as a valid information and commercial platform. We're clearly setting up a prey-predator situation which will continue for as far as the eye can see.


My theory would be that it is a new spam bot, being tested out by its creator. Just a theory, of course. It could also be a user with a bone to pick with HN; their blog posts consistently got deaded by moderators, perhaps, and now they're angrily trying to prove something. Maybe they're doing it with a spam bot built for the purpose...this is Hacker News, so building such a spam bot would be trivial for most of the audience here. A few lines of Python/Perl/Ruby would do the trick.


I've noticed HN appear in a few "spammer lists" over the last month or so.

Usually the price to spam into here is very very high because it is, by comparison, quite difficult.

I'm tempted to agree with you - it is a spammer going after the good money, testing a new bot.


it is, by comparison, quite difficult

Merely difficult? Isn't it impossible? Has a spam ever survived here?


Borderline stuff can last a few hours. I've occasionally seen linkjacked stuff that is several hours old. The story being jacked is interesting, and so it may even have upvotes. Obviously, once a moderator spots it, the link gets changed to the original story...but it'd be hard to spot in a quick peruse of the New page, so I can understand how such links could live so long.

Anyway, it depends on your definition of "survived" and whether there is value for the spammer in their story living on the new page for minutes or hours. Our company forums get an onslaught of spam about once per week, and though the spam never lasts more than a few hours (far less these days, as I've added a couple of moderators), it seems to be the same spammer doing it over and over again, so they must consider it worth a shot. When your employees' time is almost free (or the work is done by a bot), pretty much any result is a worthwhile result.


I should be more precise. What I really meant was, has a spam ever made it out of the holding pen of the new page and onto the frontpage? The new page only gets a fraction of the traffic of the frontpage.

I'm pretty sure the fact that spammers do something doesn't automatically mean it's worthwhile. There are some spammers whose stuff has been autokilled for months, but who keep submitting. They can't possibly be measuring the traffic they're getting.


I think you're right.

The last request for spamming HN I came across was worth about $1,500 and up for a front page spot. That's a lot for a single link.


Hmm, maybe we have a business model here. I could easily enough make such spams not appear to users with over some threshold of karma, which the spammers presumably wouldn't have. Can you point me to the page with the offer?


Not prepared to post it here (for obvious reasons)

I will email it though (you will need to register on a forum and make a few posts I think :) been a while since I joined) if you wish. Which email goes direct to you?


Well, no response... if you are interested I have a draft email composed. I'll send it to <your username> @ ycombinator.com later today if I don't hear anything...

Though I suspect you were attempting to call a bluff ;) oops...


You are a very smart guy, but I feel - just feelings - that you make this offer because you don't believe it?


How about banning links in comments until a small karma level, say 10, or a few days registration? Another possibility might be deletion privileges for questionable comments for hackers above a certain karma or registration time. By questionable, I mean comments of a certain form from new users, rather than being completely subjective.


Allowing users to:

    only comment without links up to 10 karma, 
    submit at 20 karma, 
    comment with links at 50 karma, etc. 
sounds like a pretty reasonable approach.
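As a rough sketch of how such tiers might be checked (Python, with the 20 and 50 karma thresholds taken from the list above; the function names and the crude link regex are assumptions for illustration, not how HN works):

```python
import re

# Thresholds from the proposal above; the names are hypothetical.
SUBMIT_KARMA = 20
LINK_COMMENT_KARMA = 50

# Deliberately crude link detector (assumption): catches http(s):// and www.
LINK_RE = re.compile(r"https?://|www\.", re.IGNORECASE)


def can_submit(karma):
    """Story submission unlocks at SUBMIT_KARMA."""
    return karma >= SUBMIT_KARMA


def can_post_comment(karma, text):
    """Anyone may comment, but comments containing links need more karma."""
    if LINK_RE.search(text):
        return karma >= LINK_COMMENT_KARMA
    return True
```

A brand-new account can still join the discussion with `can_post_comment(0, "Nice point")`, but a comment containing `http://example.com` is refused until the author reaches 50 karma.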

Initially one proves themselves in discussion, then they can bring new ideas to the table. By setting the limits at relatively low levels it doesn't discourage new participants.

As a relatively new member of the community, I believe that this is important because you don't want the community to become static any more than you want to get overrun a la digg, reddit, etc.

A few days registration won't work, though, because the bad guys will just start setting up accounts, keeping them dormant for the waiting period, and then do their dastardly deeds. Besides, karma is a better measure than seniority.


EDIT: You should realize something, that spammers might create 10 accounts. 1 account that makes a comment, and the other 9 accounts would vote up for the other account's comments, to increase the karma and pass the karma threshold.

This is similar to what I wrote here: http://news.ycombinator.com/item?id=506028 Users should have a history record.

But it won't be an efficient solution, according to briansmith's approach.

Spammers could hire users to comment. Here is how I imagine it:

1- Spammers crawl the submitted article.

2- Ask real users to read it and comment on it.

3- Copy what users said and submit it automatically here.

4- You will think it's NOT spam, and you will up-vote what the spammer commented, and this will allow the spammer to submit content, so the problem won't really be solved, only reduced.


Your suggestions seem reasonable... if it gets to the point of an all-out War On Spam-bots. Unless things get that bad, it may be sufficient simply to make it a little more difficult for the spammers (so that they go hit on other sites). Meanwhile, I doubt that many spammers are going to go to the trouble of creating the mutually-supportive accounts (and certainly not of actually spending money by hiring live users).


About hiring live users... it will be a slave hiring, without paying money.

Here is how it works, just one of these 2 options:

1- Spammers automatically create a blog, crawl the submitted articles here to their blog, and wait for a real user to comment on it, and then submit that comment here.

Or: 2- Spammers would create a bot that crawls the submitted article, submits that article to any public social news website, waits for real users' comments, and then automatically submits the real users' comments here, and you will think it is NOT a bot/spam, but it is.

EDIT: This is something that can be implemented these days, not after a decade!

If this happens, will you think that this is a bot or not?


If one adopts the concept of many different karma thresholds, then it becomes trivial to say that only users with X karma can upvote. Set it to some modest level, and when they hit X karma, new legitimate users will feel rewarded for their contributions!
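A tiny sketch of that rule in Python, under the assumption that karma is just a count of eligible upvotes. `MIN_VOTER_KARMA` stands in for X and its value is arbitrary; none of this reflects how HN actually computes karma.

```python
MIN_VOTER_KARMA = 5  # hypothetical value for "X"


def counted_upvotes(voter_karmas, min_voter_karma=MIN_VOTER_KARMA):
    """Count only upvotes cast by voters at or above the karma threshold."""
    return sum(1 for karma in voter_karmas if karma >= min_voter_karma)
```

Nine freshly created sock puppets (karma 0) contribute nothing under this rule, so `counted_upvotes([0] * 9)` is 0 and a ring of new accounts cannot lift one of its members over a threshold by itself.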

I agree that there will always be gaming strategies. I just think that having gradually increasing rights/privileges at quick incremental karma steps is a good way to combat spammers.


The answer might be quite boring.

When a new user comes to this site he will try out the functionality, click around and see how things work. One of the things that he will probably want to know is how to submit an article. So he goes ahead and tries it, using whatever link he has handy.

I know this because I did it myself when I joined. I deleted it right away though. I've seen the same thing on other sites, so it's not that unusual.

Edit: tuned down the wording a bit, since PG's reply indicates that this might not be the sole reason.


That doesn't explain the sudden sharp increase in these links. We're growing, but the growth rate didn't suddenly increase 10x like the rate of these links has.


Maybe it's because the new users flowing in are more casual in their use of the site, now that it has grown to a certain size. I could imagine that users who just stumble upon the site will be more willing to engage in this semi-destructive behaviour than users who had the site recommended to them by you or another early user.

Can you see whether the users that do this end up being good citizens? That would shed some light on whether these are malicious accounts or just new users trying things out.


If you think about how you'd build a script to spam Hacker News, it's easy to see what's going on here.

  - Step one is getting your bot to reliably create accounts.
  - Step two is getting it to create accounts and post links.
  - Step three is feeding it a list of 5000 of your sites.
This bot appears to be at step two.


Perhaps the spammers are testing to see if the spam "sticks". The sites where the spam stays will be revisited. Perhaps the spammers think that an account that posts links that don't come up on a blacklist will be considered "safer" later, when they can then spam seriously.


I have already talked about this problem, but nobody really cares!

I even provided a significant solution to the problem: http://news.ycombinator.com/item?id=506028

Guess what? I got down-voted on my comment!

EDIT: If pg has no time for it, then why doesn't he allow us to code a solution, and he can review it? And if he likes it, then he would use it!


You sure are excitable!


I am sorry, I don't understand what you mean. Please forgive me, I am not a native English speaker.


He means the exclamation points. In English you don't use those except when you really want to draw attention to a sentence. Otherwise you seem crazy! See what I mean!


WoW! I feel there is a culture shock here. I even had to look up the word "exclamation"; I really still have a long road ahead in learning English. :(


Hey... I give you lots of credit for being open to learning... and for continuing to try to contribute to this English-based forum. Live & learn: that's okay.


I don't want to surprise you, but I did not really receive any English education. I am a self-learner.

Your words made me feel really great.


FWIW, your original post did not give away your non-native status.



