Goodbye (Crummy) CAPTCHAs. Hello Ad Dollars? (allthingsd.com)
76 points by jkopelman on Sept 20, 2010 | 79 comments



here is the challenge I got when I signed up for the service: http://imgur.com/x5hoN.png

for the solution I put in 'stupid', and it worked

this definitely doesn't solve the security considerations that captchas were designed for.

Update: Ok so it didn't take long to break this thing. These guys have the plain text of the CAPTCHA in the document DOM. It isn't even an image - the CAPTCHA is rendered in javascript. See: http://imgur.com/9VO4J.png

The 'brand' logos are an image, but they are simple to OCR.

So to break this CAPTCHA, simply hook V8 up to your auto-submit bot and interpret the JS that is being returned to you. You can't read it from the client because they serve that IFRAME from a different domain, so they base their security on the browser's cross-domain policy. But that is all moot if you are building a bot, or if you build a browser extension that solves these things.


And since this CAPTCHA isn't secure at all, they'll remove the security element altogether, making it a more annoying version of a banner ad.


So you replace the hard-to-read CAPTCHAs with easy-to-read ads, of which presumably there will be a limited number in circulation at any one time. Sounds kinda easy to circumvent...


A long-standing solution to hard-to-read captchas is easy-to-read (but hard to process) captchas. E.g. show a picture of an animal or a shape and ask what it is, or even just ask a simple math problem or riddle in writing.

The problem with those is that they need to be constantly updated from a reliable source, or else, once the solution becomes popular, the spammer can brute-force it in linear time (no matter how high N is, there are only N possible patterns).

This seems to be an attempt at fixing this. Ads are often relatively short-lived, so by the time the spammer has them brute-forced, they might be out of circulation, and more importantly, there are new ads in. It's an arms race, and this is a way to pay our troops. Also, it's trivial for advertisers to make many different variations (e.g. one for each sales bullet point), so there are many variations in circulation. Since there's often more textual content than the password in the ad, they're not prone to simple OCR, while still being easy for the user to comprehend.

Obvious shortcomings are if ads are not so short-lived, and if it's easy to identify and break classes of ads (e.g. if it's yellow and has the IE logo in position X, OCR area Y, done). Also, it's a bit of a dealbreaker if I'm forced to open and visit a website to get the password.


This was the process used by the Microsoft Research project Asirra (http://research.microsoft.com/en-us/um/redmond/projects/asir...) which used images of kittens and puppies put up for adoption as a constant fresh source of input data.

This was later, awesomely riffed on by HotCaptcha (http://valleywag.gawker.com/246656/a-face-only-a-bot-could-l...) which pulled HotOrNot data and asked you to select the 3 hot women out of 9. Sadly, the site is down now but I remember trying it and it was remarkably useful and a hell of a lot more fun than word captchas.


Your argument is somewhat flawed on a technical level, as Olegk mentioned, but also from a business marketing standpoint.

Specific ads may be in circulation for a short time, or there may be many running concurrently, but it is the message that the advertiser is trying to get across, the tagline, and it would be of most benefit to the advertiser to get the user to associate their brand with a single concept. Using the example in the article, Subaru may want to be associated with "outback", not "four wheel drive" or "comfy" or "sporty". Businesses who try to target too many things or too wide an audience end up not getting their message across.

I'm not saying that what Solve Media is trying to do isn't a great idea. I think it has lots of potential, but they clearly still have more to work on.


You have no idea what you're talking about.

> show a picture of an animal or a shape and ask what it is,

Ok, so let's say you come up with pictures of 20 different animals. A spam script that picks the same answer every time will have a 5% success rate.

> or even just ask a simple math problem or riddle in writing.

Computers are way better at solving most math problems than people.


You have no idea what you're talking about.

Please be careful. It could be that his thoughts are excellent but his communication is flawed. It could also be that he has a great deal of general expertise in the subject but is mistaken in this specific statement.

Thus, a general statement about him could be false and is also aggressively ad hominem. You might want to consider focusing on the statement itself rather than the speaker, such as:

"Your suggestion is entirely wrong."

JM2C of course, and it is possible that I don't know what I'm talking about. I am not a psychologist or a logician.


Please read my post. Specifically the second paragraph where exactly this issue is addressed.


Still doesn't make sense.

If you display 10 random animals (or shapes), and ask a user to pick the right one, a dumb spam script would have a 10% success rate.

If you display one image of an animal and give 10 possible answers, a dumb spam script would have a 10% success rate.


So if I have a library of 10s of thousands of images of animals, shapes, things etc. that are all easily recognisable to English speakers -- e.g. cat, dog, house, drum, road, tree, book, horse -- and ask them to write in what it is, AND I constantly update that library and retire pictures that have been used many times -- what is your dumb script's success rate?

What if I combine three pictures in each challenge - e.g. "cat house triangle"?
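To put rough numbers on that, here is a minimal sketch. It assumes every picture's label is drawn uniformly and independently from a vocabulary of `vocab_size` words, and that the bot always types the same fixed answer. The function name and the 10,000-word vocabulary are made up for illustration, and real label frequencies would not be uniform.

```python
from fractions import Fraction


def fixed_guess_rate(vocab_size: int, pictures: int = 1) -> Fraction:
    """Chance that a bot typing the same fixed answer passes one challenge.

    Assumes (hypothetically) that each picture's label is drawn uniformly
    and independently from `vocab_size` possible words.
    """
    if vocab_size < 1 or pictures < 1:
        raise ValueError("vocab_size and pictures must be positive")
    return Fraction(1, vocab_size) ** pictures


# One picture, with only the eight example words above as possible labels.
print(float(fixed_guess_rate(8)))            # 0.125
# Three pictures per challenge, 10,000 possible labels each (hypothetical).
print(fixed_guess_rate(10_000, pictures=3))  # 1/1000000000000
```

The point of the sketch is only that the rate depends on how many distinct answers are plausible, not on how many pictures are stored.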


My spam script would always answer "cat", so among 8 options (cat, dog, house, drum, road, tree, book, horse), I'd get a 12.5% success rate.

Plus you constantly have to update your image library, which is a huge pain.

Also, recognizing 10,000 images will take me around one day and less than $1000 with Amazon Mechanical Turk, thus giving me a perfect 100% success rate. After that you would have to completely renew your image database.
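As back-of-the-envelope arithmetic, assuming a flat price per labelled image: the $0.10 ceiling below is just $1000 divided by 10,000 images, and the $0.02 price is a hypothetical figure, not a quoted rate.

```python
def labelling_cost(images: int, price_per_image: float) -> float:
    """Total cost of paying humans to label every image in the library once."""
    return images * price_per_image


def max_price_per_image(budget: float, images: int) -> float:
    """Highest per-image price that still keeps the whole job within budget."""
    return budget / images


# 10,000 images on a $1000 budget leaves up to $0.10 per label.
print(max_price_per_image(1000, 10_000))
# At a hypothetical $0.02 per label, the whole library costs about $200.
print(labelling_cost(10_000, 0.02))
```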


sigh .. etc. There would be more than 8 options, many more.

YES, it would be a pain to update the library, which is why I'm commending this particular concept for solving that problem ...


You aren't getting it. The probability is 1/<number of options you present to the user>. If you show the user 100 images and ask them to select one, a bot will have a 1% probability to find the right one, but the user will tell you to get lost.

If you present 10 images (still a stretch), bots will have 10% success rate just answering randomly.
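A minimal sketch of that 1/N argument, assuming a multiple-choice challenge where exactly one of the options is correct and the bot picks blindly. The function names and the 10,000-attempt volume are illustrative assumptions.

```python
def blind_guess_rate(options: int) -> float:
    """Success probability of a bot that picks one of `options` at random."""
    if options < 1:
        raise ValueError("need at least one option")
    return 1 / options


def expected_passes(attempts: int, options: int) -> float:
    """Expected number of challenges passed out of `attempts` blind guesses."""
    return attempts * blind_guess_rate(options)


# Compare a 10-, 20- and 100-option challenge under 10,000 blind attempts.
for n in (10, 20, 100):
    print(n, blind_guess_rate(n), expected_passes(10_000, n))
```

With 10 options, a bot making 10,000 attempts would expect to get through about 1,000 times, which is why the number of options matters more than the size of the image pool.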

EDIT: Wait, from what I see you mean that the user will have to write "cat" or "dog" or whatever? That's better, yes. Communication, however, is hard, which is why the GP and I didn't understand what you meant.


Not to mention the fact that the bot will get spotted for entering the same phrase more than a few times, get put on a list and get served the squiggly crap.


You have no idea how spam works. A bot isn't just one user trying to enter "cat" repeatedly. Botnets send requests from thousands of different IPs. You wouldn't know which ones are real users, and which ones are bots.


You know, most modern captchas are not solved by bots, but by people in third world countries solving them for pennies. The current going rate is about $1/1000 and there are easy-to-use captcha solving APIs for any platform. Captchas have long provided an illusion of security, nothing more. Any and all captchas will be broken.


Before adding captchas to my stupid personal blog, Akismet was catching ~300 attempted spams a day, and I was manually flagging ~2 a day.

Then I installed reCaptcha, and now Akismet catches ~10 a month, and I've had to manually flag zero.

I'll take that illusion, thanks.


ok, but how many comments are you losing because people don't want to fill out a captcha?

i've used defensio.com for filtering comments on my site with no captcha and rarely ever get a false positive or negative. false negatives are easy to spot, and users can manually override false positives by supplying an email address to get a confirmation link (which gets fed back to defensio as a false positive once clicked).


It's a tradeoff, sure. In my case, I'm perfectly happy to trade a few comments from people who don't want to deal with a captcha for never having to manually deal with spam. YMMV.


There are better ways of reducing comment spam to near zero without resorting to captchas. Here's how I did it:

https://secure.grepular.com/Blocking_Comment_Spam_Using_ModS...

It's still working now.


That's great, pretty much any home-rolled CAPTCHA will perform great, simply because it's not worthwhile for spammers to design an attack. If that, however, was the default anti-spam mechanism on, say, wordpress installs, it'd be automated against in minutes.


Definitely. If more people rolled their own, the spammers would be screwed.


That's one datapoint. The counterpoint is the thousands of forum owners who mistakenly relied on captcha to prevent spam at the expense of other antispam measures, like GeoIP.


I'm not saying captchas are a magic one-step solution to end all spam (hence the mention of Akismet). I'm just disputing the idea that they are "nothing more" than illusory security. Sure, they can be turked, and sure, if someone relies only on captchas they'll probably eventually get screwed. But they do provide some benefit as part of a complete approach against spam, in that they raise the cost of a spam attempt up from zero. Unless you have meaningful levels of traffic, the vast majority of spammers aren't going to bother with you if it costs anything at all to hit you.


I know the founders of reCaptcha. This is easier to prevent than you think. It's also quite rare.


Heh, I know people who are working on breaking reCaptcha. Maybe they should get together and learn from each other.

If you're saying that it's possible to prevent manual, outsourced captcha solving, I would have to strongly disagree. It's similar to the futility of DRM as an antipiracy measure. As long as I can see the captcha, I can pay someone in another country to solve it for me.

I will agree that it's rare- but the most sophisticated and high-volume spammers do it, and they're the ones you have to worry about.

In case people start giving me sideways glances I want to make it clear that I haven't done any blackhat stuff in a very long time, but I'm still interested in the blackhat community from a security researcher's perspective.


I'm not saying that it's possible to do it in an automated way. I'm just saying that tools exist to create red flags that would make things easily verified by a human. They have ways in which they deal with these things. All I'm saying is that so far, it's been working for them.


You're probably not going to tell me more about these tools, although I would be very curious to learn how they work.

All I have is anecdotal evidence- I know several people who are making their living spamming Craigslist and outsourcing their ReCaptcha solving. Since they can make $XX per post, paying pennies for captcha is a tiny expense. I don't condone this behavior, but be aware- it happens more than you think.

Anyway, probably best to take the discussion to email if you want more...perspective from the other side.


£1/1000 is still more expensive than if they could be done without human input. And that price has to be paid every time a captcha is needed. With this scheme you could presumably pay humans to solve the ads currently in rotation (ten? hundreds? thousands? not more than that) and then just remember the results.


I don't think having people solve captchas for money is all that common. Do you have a source or something? Even if it is common, captchas still stop those spamming strategies that rely on each spam being free for the spammer.


I don't know how prevalent this practice is exactly, but I have it on good authority that sites like decaptcher.com are making a mint selling this service.


Yeah, but you've got to look at it from a business standpoint: who profits from the outsourcing? Why would a website owner pay to outsource the ads to get solved in India just to get blacklisted and then lose revenue that would have come in anyway for less?


Indeed. Without some form of randomisation (which advertisers probably do not want), this would drastically lower the price of solving a captcha. Because you need to submit each ad to a human solver network only once, and can store the solution.
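A rough cost comparison, as a sketch only. The $1 per 1000 price is the going rate quoted elsewhere in this thread; the spam volume and the number of ads in rotation are hypothetical, since nobody here knows the real figures.

```python
def cost_unique_captchas(attempts: int, price_per_solve: float) -> float:
    """Every challenge is different, so every attempt needs a paid human solve."""
    return attempts * price_per_solve


def cost_reused_ads(distinct_ads: int, price_per_solve: float) -> float:
    """Each ad is solved once; later sightings are answered from a stored table."""
    return distinct_ads * price_per_solve


PRICE = 1 / 1000          # $1 per 1000 solves, the rate quoted in this thread
ATTEMPTS = 1_000_000      # hypothetical spam volume
ADS_IN_ROTATION = 1_000   # hypothetical; the real number is unknown

print(cost_unique_captchas(ATTEMPTS, PRICE))   # cost when nothing can be reused
print(cost_reused_ads(ADS_IN_ROTATION, PRICE)) # cost when solutions are stored
```

Under these assumptions the spammer's bill drops from roughly $1000 to roughly $1, and it stops growing with the number of attempts.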


Yeah, this is exactly what I want - remind the user about some other brand in the exact same signup page where I should actually try hard to win him over.

Not to mention how ridiculously easy it will be to break these limited edition captchas.


Works just fine for a blogger that doesn't care too much whether you complete the captcha or not but likes the few extra bucks he makes on a well-commented blog entry.


Not if the well-commented blog entry is 99% viagra comments.


I spent a minute trying but no, I cannot think of anything that would make me go 'well fuck you' and go elsewhere faster than being expected to type out ad copy like a school child.


Did you go to their signup page? They had one that said "watch this ad clip to reveal the security code" and you had to watch the video to get the code to enter.

Interesting idea, but I would never watch an ad clip to signup for something nor would I want my users to have to.


And yet you will gladly type randomized and difficult-to-read letters or words?

This is actually attempting to solve two problems at once, and in some ways it could be more interesting to the user if the ads are well targeted.

Why not help the site owner make a bit of $$? Are you really afraid that typing in a bit of ad copy is going to turn you into a mindless drone?


I'd rather type random characters than "Safer Browsing" for an IE ad. You have to consider the emotional reaction to these ads.


What about the refresh button?


You've hit the nail on the head. Captcha-publishing sites are already weeding out the idiots who will jump through hoops to join a blog comment argument or whatever it is; now they can sell the idiots to the highest bidder. Even if they get more spam, the idiot just earnt them cold hard cash in exchange for the spam, and you can still go back to solving spam the non-captcha way.

It's evil, but evil in a kind of clinically beautiful way. I'd say it stands a good chance of working in the short term too, like "punch the monkey" ads made a lot of money for some people back in the day.


If the site owner is going to make money off that, then the only reason it is economically feasible to do so is if the brand owner gets at least that from the value of showing me their ad.

The only way to do that is if it works.

So yeah, I am concerned.


Normally, I refrain from writing such bad things about start-ups and people's ideas, unless there is something really awful about it. In this case, it's obvious how terrible this service is, both in terms of how it treats the user (who wants to do you a favor by signing up) as a cow to be milked, and how the implementation of it does very little to solve the actual problem of spam. It's just a greedy and short-sighted idea. I won't write any more about how stupid it is.

What I will note is that it is somewhat inspiring to see that an idea like this can get off the ground. I haven't dug into the details of "Solve Media", but I assume that some people poured a sum of money into this. Maybe there's a sucker born every minute, or maybe confidence in anything internet-related is just that high. Hopefully, both things are true, and I think that's a good thing. I'm not advocating taking suckers' money, but rather believing the following: If your idea is anything better than this, which it likely is, you have a shot at it.


Users creating and using free accounts have costs associated with them. Those costs need to be met in order for the service to continue, either by tying a revenue model to the users directly, via charity, or by subsidizing their use of your resources by using something they contribute, such as content.

Selling user content is not a stable net in proportion to user use of the site; you don't know that the content will continue to sell at a rate keeping pace with user-incurred costs.

Charity is also not a stable net in proportion to user use of the site; nothing directly and reliably ties charity income to user-incurred cost, and nothing can--it's charity.

The only way to meet proportionate cost incurred by user use is by tying a proportionate revenue to user use. Even if you don't charge a user to use the service, even if you only ask that they fill out a text box to use the site, just as they might with reCAPTCHA, it suddenly becomes 'milking the user'.

Sadly, users no longer merely expect services to be free for them, they get offended if the service provider derives any money at all for their activity. (I think I sort of understand the mentality--"Why are they getting money for my work, when I'm not?"--but that logic doesn't hold up if no money was changing hands while they did the work anyway.)

You're also rather vague about what "the actual problem of spam" is. What strikes you as a mere symptom, and what strikes you as a cause? Yes, these peoples' particular implementation of challenge-response is pretty poor. Is that what you were specifically referring to, or did you have something else in mind?


Very good points here.

I failed to remember that captchas are indeed used for other things besides registering as a member to a site. Things like one-time viewing of information (eg: WHOIS), etc, will probably benefit a lot from this. I showed some oversight claiming this service was dumb. It's great for things where it's OK that I get insulted, because I want to see something bad enough anyway.

I was in the mindset of imagining this being on a registration form to become a user of a website. I think the money lost from the number of users getting turned off by this would be greater than the one-time profit incurred whenever a user registers. That is, unless your site profits from fewer users. In equation form:

(profit per user over their lifetime) x (number of users that won't sign up because of this) > (number of users that do sign up with this) x N [where N is how much you make from each ad captcha].

Notice that the longer you plan on retaining users, the less you should be willing to risk slowing down your sign-ups, unless this ad captcha offers a high enough profit. It is my opinion that for sites seeking long-term relationships with users, that this thing sucks. On the other hand, if you can afford that the user not continue beyond a point, then it's great. That's why it'll work for porn and other seedy crappy sites, and probably why I have low regard for it. I admit this is a foolish mindset.
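The same inequality as a small sketch, with made-up numbers. The function name, the $5 lifetime value, the signup counts and the $0.25 payout are all hypothetical, chosen only to show how the two sides compare.

```python
def signup_tradeoff(lifetime_value: float, lost_signups: int,
                    completed_signups: int, payout_per_solve: float) -> float:
    """Net gain (positive) or loss (negative) from putting an ad captcha on signup.

    Revenue gained: completed_signups * payout_per_solve.
    Revenue lost:   lost_signups * lifetime_value.
    """
    gained = completed_signups * payout_per_solve
    lost = lost_signups * lifetime_value
    return gained - lost


# Hypothetical figures: $5 lifetime value, 50 users scared off,
# 950 users who complete the captcha, $0.25 paid per solved ad.
print(signup_tradeoff(5.0, 50, 950, 0.25))   # -12.5
```

A negative result means the lost lifetime revenue outweighs the captcha payout, which is the case the inequality above describes.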

In regards to "actual problem of spam," I was referring to the answer of the captcha being in the DOM, and the otherwise flawed implementation of it.


It seems to me that they're not attempting to prevent spam, they're merely trying to monetize what has become the standard practice of entering CAPTCHAs. Going with something like this is trading in spam prevention for cash. The legitimate users already expect to see a CAPTCHA, but you're using it for a different purpose.


I had a similar idea a few years ago. I thought my idea was original, but when I did my research, I found several companies had already thought of it.

Microsoft http://www.internetnews.com/webcontent/article.php/3836421/M...

Yahoo http://www.faqs.org/patents/app/20090012855

Ad Captcher http://adcaptcher.com/


I know the professor who created CAPTCHAs fairly well (did research with him) and I can say that these captchas are missing the point of captchas.

The hard-to-read word is there to prevent spammers. With easy-to-read words, you have a limited number of words per page. In the end, the probability that a program succeeds by randomly guessing the assortment of words (O(n^2) combinations if we assume order) is actually quite high.


Tried signing up for an account. They eat their own dogfood, for which I'm thankful.

It looks like the advertisers can force you to click through to get the security code. (e.g. Catfish's "Click through to see the security code!") I'll pass.


Funny, I first noticed how they don't eat their own dogfood. They don't have a captcha on their contact form: http://www.solvemedia.com/contact.html. Seems to me like that would be a good place to have one.


They have (or had; not checking again) one on their sign-up form. That they don't have one on their contact page suggests they don't know what CAPTCHAs are for...


I don't often hope for entrepreneurs to fail, but I think this is a place where they are actually making the world worse. Not only do they completely fail at what they are trying to do (i.e. prevent spam), but they do so in the most annoying way possible.

I'll gladly contribute my time towards writing browser extensions to hide and automatically solve these things if they take off.


I'll be happy with anything other than today's captcha solution. These are some captchas from my hall of fame:

http://www.twitpic.com/2qbv0e/full


Those CAPTCHAs really aren't that bad. If I understand how ReCAPTCHA works, you only need to get one of the words correct.


You know that, I know that, but if everybody knows that you only have to enter the easier-to-read semi-nonsense word then ReCAPTCHA itself doesn't work so well for its larger purpose.

And if you don't know that, then giving someone a CAPTCHA with Hebrew or a Rorschach inkblot in it when they're trying to buy Yo Gabba Gabba Live tickets for their four-year old is a surreal enough experience to make people think they're living in a situation comedy.


Captchas like these make me wonder if I'm a robot...

I was a big fan of this new captcha idea until the video.


Their sign up page required me to watch a video in order to get the security code. While the code came up a couple of seconds into the video, I still dislike the thought of having to watch and wait for ads to somehow contribute to a website (be it through comments, sign ups, etc).


How about this for a CAPTCHA, a video ad unit where the questions are akin to those old SCENEIT questions?

Ie, what color shirt was the man pumping gas wearing?

Bonus points if you tell the user the question after the video so they have to re-watch.


I hate wasting my time on videos even when they're about things I'm interested in; I'm not going to use a service that forces me to watch one I'm not interested in.


If successful this can be copied easily and successfully by anyone. But I hope it won't succeed. Instead things like openid should be more widespread so I don't have to enter a captcha in the first place.


This sounds a lot like what Vidoop tried to do. Their initial idea was to replace passwords with an authentication mechanism based on selecting a number of pictures that fit into user-chosen categories. A person would select 3-5 images from a grid of 12, with each image belonging to a distinct category. Those who picked the images, in the correct order, corresponding to their chosen categories (selected on sign-up to the service), along with their username, were deemed authenticated. As far as I know they hoped to make money by selling ads within the categories, so if one of the categories displayed was "pets" you might see ads for pet food, for example. They moved into doing captchas at some point too.

Anyway, it seemed like a promising idea, but they folded last year. Good write up from a former employee here: http://factoryjoe.com/blog/2009/06/05/the-fall-of-vidoop/


Vidoop was bought up by Confident Technologies, which is still doing the image-based captcha as well as image-based authentication for Web and mobile: www.ConfidentTechnologies.com. The free MyVidoop service is still around too.


I think the idea is an excellent example of how to think creatively but it would take me a long time to find the ad version of a captcha since I would see it as an ad and ignore it.

In any case I'll be watching their progress with this idea.


The article isn't totally accurate, I tried it out and got a trailer for Devil.

I would probably use this for media views or file downloads, but I wouldn't use it for signups because I think it would decrease conversions.


IE = Browse safer? I don't think so. Anyway, I am not going to leave comments on sites that want to turn my attention to useless junk like ads.


Damn, that is both smart and evil (because the user can't ignore these kinds of ads).

What can we do to make sure this product doesn't catch on?


If you don't want them to catch on, refuse to sign up to any service that uses them. Simple.


In the long term, ugly, hard-to-use websites will lose. It can be a very long term in some cases. Just be patient.


That's the usual suggestion, but not using it wasn't enough to kill IE6.

So what can we do that will actually work?


Guys, isn't ycombinator all about helping us become good entrepreneurs? Why tear down a good effort? How is this productive? These guys have serious supporters and clients, so they have to be on to something.


This is not a new idea. http://adcaptcher.com/ is at least one year old and it's already being used on a couple of sites.


When I saw the title of this article I thought it was going to be about removing CAPTCHAs in favor of higher conversion rates . . . even if it means more SPAM.


There is a German startup which has been doing this for quite some time: http://www.captchaad.com/


What's going to happen if I'm asked to fill out a CAPTCHA like one of these while I am browsing with Adblock on?


What's wrong with recaptcha anyways? I actually like helping with translations of book scans.


People are making everything into an ad. Stop creating more junk!


This is not a replacement for captchas.



