AI Defeats the Hivemind (technologyreview.com)
82 points by J3L2404 on Dec 22, 2010 | 29 comments



Wait... the naive Bayes was trained on Yelp data? Isn't Yelp data also crowd-sourced? I may not be thinking about this right, but it seems to me that if you train the classifier on crowd-sourced data and then compare that to Mechanical Turk, in the end you're just comparing the quality of two sources of crowd-sourced data.


In fact you're comparing individually crowdsourced data to massively, mechanically aggregated crowdsourced data. Viewed that way, the results are not in the least surprising.
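
As a toy illustration of that distinction (all numbers invented for the example), aggregation alone makes individually noisy labels look much better than any single labeler:

    # Toy illustration: a single crowd worker vs. many of the same workers
    # aggregated by majority vote. Accuracies and counts are made up.
    import random
    random.seed(0)

    TRUE_LABEL = "restaurant"
    INDIVIDUAL_ACCURACY = 0.7   # assumed accuracy of one worker on one item

    def one_label():
        return TRUE_LABEL if random.random() < INDIVIDUAL_ACCURACY else "shopping"

    def majority(labels):
        return max(set(labels), key=labels.count)

    # One worker is right ~70% of the time...
    singles = sum(one_label() == TRUE_LABEL for _ in range(10_000)) / 10_000
    # ...but a majority vote over 25 workers is right almost always.
    votes = sum(majority([one_label() for _ in range(25)]) == TRUE_LABEL
                for _ in range(1_000)) / 1_000
    print(singles, votes)   # roughly 0.70 vs. close to 1.00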


Making good Turk tasks is a science in and of itself. Figuring out the incentive is the key, and sometimes you have to think a bit outside the box.

We actually used turking at my company for some really nutty stuff: logo generation. Basically we'd give people a URL and ask them to generate a 160x40 logo for it. We had some base rules: the background had to be solid, there could be no scaling artifacts, etc.

We assigned each logo to five people.

Our reward was essentially this:

- anybody who met all the rules got 25¢

- the best of all that met the rules got a 50¢ bonus

It took a few days for people to get the hang of it, but after that we consistently got excellent results, with some really creative stuff coming back. Yes, we were paying up to $1.50 per logo, but we weren't using them for every site, only the really popular ones, and having it automated made it worth it. Every day we spent maybe 60 seconds picking the best logo out of five submissions for a few dozen sites; everything else was automated.

The product that used these, by the way, is NewsRoom, a pretty sexy RSS reader available on Android. All the logos you see for sites there were generated by Turkers.

Anyway, finding the right reward formula for that task took some experimentation, but I was impressed by the results in the end.
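
For concreteness, the payout rule described above could be sketched like this. This is a rough reconstruction, assuming ".25c" and "50c" mean 25 and 50 US cents and that every URL goes to five workers:

    # Sketch of the payout rule: a base reward for every submission that meets
    # the rules, plus a bonus for the one picked as best. Amounts are assumed.
    BASE_REWARD = 0.25   # per rule-compliant submission
    BEST_BONUS = 0.50    # extra for the best submission

    def payouts(meets_rules, best_index):
        """meets_rules: one bool per worker; best_index: the winner's position."""
        return [
            (BASE_REWARD if ok else 0.0)
            + (BEST_BONUS if i == best_index and ok else 0.0)
            for i, ok in enumerate(meets_rules)
        ]

    print(payouts([True, True, False, True, True], best_index=3))
    # -> [0.25, 0.25, 0.0, 0.75, 0.25]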


      79 passed. This was an extremely basic multiple choice test.
      It makes one wonder how the other 4,581 were smart enough to
      operate a web browser in the first place.
I stopped reading right there.

As for the question itself, the answer is simple: people come for the money, and since "Turkers" are paid pennies per task they have to do a lot of them, so replying randomly on a test is a no-brainer (I wouldn't even bother to click and type; I'd just write a script).

It's a good thing we've got these magazines reminding us how we are so smart and the rest of the world is so stupid. What would I do without my over-inflated ego?


On my surveys on MT I ask things like "Why did you choose this answer?". I also made a little script library to record the times at which they enter answers into each field. I throw out any submissions that appear to be scripted or were answered so rapidly that they can't have been real answers.

At the price per HIT I paid ($0.08 to $0.15), I only had to throw out 2-3 answers out of ~2000 because someone was trying to reply randomly.
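
A minimal sketch of that kind of timing filter (thresholds and field names invented here; the commenter's actual script isn't shown in the thread):

    # Record a timestamp each time a worker fills in a field, then drop
    # submissions that were completed implausibly fast.
    MIN_SECONDS_PER_FIELD = 2.0   # assumed floor for a genuine free-text answer
    MIN_TOTAL_SECONDS = 30.0      # assumed floor for the whole survey

    def looks_scripted(field_timestamps):
        """field_timestamps: list of (field_name, unix_time), in answer order."""
        if len(field_timestamps) < 2:
            return False
        times = [t for _, t in field_timestamps]
        gaps = [b - a for a, b in zip(times, times[1:])]
        too_fast = sum(1 for g in gaps if g < MIN_SECONDS_PER_FIELD)
        total = times[-1] - times[0]
        # Reject if most fields were filled back-to-back, or if the whole
        # survey finished far quicker than a human plausibly could.
        return too_fast > len(gaps) // 2 or total < MIN_TOTAL_SECONDS

    submission = [("why_answer", 0.0), ("q1", 0.4), ("q2", 0.9), ("q3", 1.3)]
    print(looks_scripted(submission))   # True -> throw this one out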


If you had kept on reading, you would have seen that the article specifically identifies low wages as a likely cause for the low quality.


Yes, but why keep reading an article that insults people? The reason for the low accuracy was not the point of my comment.

My own father is "not smart enough" to operate a browser. His lack of English skills doesn't help him. But he can read French and Russian just fine, he has a Ph.D. in his profession, and he has had a career in politics (former advisor to the prime minister, currently a senator in an Eastern European country).


If you read the paper, you'll see you are correct. It states that, after surveying the people who took the test, the authors concluded that many people just answered carelessly, hoping to get access to tasks as fast as possible.

Also, the paper says 1658 passed, but probably only 79 passed with > 90% accuracy?


I'll bet that there are language issues involved as well (with many of the "Turkers" not having English as a first language).


They don't mention the price per HIT. If they're paying between $0.01 and $0.05 for these HITs, I'm not surprised by these results.

I looked at the cited paper and did not see the cost, but without the cost I really would not bother interpreting these results. "Machines work for electricity; humans need real money. News at 11."


Who is to say that the mechanical turk-ers aren't AI?


Now there's an idea. It would be a beautiful irony if, a few years from now, the Mechanical Turk API were used as an open platform for AI applications to make money solving difficult problems.


Well, according to the article, this would already be lucrative to some extent. As ever, the problem would be matching problems to algorithms.

EDIT: maybe we can get the real MTers to do the algorithm/problem matching bit...


I've always had a bizarre idea that it would be awesome to apply for some low-level data entry job at a low-tech company with no programming staff, then automate the task and get loads of work done while not actually being at the office, and repeat this with several other jobs until the sum of the data entry jobs' pay exceeds an individual programmer's salary.

But then I realized that, in most offices, work done is meaningless next to number of hours spent in the building ;)


My dad knew a guy who got fired trying to do what you described.

Still, if you were motivated, you could get paid to sit and browse BoingBoing all day. :-)


>Still, if you were motivated, you could get paid to sit and browse BoingBoing all day.

There are a lot of much easier ways to achieve this. As if you'd want to...


Do bayesian classifiers dream of electric sheep?


Actually, if they are true Bayesians, they probably dream of the surrounding landscape, the quality of the grass the sheep graze on, seasonal weather conditions, available water quality and, probably most important, an entire subset of classifiers related to the competence of the shepherds.


Did anyone else read the paper? The summary doesn't seem very correct to me.

From the summary:

The results weren't pretty: in order to find a population of Turkers whose work was passable, the researchers first used Mechanical Turk to administer a test to 4,660 applicants. It was a multiple choice test to determine whether or not a Turker could identify the correct category for a business (Restaurant, Shopping, etc.) and verify, via its official website or by phone, its correct phone number and address.

79 passed. This was an extremely basic multiple choice test. It makes one wonder how the other 4,581 were smart enough to operate a web browser in the first place.

From the paper:

Of the 4,660 workers who took this test, only 1,658 (35.6%) workers earned a passing score, and over 25% of workers answered fewer than half of the questions correctly.

To investigate the high failure rate, we conversed with workers directly on TurkerNation and through private email. Based upon worker’s names and email addresses, we believe that we conversed with a representative sample of workers both inside and outside the United States. We found that the test was not too difficult and that most workers comprehended the questions. We believe that many applicants simply try to gain access to tasks as quickly as possible and do not actually put care into completing the test.

i.e., 1,658 of the 4,660 workers passed this test, NOT 79 (!!)

Then later they describe some additional filtering they put in place to attempt to find the best workers (they tried estimated location and time to complete the task). Based on these filters they said: "Using a combination of pre-screening and the test tasks described above, only 79 workers of 4,660 applicants qualified to process real business changes."


I was at NIPS and talked to one of the authors. I thought the paper was interesting, but I think the "you're not paying enough" critique is spot on. Humans clearly can be better at this task; you just can't give them strong incentives to cut corners on quality, which is what happens with a low piece-rate and a task that takes on the order of 3-4 minutes to do properly.


Am I right in thinking that a naive Bayes classifier is beyond "not even the best out there," and is in fact about as simple a learning algorithm as you can get, and straight out of AI 101?


Pretty much, yes. (Though that doesn't mean it's not a good technique. Lots of quite effective spam filters are more or less naive-Bayes.)
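
For reference, this really is AI-101 material; a hand-rolled multinomial naive Bayes with add-one smoothing fits in a couple of dozen lines (the tiny training set below is invented for illustration and has nothing to do with the paper's actual data):

    # A minimal multinomial naive Bayes text classifier -- a sketch, not
    # anyone's production code.
    import math
    from collections import Counter, defaultdict

    def train(docs):
        """docs: list of (label, text). Returns class priors, word counts, vocab."""
        priors = Counter()
        word_counts = defaultdict(Counter)
        vocab = set()
        for label, text in docs:
            priors[label] += 1
            for word in text.lower().split():
                word_counts[label][word] += 1
                vocab.add(word)
        return priors, word_counts, vocab

    def classify(text, priors, word_counts, vocab):
        total_docs = sum(priors.values())
        scores = {}
        for label in priors:
            # log prior + log likelihoods with add-one (Laplace) smoothing
            score = math.log(priors[label] / total_docs)
            total_words = sum(word_counts[label].values())
            for word in text.lower().split():
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

    docs = [("restaurant", "pizza pasta dinner menu"),
            ("shopping", "clothing store sale mall")]
    model = train(docs)
    print(classify("cheap pasta dinner", *model))   # -> "restaurant"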


They're sometimes a good technique only because some problems are really simple. There are almost no problems where the extreme independence assumptions of naive Bayes yield a reasonable likelihood function. The consequence is that when it's wrong, it tends to be very, very certain that it's right. I think the aphorism that gets passed around is "Naive Bayes classifiers are often in error but never uncertain".
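
A toy demonstration of that overconfidence, with made-up numbers: duplicate one mildly informative feature and naive Bayes treats every copy as independent evidence, driving the posterior toward certainty:

    # One word mildly indicative of spam, counted as n "independent" features,
    # the way naive Bayes would if the copies were perfectly correlated.
    p_spam, p_ham = 0.5, 0.5
    p_word_given_spam, p_word_given_ham = 0.7, 0.3

    def nb_posterior(n_copies):
        like_spam = p_spam * p_word_given_spam ** n_copies
        like_ham = p_ham * p_word_given_ham ** n_copies
        return like_spam / (like_spam + like_ham)

    for n in (1, 3, 10):
        print(n, round(nb_posterior(n), 4))
    # 1 -> 0.7, 3 -> ~0.927, 10 -> ~0.9998: often in error, never uncertain.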


Yup. But some problems -- for instance, discriminating between spam and non-spam emails, and keeping up decent discrimination as spammers vary their tactics -- are (1) "really simple" in that sense and (2) apparently quite difficult to solve, given that there basically were no really effective spam filters before naive-Bayes ones came along.


We use a modified naive Bayes extensively in a commercial application -- from what I understand it's extremely quick to classify, easy to modify/customize, and deals very well with gaps in the data. For a lot of applications, things like SVM and WAODE are only minor incremental improvements.


Partly this is because naive Bayes's unreasonable independence assumptions (which are almost always badly violated) turn out not to actually hurt classification performance in a lot of cases, even in theory, because under a lot of distributions the independence violations basically cancel out: http://www.aaai.org/Papers/FLAIRS/2004/Flairs04-097.pdf


Naive Bayes classifiers are simple, (relatively) easy to understand, and fairly straightforward to code.

They are also fairly robust and work well for a wide variety of problem sets.

Other techniques sometimes offer some improvement, but often don't. Generally a Bayesian classifier is a good place to start.


The original NIPS 2010 paper behind this article is "Towards Building a High-Quality Workforce with Mechanical Turk", available at http://www.cs.umass.edu/~wallach/workshops/nips2010css/paper...

HN submission of the same (13 days ago) here: http://news.ycombinator.com/item?id=1984130


Does this indicate that the majority of Turkers are already just simple scripts? Perhaps just not as well adapted to particular problem sets as this custom-built one was.



