Hacker News | new | past | comments | ask | show | jobs | submit | login

That's very technically impressive, but 6% is still a long way from where it needs to be as anything like a primary interface. Even 1% is pretty high when you consider how many words you can utter in a few minutes, and how many errors that would generate.
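To put those rates in rough numbers (a back-of-the-envelope sketch, assuming a typical conversational speaking rate of about 150 words per minute):

```python
# Rough estimate of transcription errors accumulated while dictating,
# assuming ~150 words per minute (a typical conversational speaking rate).
WORDS_PER_MINUTE = 150

def errors_per_session(word_error_rate, minutes):
    """Expected number of misrecognized words in a dictation session."""
    return word_error_rate * WORDS_PER_MINUTE * minutes

# A 5-minute dictation at a 6% vs. a 1% word error rate:
print(errors_per_session(0.06, 5))  # 45.0 misrecognized words
print(errors_per_session(0.01, 5))  # 7.5 misrecognized words
```

Even at 1%, that's several corrections per session, which is why the rate matters so much for a primary interface.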

Edit: For comparison: http://www.utdallas.edu/~assmann/hcs6367/lippmann97.pdf



The human error rate on this task is 4% (https://pdfs.semanticscholar.org/387e/7349b8e31e316c2a738060...). So you are basically saying that telephones are a useless interface...

What is amazing is how consistently people overestimate human performance.


Happens all the time with autonomous driving. I've heard so many people argue that self-driving cars need to be 100% perfect, but they always overlook that humans aren't anywhere near 100% perfect drivers.

Frankly, we're lousy drivers.


Sorry, what is this based on? I for one don't like the idea of self-driving cars and that has nothing to do with wanting them to be "100% perfect", or forgetting that human drivers are not safe either.

For one thing, there are very good ethical arguments, for example about who bears the moral responsibility when a self-driving car is involved in a fatal accident.

Further, there is a real risk that self-driving cars will become available long before they are advanced enough to be less of a risk than human drivers, precisely because people may implicitly trust an automated system to be less error-prone than a human, which is not the current state of the art.


>Sorry, what is this based on?

Conversations I've had where people have told me that self-driving cars will need to be 100% perfect before they should be used. Ironically, one of those people was an ex-gf of mine who caused two car accidents because she was putting on makeup while driving.

Anyway, based on Google's extensive test results, I'm pretty sure self-driving cars are already advanced enough to be less of a risk than humans. Right now, the sensors seem to be the limiting factor.


Try looking for results where the Google car is driving off-road. There aren't any. That's because it can't drive off-road. It can't, because it needs its environment to be fully mapped. In other words: it's not about the sensors.

This should make, er, sense. Sensing your surroundings is only the first step in making complex decisions based on those surroundings. The AI field as a whole has not yet solved this, so there's no reason to expect that self-driving cars have.

Seen another way, if self-driving cars could really drive themselves at least as well as humans drive them, we wouldn't have YouTube compilations of robots falling over while trying to turn door knobs.

The state of the art in AI is such that self-driving cars are not yet less dangerous than humans.

>> an ex-gf of mine who caused two car accidents because she was putting on makeup while driving.

Honestly.


>The state of the art in AI is such that self-driving cars are not yet less dangerous than humans.

Google's self-driving car accident statistics say otherwise.

>Honestly.

Yeah. Weirdest part was, she actually thought she was a GOOD driver. Mostly because of all the times she was able to apply makeup while driving and didn't cause an accident.


>> Google's self-driving car accident statistics say otherwise.

That's an experiment that's been running in a tiny part of one state in one country for a very limited time. I wouldn't draw conclusions from it, and in any case, see what I said above: the state of the art is not there yet for fully autonomous driving that is better than humans'.

>> Yeah. Weirdest part was, she actually thought she was a GOOD driver. Mostly because of all the times she was able to apply makeup while driving and didn't cause an accident.

:snorts coffee:


>That's an experiment that's been running in a tiny part of one state in one country for a very limited time.

It's driven over 1 million miles. That's the equivalent of 75 years of driving for the average human. Plenty of data to draw a conclusion from. In all that time, it's been responsible for a single accident. That's way better than human drivers.
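The 75-year equivalence checks out roughly, assuming the commonly cited US average of about 13,500 miles driven per year:

```python
# Sanity check on the "75 years of driving" comparison, assuming the
# commonly cited US average of roughly 13,500 miles driven per year.
AVG_MILES_PER_YEAR = 13_500

fleet_miles = 1_000_000
equivalent_years = fleet_miles / AVG_MILES_PER_YEAR
print(round(equivalent_years))  # -> 74, close to the 75-year figure
```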


Miles driven are one dimension, and an important one. However, there is also the geographical aspect I pointed out, which may matter more in practice. I mentioned off-road driving. There's also driving on busy roads. The Google car project has not driven for 1 million miles in a busy city like NY or SF or whatever, nor in heavy traffic conditions.

Then there's the fact that human drivers have to drive in all sorts of weather conditions with all sorts of different vehicles, and so on. The Google car, not so much.

But my point is very simple: AI in general is nowhere near producing autonomous robots, yet. Why would Google car (or a similar project) be the exception? What makes cars and driving so different that autonomy is easier to attain?


I agree with your general sentiment, but think those error rates hide a lot as well. Human error might be at x% overall, but when you eliminate malfunctioning humans, broadly defined, it's probably much lower than x%.

The recent death of the Tesla owner, for example, as far as I know, was due to the vehicle accelerating into a semi. This is something that most people would not do even in their worst driving state unless they were intoxicated or seriously mentally impaired. I don't want AI driving errors to be compared to human benchmarks that include people who are seriously intoxicated.

A lot of speech frustration problems, similarly, are not only about poor recognition in general, or lack of appropriate prompting to increase classification certainty, but recognition failures in situations where a human would not have any trouble at all, such as in recognizing names of loved ones, or things that would be clear in context to a human. I.e., maybe humans listening to speech corpora would have x% error rate, but that's strangers listening to the corpora. The real question is, if I listen to a recording of my spouse or coworker having a conversation what's the error rate there?

So, although humans are far from perfect, which is something that's often forgotten, the true AI target is also probably not "humans broadly defined" but rather "functional humans" or something like that. AI research often sets the bar misleadingly low because it's so hard to reach as it is.


The types of mistakes a human and a car are prone to making are different. Neither one has to be a superset of the other. For example, cars are probably better at going round corners at a safe speed while humans can easily misjudge and end up skidding. You could make the opposite argument and say only the most malfunctioning self driving car would choose the wrong speed for a corner yet humans make that error all the time, so humans are even worse than the worst self-driving cars.

Another example. If a self driving car is hit by another car that's running a red light while speeding, we might be more forgiving and say "well nobody could have avoided that accident" but actually we'd be being too soft on the self driving car since it has access to more data and faster reaction times and should probably be expected to avoid that type of crash even when a human can't.


Sorry, but did you get the Tesla story right? The Tesla driver was not paying attention, and the car drove at constant speed into an obstacle of the sort that it is known to not be able to see. I know people are posting all kinds of things on the Internet and HN about this accident, but that doesn't make it true. If an actual self-driving car did the same thing, you'd have a great example. But not this one.


On the flip side, a large percentage of human drivers are unlikely to suddenly run into the nearest stationary object because of a botched software update.


True, but they often run into the nearest stationary object because they're texting, or eating, or putting on makeup (an ex-gf of mine caused TWO accidents that way), or arguing with a passenger, or speeding, or driving tired after a long day of work, or driving after having a few beers but I'm totally fine I swear I'm cool to drive home, etc...

You're absolutely right that there are risks. But honestly, I suspect drunk drivers alone cause more fatal accidents than autonomous cars ever could.


That's the point though... people who want 100% perfect automation have no leg to stand on when presented with a more reliable option than a human. That's not the case with speech recognition yet, and I'd say as a result people would be well justified in demanding at least as much capability as they're capable of themselves.


Really? I expect there to be huge resistance to driverless cars (at first) as people come up with crazy scenarios in which they're certain that their own elite driving skills could save them but the car could not. Then there'll be a reversal as people start to actually accept that the car is a better driver than they are, and surprisingly quickly they'll start saying anyone who wants to drive manually is a dangerous egotist.


Maybe, but the "upside" of humans being such terrible drivers is that we've almost all either gotten into a bad accident, or know someone close who has. The fact that it's not going to be hard to empirically prove a difference between machine reliability in this case, and human reliability may help as well.

Unfortunately... I think the big issue is going to be pure anxiety; a bigger and more immediate form of what a lot of people experience on an airplane. Giving up even the illusion of control is supremely hard for us as a species, in general. Then there's just the fact that as a species we're terrible at risk assessment.

https://www.schneier.com/blog/archives/2006/11/perceived_ris...


http://www.utdallas.edu/~assmann/hcs6367/lippmann97.pdf

I did say "primary interface" as well, which definitely rules out a mediocre phone connection.


Does using speech multimodally with other input methods (touch, gesture, pen, clicker, game controller, even keyboard - may seem silly but speech is potentially faster at some tasks) still count as "primary" if the other input method is used supplementarily to help disambiguate?


I'd say, and let me be clear that I know and accept this is a relatively arbitrary distinction on my part, that for an input to be "primary" it needs to be able to stand alone (touchscreen input, keyboard and mouse, etc.). That's not to say that it must stand alone, but that when all other options are off the table, it would be your preferred method for text/data entry.

That's a troublesome definition though at least in part for reasons you've brought up or alluded to, which is that a multimodal approach is pretty clearly going to dominate. That said, speech recognition at least stands to replace the keyboard for say, the author of a book or article, if it's good enough.


Compared to voice over internet? Yeah, it's pretty terrible.


Recognizing the words is just the first step. Getting meaning out of those words is what really counts. When on a telephone call I may miss a percentage of what was said, due to poor audio quality, the speaker's accent, etc., but based on context I can still understand the message the speaker is trying to convey most of the time. Transcribing each word with near-perfect accuracy is unnecessary if the layers above that can handle it.


That's true, but I wonder whether computers will exceed our technical accuracy first, or begin to actually "understand" things first? I suspect the former.


They don't need to actually understand - whatever that means - they can apply statistical models based on phonetic distances and large corpora of dialogue.
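A toy sketch of that idea (everything here is made up for illustration: the phrases, the frequencies, and the use of character similarity as a crude stand-in for phonetic distance). Candidate transcriptions are ranked by combining acoustic similarity to what was heard with how common the phrase is in a corpus:

```python
from difflib import SequenceMatcher

# Hypothetical corpus frequencies, standing in for a real language model.
PHRASE_FREQUENCY = {
    "recognize speech": 0.8,
    "wreck a nice beach": 0.1,
}

def acoustic_similarity(heard, candidate):
    # Crude stand-in for phonetic distance: character-level similarity.
    return SequenceMatcher(None, heard, candidate).ratio()

def best_guess(heard, candidates):
    # Combine acoustic similarity with how plausible the phrase is.
    return max(candidates,
               key=lambda c: acoustic_similarity(heard, c) * PHRASE_FREQUENCY[c])

print(best_guess("recognise speech", list(PHRASE_FREQUENCY)))
# -> "recognize speech"
```

Real recognizers use phoneme lattices and n-gram or neural language models rather than anything this crude, but the principle of rescoring acoustically similar hypotheses against a dialogue corpus is the same.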


Has that worked so far?


Google voice search will make several guesses as to what I said. It will sometimes enter a nonsensical but similar-sounding phrase to what I intended into the text box, but will then figure out that its first guess is nonsense and return the correct search result.

So just based on observation I would say it works pretty well. On mobile I almost always use voice search, and unless I'm searching for an Italian name or something unusual it almost always hears me correctly. Even in a noisy pub.


That's a good point, I've rarely had issues with Google voice searching.


Of course it has. We went down from 40% to 6.9% error rate in 26 years. It may take a couple of decades to get to 0.1%.
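Taking those two data points at face value (a back-of-the-envelope calculation, assuming the error rate keeps decaying exponentially at its historical pace):

```python
import math

# If WER decays exponentially, how long does one halving take?
start_wer, end_wer, years = 0.40, 0.069, 26
halvings = math.log2(start_wer / end_wer)  # ~2.5 halvings over 26 years
halving_time = years / halvings
print(round(halving_time, 1))  # -> 10.3, roughly one halving per decade
```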


How does it compare to the typical human error rate?


Best I can find is section four here: http://www.utdallas.edu/~assmann/hcs6367/lippmann97.pdf

It seems to be significantly lower, around 1%–1.6%.


No, you looked at the wrong figure. It's figure 7 (Switchboard), and the WER for humans is 4%.


No, if you look, that's only the CC component of the test, with and without context, not the whole test.



