The difference is that people are OK with a human asking for clarification, but systems like Siri need to have a near-zero error rate before people will consider them good (a user who has to repeat themselves even once every 20 times will consider it bad, or at least not good enough).
I'm not sure people expect super-human performance out of Siri. An important difference is that a human who doesn't understand will say so, and ask you to repeat the relevant part (or to choose between two alternatives), conversationally; or they will pick an interpretation which is not the intended one but was an understandable misunderstanding.
Contrast this with speech recognition, which will often substitute words that are nonsensical in context, making it look silly from a human perspective...
I think another important difference is that humans won't get stuck in a loop asking you for clarification the same way several times; after 2 or 3 times they'll typically change behavior. E.g. they'll ask you to spell the word, or repeat the not-understood word back with a questioning tone to signal that they don't understand what it means.
This could be implemented though. Based on the part of the sentence that is understood, figure out the most likely words for the missing part and ask a specific question about it to fill the gap.
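Something along these lines, as a rough sketch; the per-word confidences and the list of alternatives are assumed inputs here, not any real speech API:

    LOW_CONFIDENCE = 0.6  # arbitrary threshold for this sketch

    def clarifying_question(words, confidences, alternatives):
        """Ask about the weakest word specifically instead of
        making the user repeat the whole sentence."""
        worst = min(range(len(words)), key=lambda i: confidences[i])
        if confidences[worst] >= LOW_CONFIDENCE:
            return None  # confident enough, just act on it
        options = alternatives.get(worst, [])
        if len(options) >= 2:
            return f'Did you say "{options[0]}" or "{options[1]}"?'
        if worst > 0:
            return f'Sorry, I missed the word after "{words[worst - 1]}". What was it?'
        return "Sorry, I missed the first word. What was it?"

    # "set a timer for ??? minutes" where the number came through badly
    print(clarifying_question(
        ["set", "a", "timer", "for", "fifteen", "minutes"],
        [0.9, 0.9, 0.95, 0.9, 0.35, 0.9],
        {4: ["fifteen", "fifty"]},
    ))
    # -> Did you say "fifteen" or "fifty"?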
See, it's not about hard-coding such behavior. I would say that it reaches a human level of understanding if it automatically learns these ways of solving the problem. Asking relevant questions can be hard-coded, but that doesn't equal "understanding" the problem.
I think the Chinese Room thought experiment overlooks this part of "understanding".
Exactly: when SR has a low confidence level it needs to ask you to repeat yourself, not just choose the highest-confidence match and hope for the best.
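Roughly what I mean, as a toy sketch; the ranked (text, confidence) hypotheses and the 0.75 threshold are made up for illustration:

    def act_on(hypotheses, threshold=0.75):
        """hypotheses: list of (text, confidence), best first."""
        best_text, best_score = hypotheses[0]
        if best_score < threshold:
            return ("ask", "Sorry, could you say that again?")
        return ("execute", best_text)

    print(act_on([("call betty", 0.52), ("call benny", 0.48)]))
    # -> ('ask', 'Sorry, could you say that again?') instead of silently calling Betty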
That's a good start, but probably the wrong interface for it; it's "non-native" in the context: a command initiated by voice should present the options by voice.
It's a valid HCI solution to a technical failure mode. Once the software advances to the point where the AI is truly conversational, that will be a watershed moment.
The important thing here, IMO, is going to be how the system asks for clarification. Hearing the same canned "I'm sorry, I didn't quite get that, can you repeat?" phrase 20 times in a row is annoying. Having the computer say "I'm sorry, what was that last word?" or "I didn't quite catch that, did you want me to call Benny or Betty?" would be far more acceptable.
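A quick sketch of that kind of escalation; the prompts and the whole setup are invented for illustration, this isn't how any shipping assistant works:

    # Escalate instead of replaying the same canned prompt every time.
    PROMPTS = [
        "I'm sorry, what was that last word?",
        "Did you want me to call {option_a} or {option_b}?",
        "Could you spell that name for me?",
    ]

    def next_prompt(attempt, option_a=None, option_b=None):
        prompt = PROMPTS[min(attempt, len(PROMPTS) - 1)]
        if "{option_a}" in prompt and option_a and option_b:
            return prompt.format(option_a=option_a, option_b=option_b)
        if "{option_a}" in prompt:
            # No candidates to offer, skip to the next strategy.
            return PROMPTS[-1]
        return prompt

    for attempt in range(3):
        print(next_prompt(attempt, "Benny", "Betty"))
    # -> asks for the last word, then offers Benny/Betty, then asks for a spelling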
As someone else mentioned, how it makes sense of the words is much more important than a zero error rate.
The understanding rate is less than 10%: if you don't match a keyword, it gives you a useless web search.
Personally, I don't think understanding rate is the whole issue so much as the reaction to error (which is partly understanding). You can't say "no, that's not what I said", and Siri et al. never keep enough context to say "huh? What did you say?" or "I didn't get that last part, can you repeat it?"
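A minimal sketch of keeping just enough context to recover from a correction; everything here (the class, the confidence numbers) is invented for illustration:

    class DialogueState:
        def __init__(self):
            self.last_words = []        # what we think the user said
            self.last_uncertain = None  # index of the word we were least sure about

        def hear(self, words, confidences):
            self.last_words = words
            self.last_uncertain = min(range(len(words)), key=lambda i: confidences[i])

        def handle_correction(self):
            # Instead of starting over, re-ask about the part we were least sure of.
            if self.last_uncertain is None:
                return "Sorry, could you say that again?"
            heard = self.last_words[self.last_uncertain]
            return f'Sorry, I heard "{heard}". What should that have been?'

    state = DialogueState()
    state.hear(["text", "ben", "i'm", "running", "late"],
               [0.9, 0.4, 0.9, 0.95, 0.9])
    print(state.handle_correction())  # -> Sorry, I heard "ben". What should that have been?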
It's that errors in understanding or accuracy turn the whole thing into a complete shitshow.
One failure and you might as well pull over and type what you want.
Remember this is with low-quality sound; the rate could be much higher under better conditions. Amazon's Echo relies on good hardware as much as software, with an array of good mics.
One big problem with Siri is that it has zero sense of humor. That is, imho, what makes talking to it so tiring. It's like talking to a boring civil servant.