I'm pretty sure Nuance also does the Comcast Xfinity voice remotes. They are actually really well done. And it's a humongous customer to have your claws in.
I have this and it really is quite remarkable. The entire voice integration in the X1 set-top box is really polished. Other than the occasional "wow, this is pretty good for Comcast!" I hadn't given it much thought until now.
I think they've been at it for a pretty long time with old products like Dragon. A friend used to work there some years back and said they pretty much perfected speech detection up to some very reasonable error rate. I imagine they've just continued to cover all the dark corner cases and irregularities and accents etc.
No, the real challenge now is minimizing the processing power and bandwidth needed, all while keeping translation time under 0.5 seconds.
Obviously accents and irregularities are also areas I'm sure they are focusing on, but I imagine that optimizing for real time, mobile and low CPU power devices is a huge focus for them.
They ran Dragon on the computers of its day, and it was already very good at least 10 years ago, probably more. So they have voice recognition working well on what would now be considered very constrained hardware.
I had an early version of Dragon on a Tandy running Windows 3.1, and it was pretty usable. It wasn't just dictation, either: it could imitate mouse clicks when you said stuff like "maximize window".
The biggest leg up they have is that their product works in a very controlled domain. Siri is out there trying to be a complete interactive AI human and failing miserably. The Xfinity remote just controls your TV and does it smashingly. I found the Alexa-driven voice control on my FireTV to also be a pleasant experience compared to regular Alexa.
Ironically, Siri (as a virtual assistant) came from SRI's DARPA program "CALO" back in the 2000s. SRI also has a speech recognition platform called EduSpeak, and that is the codebase that was licensed to Nuance for their product.
However, after Siri was sold to Apple, I don't think that they retained the EduSpeak portion for speech-to-text. (I honestly don't know - but it seems to me it did not; I don't think Apple wanted to pay that license fee to SRI).
The TV remote only has to understand TV-related things, like inputs, channels, volume, and program names. Siri and Alexa have to understand everything.
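To make the closed-domain point concrete, here is a rough sketch (not how Nuance or Comcast actually do it). It assumes a made-up command list, `COMMANDS`, and a hypothetical helper, `match_command`, that snaps a noisy transcript to the nearest known command using Python's standard `difflib`. With only a handful of valid phrases, even a sloppy transcript usually lands on the right one, and anything off-topic can simply be rejected.

```python
import difflib

# Hypothetical closed vocabulary for a TV remote (illustrative only).
COMMANDS = [
    "volume up",
    "volume down",
    "mute",
    "channel up",
    "channel down",
    "switch to hdmi one",
    "switch to hdmi two",
]


def match_command(transcript, cutoff=0.6):
    """Snap a possibly misrecognized transcript to the closest known command.

    Returns None when nothing in the vocabulary is similar enough.
    """
    # Normalize case and collapse repeated whitespace before comparing.
    cleaned = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(cleaned, COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for heard in ["Volum  upp", "mute", "what is the weather tomorrow"]:
        print(repr(heard), "->", match_command(heard))
```

A general assistant has no such list to fall back on, which is a big part of why the open-ended case is so much harder.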