Are speech recognition systems also paired with vision recognition systems to determine intent? It seems like that's where research would be headed.
That's an interesting one, and not only for intent but also for getting the WER (word error rate) down. For example, my mother often mumbles. If I'm in front of her and can see her face, I understand her perfectly. If she's out of sight and I can't read her lips, I have trouble understanding her.