This is another area where I feel like the good answer is not necessarily the technically complicated one.

Saying "hey Siri" is fine if I'm in bed or in the shower; I don't necessarily need quick access to a shell in those places. That's fine to have as a backup. But for normal operation, if I'm wearing a smartwatch, it will pretty much always be faster and more convenient for me to tap and hold on the watchface than to say "hey Siri".

I mean, that's a boring answer, but there's also a reason why my computers have buttons. I wouldn't want to use my phone, because that's in another room or in my pocket. But a watch will always be reachable in less than a second, modern watches are waterproof, and I don't need to look at anything to use one -- I can just tap my watchface and start talking. And if my hands are dirty, or I'm carrying groceries, or I'm in bed, falling back to "hey Siri" isn't the end of the world.

In practice, when I see people interact with voice assistants today, they stop what they're doing, they give the command, they listen for a confirmation, and then they start what they're doing again. The biggest bottleneck on their speed is precision -- they intuitively know that they need to stop what they're doing and optimize for the device. That, plus the delays that are built into the UX to confirm what's happening, is what slows them down. So if there's an operating mode that is just as fast and way more precise, we should just use it. We don't need to use voice triggers 100% of the time.

Bonus points if the voice assistant is currently spending processing time on a round trip to analyze the audio clip and figure out who's speaking. The person who pressed their watch is the one speaking, so boom, we can get rid of that response delay. And how much time are we wasting trying to come up with wake words that optimize for both speed and precision, when using wake words only as a fallback would let us make them more precise, because they could be longer, more deliberate phrases?