The article overlooks one of the major reasons for wanting a non-voice interface to an audio experience: Being quiet.
Most of the time, if I'm wearing headphones, it's so as to not disturb others around me. Otherwise I'd play it out loud.
This benefit goes away when everyone around me suddenly hears me bark out loud to adjust the volume, or send a text, or what-have-you.
I'm a big believer in audio-as-a-platform (particularly the AR possibilities), but I hate audibly trying to speak to a computer. It's by far the worst input interface.
(On the other hand, much like cameras, the best interface is the one you have with you. I yell at my Google speakers all the damn time, because my hands are busy around the house. But those are also speakers, not headphones, and therefore a different use case.)
So far the only thing I find myself using voice commands for is to set alarms on my iPhone. Saying "Hey Siri, set an alarm for 15 minutes" is so much easier than doing it manually through the clock app.
Other than that though, I haven't found any practical use for them. Maybe once AI improves a lot, and it feels like you're talking to an actual person, it might be more useful.
Similarly the only use Siri gets on the kitchen HomePod is for hands-free timers while cooking.
In my eyes the primary thing standing in the way of voice assistants being useful isn't even as high a bar as general AI, but just the ability to parse a command into multiple, potentially chained commands. Even without the ability to figure out things like context, that alone would boost usability a lot. For example, it'd allow commands like "set timers for 5 minutes, 10 minutes, and an hour" instead of having to make each request separately.
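Just to make the idea concrete, here's a toy Swift sketch of that kind of chained-command parsing; the phrasing it handles and the helper names are made up for illustration, not how any assistant actually does it:

```swift
import Foundation

/// Toy parser: split a compound request into individual timer durations.
/// "set timers for 5 minutes, 10 minutes, and an hour" -> [300, 600, 3600]
func parseChainedTimers(_ command: String) -> [TimeInterval] {
    let unitSeconds: [String: TimeInterval] = [
        "second": 1, "seconds": 1,
        "minute": 60, "minutes": 60,
        "hour": 3600, "hours": 3600
    ]
    // Split on commas and "and" so each clause describes one timer.
    let clauses = command
        .lowercased()
        .replacingOccurrences(of: " and ", with: ",")
        .split(separator: ",")

    var durations: [TimeInterval] = []
    for clause in clauses {
        var amount: Double?
        for word in clause.split(separator: " ").map({ String($0) }) {
            if let n = Double(word) {
                amount = n                      // explicit number, e.g. "5"
            } else if word == "a" || word == "an" {
                amount = amount ?? 1            // "an hour" -> 1 hour
            } else if let unit = unitSeconds[word] {
                durations.append((amount ?? 1) * unit)
                amount = nil
            }
        }
    }
    return durations
}

// Each parsed duration would then be handed off as its own timer request.
print(parseChainedTimers("set timers for 5 minutes, 10 minutes, and an hour"))
// [300.0, 600.0, 3600.0]
```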
Google Home supports this nicely. On my Lenovo Smart Display, I just tried "Hey Google, set a ten second timer and a fifteen second timer" and she did just that.
They also support custom "routines" that you can program. We have a Clever Dripper [1] and use Tom's (of Sweet Maria's) recipe: stir after 90 seconds, drip after 4 minutes (2:30 after the stir). So I created a routine called "coffee timer" that triggers those two timers, and when we pour the hot water into the Clever Dripper we can just say "Hey Google, coffee timer" and it sets both of them.
Actually I put in four variations so we don't have to get the language perfect.
The first one to support a "conversation", even with the same level of intelligence and understanding as today, will be a MASSIVE improvement. If it would listen while talking so you could interrupt it, it would be so much more useful.
I feel like I want the opposite. For me, command lines work. They are precise, they are consistent, they are easy to reason about, and they are composable, which allows me to do very powerful things on the fly. There is a reason why I do not have a natural language interpreter wrapped around my Linux command line.
What a lot of these natural language systems do is they force me to intuit their interfaces instead of just looking them up. It's more complex, harder to learn, and much less powerful. What I want is very reliable, accurate voice detection with a well-designed, composable interface.
I want to treat Alexa like a computer, not like a person. People are not the most convenient interfaces to interact with; for me, computers are better to interact with than people are.
I understand that different people are in different positions, some people want to have a conversation, but you're not going to make a voice assistant that's good for me if you follow that goal. At some point the ecosystem needs to fracture and diverge so that normal people can use whatever NLP interface Google/Amazon/Apple spits out of their AI opaque-boxes, and people like me can use a voice interface that is designed around well-tested decades-old computer UX principles that have been proven to work well for power users.
My vision of a voice-operated utopia isn't treating Siri like a person, it's composing a complicated new task by voice on the fly and saving it for later use. It's using timers as a trigger for other tasks with some kind of pipe command, so I can tell Siri to send an email after ten minutes, or have Siri look up a search and seamlessly pipe the result into some kind of digital notebook.
My issue is I want to open a "voice shell". I want the same obvious commands but with more rapid responses and without needing to say "Hey Siri, X" every time. There should be a natural way to enter a mode where the back and forth is still not natural language but voice.
This is another area where I feel like the good answer is not necessarily the technically complicated one.
Saying "hey Siri" is fine if I'm in bed or in the shower, I don't need quick access to a shell in those places necessarily. That's fine to have as a backup. But for normal operation, if I'm wearing a smartwatch, it will pretty much always be more convenient and faster for me to tap and hold on that watchface than it will be for me to say "hey Siri".
I mean, that's a boring answer, but there's also a reason why my computers have buttons. I wouldn't want to use my phone because that's in another room or in my pocket. But a watch will always be reachable in less than a second, and the modern watches are waterproof, and I don't need to look at anything to use it -- I can just tap my watchface and start talking. And if my hands are dirty, or I'm carrying groceries, or I'm in bed, falling back to "hey Siri" isn't the end of the world in those scenarios.
In practice, when I see people interact with voice assistants today, they stop what they're doing, they give the command, they listen for a confirmation, and then they start what they're doing again. The biggest bottleneck there for their speed is precision -- they intuitively know that they need to stop what they're doing and optimize for the device. The precision, and the delays that are built into the UX to confirm what's happening -- that's the bottleneck. So if there's an operating mode that is just as fast and way more precise, we should just do that, we don't need to use voice triggers 100% of the time.
Bonus points if we're wasting processing time for a voice assistant to make a round trip and process the audio clip to try and figure out who's speaking. The person who pressed their watch is speaking, boom, we can get rid of that response delay now. How much time are we wasting trying to come up with wake words that optimize for both speed and precision -- when using wake words only as a fallback would allow us to make them more precise because they could be longer, more deliberate phrases?
Alexa will shut up immediately if you say “Alexa shut up” or “Alexa SILENCE” when she’s talking. We got an echo dot which came with our Alexa enabled microwave, and now that I have it set up to stream Apple Music, we use it all the time. Our thermostat also has it although when we change the temperature she’ll sometimes inexplicably change the house temp to 60°F which we won’t realize until 3am when we’re freezing in bed.
Well it’s also an oven, I often say “Alexa preheat the oven to 400” or “Alexa air fry for 10 minutes”. For soups instead of having them explode I say “Alexa microwave at power level 3 for 9 minutes” which avoids the exploding soup problem. Of course there’s also “bake at 400 for 30 minutes”.
I also have a five year old who is pretty competent at talking to Alexa even though she can’t intuit what the button controls mean on the microwave oven.
As someone who last used microwaves on a regular basis when they just had two dials, I could see the use for Alexa in figuring out how the more modern ones with like 40 buttons work.
The first time I tried the office kitchen microwave, I had to ask someone from accounting who was standing around how to heat my cup, because that stupid thing just responded with a condescending beep to whatever I pressed :/
Was this a microwave where "hit digits to type in a time, then hit start" doesn't work? If yes, then yikes. If no, then you only have to learn it once, and trying to use Alexa sounds much more complicated.
Chaining works on Google Home (and, I believe, Alexa?). I use it all the time, e.g., "Cancel the 5 minute timer and set a timer for 6 minutes" or your example of setting multiple timers.
I've got one for "Siri, good night" that turns off the lights, sets the phone to DND, turns down the brightness and starts Sleep Cycle in the correct mode depending on the day (alarm on/off).
I have a HomePod in the kitchen and use Siri for two main things:
1. Hey Siri, add Olive Oil to the grocery list. Super easy. When I’m cooking and running low on something, no need to break my stride and pull out my phone, or clean my hands. It’s immediately into my grocery list and out of my mental inbox.
2. Hey Siri, play some Jazz. Or whatever music. This is nice and easy to get some music on the HomePod for either cooking, working, or dinner background. The only annoyance is that Siri can be super particular at times unlike when searching Apple Music on my phone. Also sometimes my kids hijack my music selection with their own, hehe.
These types of things are my favorite use cases. I set up a little test with a beacon inside my bedroom and now it automatically turns on the lights for me when I walk in (if I'm wearing my Apple Watch) and then turns them off when I leave. It's not perfect but it's very convenient.
I also have a few automations that are set up for when I leave the house to make sure music turns off and my thermostat is set to "Away" mode.
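For what it's worth, the beacon half of that is straightforward to sketch with CoreLocation region monitoring. In the sketch below, the light-control call is a hypothetical placeholder for whatever actually drives the lights, and the UUID is just the classic example value:

```swift
import CoreLocation

/// Sketch: toggle the bedroom lights based on proximity to an iBeacon.
/// `setBedroomLights(on:)` is a hypothetical stand-in for whatever actually
/// controls the lights (HomeKit, a hub API, etc.). Requires the usual
/// location-permission entries in Info.plist.
final class BedroomPresenceMonitor: NSObject, CLLocationManagerDelegate {
    private let manager = CLLocationManager()
    // Placeholder UUID; use whatever your beacon actually advertises.
    private let region = CLBeaconRegion(
        uuid: UUID(uuidString: "E2C56DB5-DFFB-48D2-B060-D0F5A71096E0")!,
        identifier: "bedroom-beacon"
    )

    func start() {
        manager.delegate = self
        manager.requestAlwaysAuthorization()
        manager.startMonitoring(for: region)
    }

    func locationManager(_ manager: CLLocationManager, didEnterRegion region: CLRegion) {
        setBedroomLights(on: true)    // walked in: lights on
    }

    func locationManager(_ manager: CLLocationManager, didExitRegion region: CLRegion) {
        setBedroomLights(on: false)   // walked out: lights off
    }

    private func setBedroomLights(on: Bool) {
        // Placeholder: write the power state via HomeKit or similar.
        print("Bedroom lights \(on ? "on" : "off")")
    }
}
```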
Yeah, during the winter months I have a "when arriving home after sundown, turn the lamp on" automation as well. During summer months I turn that one off.
I've generally really enjoyed basic smart home stuff. I have a handful of smart wall switches that work well, and a smart wall-wart that I use for the lamp. One of these days I'll just retrofit some actual lighting in the living room and can skip the lamp situation, but until then... this is a nice workaround.
My 10 year old discovered "Hey Siri turn off the TV" works... and it's amazing. Can use the light of the TV to walk upstairs then turn it off at the top.
Apple has a list of commands somewhere - even if it’s just the source code. That I cannot find that list and have to try and guess the magic words pisses me off to no end.
I live in a country where a good six months I'm wearing a warm hat and gloves and my phone is buried deep in my clothes to keep the battery over the freezing point.
It's a lot easier to say "Siri, play podcasts" (Which triggers a Shortcut to start playing a specific playlist in Overcast) rather than opening my jacket, digging out the phone, taking off my gloves and fumbling with it to get the podcasts running.
Especially the shopping list. I most often notice that I need something when I'm in the middle of using the last of it, which means my hands are full and often dirty or wet. But I also don't want to put it off because it's something I do need to do.
I'm also a fan of "hey siri, turn off the tv" when leaving.
I have huge gripes with "what's the weather". Siri will for sure say the temperature, which means nothing in a city where the windchill is often ten degrees lower.
If you have a HomePod and an Apple Watch, you can have Siri turn off all media devices when you leave the house. If you have a CEC-capable receiver for your TV hooked up to an Apple TV, you can have it power down the whole entertainment system when you leave.
Fun fact - make a scene named whatever you want - say “rude word the rude word” and then you can tell Siri that and the scene will be enabled! No more “that’s not nice” replies.
I find myself setting alarms too.. the problem is I'm trying to set timers haha
"timer 1 hour 30" is parsed into "Make an alarm at 01:30 named Timer."
Every time I do laundry I get woken up at stupid o'clock the next morning haha.
Edit: Sorry to those below for the confusion I seeded. To clarify I mean the 1 hour 30 is the bit that doesn't work. If I add minutes to the end of it it works perfectly fine.
But you are changing your behavior. If you told a human “timer 1 hour 30” they’d look at you very strangely. My suggestion is to stop using special phrasings for voice assistants.
Oh sorry, I thought your complaint was about having to phrase things as sentences. Yeah, Siri is pretty stupid about inferring that the number after hours is minutes.
Parsing time related stuff in general seems to be an issue. "Set a timer for a minute fifteen" makes an alarm named "Timer" set for 3 PM.
But if you say "Set a timer for a minute fifteen seconds" it works fine.
Curious if anyone else can duplicate the "a minute fifteen == 3 o'clock" issue or if it's somehow hearing me wrong.
I used your exact wording and it did exactly what you said: created an alarm named "Timer" set for 3PM. I also did it with Siri's language set to "English (United Kingdom)" and "English (Ireland)" and it still did the same thing, so this idiosyncrasy appears to be independent of which language Siri is set to (at least within the set of varieties of English it supports).
Also replicated in English (Australia), so I agree. I would have phrased this as "one and a quarter minutes", which works, or "75 seconds", which also works just fine. For the 1 hour 30 request above, I would usually phrase this as "an hour and a half", which works just fine. I agree Siri can be very picky though: things need to be phrased a certain way, and adapting speech patterns to that is frustrating.
Other simple time-based requests such as "what's the time difference to Singapore?" don't work on Siri either, which is irritating as I work across multiple time zones and I'm forever figuring out time differences.
This is hilarious considering the etymology of the term "computer". The first computers were people who performed computations. A computer, as you're using the term, is a digital computer.
1. Timers. Super useful when cooking. Also my kids can use it as they are doing remote learning and need to know when to get back to their video calls.
2. Playing videos while cooking. Sometimes I enjoy watching a sitcom while cooking on my Echo Show. Or I might want a cooking video though that’s more rare. My roommate uses it all the time for questions like “how long do you bake salmon?” With mixed results.
3. Controlling lights. “Alexa give me a light” as I walk into a room is way easier than turning on three separate switches in separate parts of the room. Turning them off is equally nice.
4. “Alexa tell me a kids joke” is a frequent thing we use.
5. Answers to random questions while at the dinner table. “Alexa who is the prime minister of New Zealand?” That type of stuff. It feels more natural than whipping out our phones.
6. We tried using Echo Show’s drop in feature but it is just too intrusive as compared to something like FaceTime. The other side doesn’t have to pick up the call. You just are in their house, camera on and all.
7. My kids really like when we play Harry Potter quiz. It’s a silly app but it is somewhat entertaining.
8. Really funny “routines” (their word for scripts). “Alexa, set condition two throughout the ship” to turn off all the lights. “Alexa, release the kraken” to set off the Roomba, etc.
9. My kids listen to podcasts all the time.
10. My kids use it to help them spell difficult words.
What I wish I could do a bit more with it is integrate it with things like random status dashboards. I combined a power metering AC plug with my washing machine and Home Assistant to know when it’s done running. Would be nice to be able to say “Alexa notify [roommate’s name] when the washer is done.”
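The washer-done detection itself is just threshold-plus-debounce logic. The actual setup above goes through Home Assistant, but as a rough Swift sketch of the idea (the names and thresholds are made up):

```swift
import Foundation

/// Sketch of "washer is done" detection from a power-metering plug:
/// the cycle is considered finished once the draw has stayed below a
/// threshold for several consecutive readings. Values are illustrative.
final class WasherMonitor {
    private let idleWatts = 5.0            // below this we assume the drum is idle
    private let requiredIdleReadings = 10  // e.g. 10 readings * 30 s = 5 min of quiet
    private var idleCount = 0
    private var cycleRunning = false

    /// Feed this with each reading from the smart plug.
    func ingest(watts: Double) {
        if watts > idleWatts {
            cycleRunning = true            // drawing real power: a cycle is on
            idleCount = 0
        } else if cycleRunning {
            idleCount += 1
            if idleCount >= requiredIdleReadings {
                cycleRunning = false
                notifyWasherDone()
            }
        }
    }

    private func notifyWasherDone() {
        // Placeholder: push a notification, announce on a speaker, etc.
        print("Washer is done")
    }
}
```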
Overall I think something much simpler that does processing locally could replace it for me but so far these things are cheap enough (echo dot) to put in every room.
I was the same way until I got a HomePod. After a little bit of trial and error, I use Siri a lot for automations, timers, shortcuts, spelling assistance, and a whole slew of other things. That has now translated back to my phone so it's nice that the interface across all my devices is the same for this.
Weirdly, HomePods do support multiple (optionally named) timers. Since they're also based on the general iOS family, there's a bit of hope for the feature eventually making its way to the phones. I'd assume it's held up on them having to redo the UI, if anything.
The spatial audio feature makes me think that Apple already tracks some level of head movement with AirPods.
I'm imagining a system where you can just use nods/head shakes to move through some sort of binary decision tree to execute some basic interactions, like reacting to incoming alerts/messages.
New text arrives, music pauses, Siri reads it to you: "Your mother asks if you'll be home by dinner. Would you like to respond?"
shake head no -> interaction cancelled, music resumes
nod head yes -> "Ok, how would you like me to respond? Yes, you will, or no, you won't?"
gesture head for appropriate response
Easy? Dumb/ridiculous? Sure, you can't get suuuuper deep with the decision tree and it's tough for non-binary responses, but it's enough of an interface to have a meaningful, non-verbal engagement with a computer.
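Assuming an app could get at headphone motion data at all (CoreMotion's CMHeadphoneMotionManager does expose attitude updates on some AirPods models), a crude nod-versus-shake detector might look something like this; the thresholds are pure guesses:

```swift
import CoreMotion

/// Crude yes/no gesture detector from headphone motion data.
/// Threshold and reset logic here are guesses, purely illustrative.
final class HeadGestureDetector {
    enum Gesture { case nod, shake }

    private let motion = CMHeadphoneMotionManager()
    private let threshold = 0.35   // radians of pitch/yaw swing counted as a gesture

    func start(onGesture: @escaping (Gesture) -> Void) {
        guard motion.isDeviceMotionAvailable else { return }
        var maxPitchSwing = 0.0
        var maxYawSwing = 0.0
        var referencePitch: Double?
        var referenceYaw: Double?

        motion.startDeviceMotionUpdates(to: .main) { deviceMotion, _ in
            guard let attitude = deviceMotion?.attitude else { return }
            // Track how far the head has moved from its starting attitude.
            if referencePitch == nil { referencePitch = attitude.pitch }
            if referenceYaw == nil { referenceYaw = attitude.yaw }
            maxPitchSwing = max(maxPitchSwing, abs(attitude.pitch - referencePitch!))
            maxYawSwing = max(maxYawSwing, abs(attitude.yaw - referenceYaw!))

            if maxPitchSwing > self.threshold || maxYawSwing > self.threshold {
                // Pitch-dominant movement reads as a nod, yaw-dominant as a shake.
                onGesture(maxPitchSwing > maxYawSwing ? .nod : .shake)
                // Reset so the next gesture is measured fresh.
                maxPitchSwing = 0; maxYawSwing = 0
                referencePitch = attitude.pitch; referenceYaw = attitude.yaw
            }
        }
    }

    func stop() { motion.stopDeviceMotionUpdates() }
}
```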
The platform is AirPods, Apple Watch, and the yet to be announced AR product.
Just like with the M1, if you squint, Apple is testing and iterating in the open. The spatial audio is a good example but so is the Watch’s auto-detected hand washing countdown.
There are public sprinkles of this coming platform elsewhere, such as the Rings widget in the Apple Fitness workout HUD. The proximity-based handoff is another.
The author is correct that Siri is not a good platform but for reasons they do not identify.
The voice-based interaction model is weak from a UX perspective. But for Apple it is even weaker because the company is unable to use any of the unique advantages it holds over its competitors.
For example, Apple's array of services, control over the technology stack, reliable and secure intra-device communication, the App Store, and the iPhone as a unified configurator, access point, update manager, and biometric authenticator.
You can’t pull that stuff out of a hat.
Siri sucks. I have a few HomePods, use plenty of homekit and try to get the most out of it. But it is bad at almost everything it sets out to do.
Siri is clearly not the focus for the company, and if anything it sent competitors scrambling to own a space Apple doesn’t even want.
The interaction model includes physical hardware, like the big crown on the new APMs, but I suspect it is likely going to be based largely on eye movement. Something not too twitchy.
The enormous amount of sensor data from the Watch and AirPods is like the GPS and gyroscope of the iPhone. Apps can require either or none.
So I think the author is right that AirPods are important but they are not the center. They are a component of the next platform.
I agree, but don't throw the baby out. One of the most useful uses of voice I've found is using a Fire Stick with Alexa. Just press the mic button on the remote and say an actor/movie/TV series and it presents everything it finds from all your subscriptions. It's rare that it doesn't understand, and as a Glaswegian that's impressive.
This is true for me, on Roku. Considering the context, that's far from a platform, it's not even an interface. It's just an alternative input to the physical tv remote interface. If I were to simply re-task a tablet or old phone as my TV remote (easy to do with Roku), I'd stop using voice, and get much more out of the second screen.
> Most of the time, if I'm wearing headphones, it's so as to not disturb others around me. Otherwise I'd play it out loud.
Maybe that's the key to its network effect as a platform. If you don't want to hear everyone talking to themselves in the library, you'll need noise canceling headphones too.
In science fiction, the step before mind-machine interface is sub-vocalization. What would it take to allow an interface device to be able to hear you but others not?
When golfing, I want to keep track of how far I hit the ball, the club I used, and where I landed. There are apps for this, of course, but I can't use them. By the time I've arrived at my ball, I'm not going to stop, take out my phone, and start fiddling with UI controls to select a club or confirm a location.
I would love to have an app that let me keep one AirPod in my ear, and allow me to track my golf game. The UX would be something like this:
1. Arrive at course, and use phone to select the tees and confirm the course I'm playing. Start the round.
2. Tap my AirPod and say, "Teeing off on hole 1 using driver"
3. Hit the ball
4. When I arrive at my ball, tap again and say, "hitting seven iron"
5. When I sink a putt, tap and say, "next hole".
From just those interactions, the app could keep track of every shot, and also keep my score and number of putts. I could choose to not announce each and every shot if I wanted to, and instead say, "add three strokes" once I'm done with the hole.
I could also ask, "How far to the middle of the green?" and get a distance in my ear. "What did I hit last time on this hole, for this shot?" (Answer: "You used a nine iron, and hit it 107 yards")
All that would be killer for me. Nicer than staring at my phone screen in the sunlight, and looking like I'm farting around to the players waiting for me to clear the fairway.
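The bookkeeping behind those voice interactions is simple enough; here's a rough Swift sketch, with the types and phrasing invented purely for illustration (distances from GPS are left out):

```swift
import Foundation

/// Sketch of the record-keeping behind the voice interactions above.
struct Shot {
    let hole: Int
    let club: String
    var distanceYards: Int?   // would be filled in from GPS once the next shot is logged
}

final class GolfRound {
    private(set) var shots: [Shot] = []
    private var currentHole = 1

    /// "Teeing off on hole 1 using driver" / "hitting seven iron"
    func logShot(club: String) {
        shots.append(Shot(hole: currentHole, club: club, distanceYards: nil))
    }

    /// "next hole" after sinking the putt
    func nextHole() {
        currentHole += 1
    }

    /// "add three strokes" for holes you didn't narrate shot by shot
    func addStrokes(_ count: Int, club: String = "unknown") {
        for _ in 0..<count {
            shots.append(Shot(hole: currentHole, club: club, distanceYards: nil))
        }
    }

    /// "What did I hit last time on this hole, for this shot?"
    func club(forHole hole: Int, shotIndex: Int) -> String? {
        let holeShots = shots.filter { $0.hole == hole }
        guard shotIndex < holeShots.count else { return nil }
        return holeShots[shotIndex].club
    }

    var score: Int { shots.count }
}
```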
I don't think there is any Siri integration yet, but disc golf has some of that functionality in the uDisc app. You can "map" a round and it can track all of your throws. I think you can input disc (club) selection as well? No AirPods integration like that, but it does have Apple Watch integration I think.
Has anyone actually TRIED using the AirPods in some form programmatically? It's impossible: lock down your phone with TouchID/FaceID, attempt to do a simple list by voice, and you get "you need to unlock your Phone first" ... there is no trust in the pairing of hardware. I wanted to do a delivery platform based ONLY on the AirPods for all parties (pickup/dropoff/billing) but it's just not possible. I hope it changes in the future to reflect some of what is in this post.
Given that AirPods already know when they get removed from your ears, I wouldn't be surprised for the next iteration to have a form of "unlocking" once you unlock your phone or watch while connected to the AirPods. There's no reason for you to not be able to do tasks as long as you don't remove the AirPods.
This starts to touch on one of the reasons every "personal assistant" is crap: they're basically just a verbal command line.
Their only real value proposition is basic interactions when the user's hands and/or eyes are busy with something else.
Until they can get smart or fluent enough that they can rival the effectiveness and accuracy of hands-on-screen interaction, the use cases will remain fairly niche.
More programmable, multi-modal interaction is a step in the right direction, but it'll require a lot more.
AirPods (or any modern earphones) are absolutely the first really useful AR product available to consumers.
Transparency mode is a critical success even if it’s so boring as to be unremarked upon by most people. An AR device is most useful if it’s ubiquitously available. Transparency mode makes that possible (even if it could use improvement). The device also needs to avoid a negative social stigma. AirPods have largely achieved that as well (at least among younger people). That is partially marketing/brand image, but it is also based on utility. Older folks would generally find it rude to leave headphones in while having a conversation because they assume the listener isn’t listening, but I work with a lot of teens, and it seems like they couldn’t care less. It’s understood that the speaker can still be heard. A quick tap/squeeze is the social signal.
The author is right that there is huge untapped potential in auditory augmentations, but the focus on verbal input control is misplaced. It’s simply too obtrusive for public environments.
If I were betting, I’d say Apple won’t open up this kind of functionality until the (cross-device) input control is generally codified, and that scheme will be intrinsically linked to a forward-facing camera/sensor package to provide contextual awareness and implied user attention & intentions (i.e. glasses or similar).
I don't get what this has to do with AirPods exactly compared to other Bluetooth headphones. Besides the author using a lot of Apple words it seems like they're proposing a new product that has nothing to do with Apple.
"The most obvious choice here is Siri, which is already integrated into every pair of AirPods."
- Not entirely true. It's the pairing of the device with an iOS or capable watchOS device.
"Why has no one thought about additional buttons or click mechanisms that allow users to interact with the actual content?"
- It's called a smartwatch (the Pebble was really nice at this), or, generically, Bluetooth radio controls.
I wish these design analyses talked about the material input costs needed to produce the thing we might perceive as a 'platform'. I just see more batteries, wear, expense, etc.
I agree with the author about the potential of audio interfaces + some simple additional inputs + integration with certain apps.
For me personally there is a suite of tools involving audio books and note taking that would change my life: A remote with a few physical buttons to rewind, switch to record-mode, skip sections. Speech to text with full text search. Voice recordings tied to what I’m listening to. Basically, I want to be able to work through a difficult audiobook while walking around.
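The note-taking side of that seems simple enough to model. A rough Swift sketch (names invented, speech-to-text assumed to be handled by something else) might look like:

```swift
import Foundation

/// Sketch of voice notes pinned to a position in an audiobook,
/// plus naive full-text search over their transcripts.
struct AudiobookNote {
    let bookID: String
    let position: TimeInterval   // where in the book the note was recorded
    let transcript: String       // speech-to-text of the recording
    let audioFileURL: URL        // the raw recording, kept for playback
}

final class NotebookStore {
    private(set) var notes: [AudiobookNote] = []

    /// Called when the user switches to record mode and finishes speaking.
    func addNote(bookID: String, position: TimeInterval,
                 transcript: String, audioFileURL: URL) {
        notes.append(AudiobookNote(bookID: bookID, position: position,
                                   transcript: transcript, audioFileURL: audioFileURL))
    }

    /// Naive full-text search over the transcripts.
    func search(_ query: String) -> [AudiobookNote] {
        let q = query.lowercased()
        return notes.filter { $0.transcript.lowercased().contains(q) }
    }

    /// All notes for one book, in listening order, so they can be
    /// replayed alongside the audio.
    func notes(forBook bookID: String) -> [AudiobookNote] {
        notes.filter { $0.bookID == bookID }
             .sorted { $0.position < $1.position }
    }
}
```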
Not that patents are predictive of what will actually make it into the product, but Apple does have at least a few covering hand gesture and other inputs to AirPods:
It could end up being quite hilarious if Apple ends up using various "signals" like biting your teeth, making grimaces, and weird noises to invoke commands.
You'd no longer be sure if someone is having a seizure or trying to stop the podcast she is listening to.
It makes sense considering Apple is all about diminishing physical technology (iMac: the screen is the computer). AirPods, Watch or something else as tiny or completely invisible will be the next platform, once they solve the performance and UI problem, which they will.
> Apple is all about diminishing physical technology
That's an interesting perspective. Theoretically, the best way to get rid of hardware is to move as much as possible to "the cloud"[1], yet Apple isn't very good at cloud. (At least, not as good as Google and Amazon.)
So let's say we're headed to a future where the only physical electronics anyone owns are wearables: watch, glasses, ear buds. No phones, tablets, laptops, or desktops. Just wearables.
In that scenario, who wins? Apple is best suited for making that hardware (by a long mile), but Google and/or Amazon are better suited for handling the software in the cloud.
I'd place my bets on Apple catching up on cloud faster than Google or Amazon catching up on hardware.
However, if we took it a step further and went full Mana[2], tapping right into the nervous system, my bet would be on Google winning that one. They have the cloud capabilities and expertise, but Alphabet also has some experience in health and biology (if I'm not mistaken).
--
[1] I know, I know. "Cloud" is just someone else's computer. It's also more than that.
I think Apple is very good at cloud. They don't sell cloud services, so comparisons to GCP or AWS are unfair, but their cloud integrations are pretty top notch from my perspective. My phone backs up automatically. My photos are available on all devices with the swipe of a single slider. iCloud is so tightly integrated with their products that a lot of people don't even know they are using it. I think that's a pretty good implementation of cloud.
I would argue that Apple has fewer cloud products, but most of their cloud products are very good. Amazon and Google have many cloud products of varying degrees of quality.
> Apple is all about diminishing physical technology
That's an interesting perspective
It's Steve Jobs' perspective. He talked repeatedly about technology disappearing into the background, and one day we would have technology so good that we wouldn't even see it. It would disappear into the walls.
To me, it's the ultimate expression of making computers work for us, not the other way around, which is mostly what we have now.
> Theoretically, the best way to get rid of hardware is to move as much as possible to "the cloud"[1]
An alternative view is that Apple’s biggest product puts enough compute in your pocket to make “cloud” unnecessary in a lot of cases. My phone is somewhere between a t3.medium and a t3.2xlarge (based on RAM and CPU cores respectively). That can provide a lot of local compute for my wearables. And those wearables are gonna need network anyway, so either they all end up with 5G cellular radios (and 4/3G fallback) and enough battery to run that, or one device provides the network hub and the tiny things in your ears and the glasses sitting on your nose can have lower-power radios and smaller batteries.
I reckon watches/glasses/earbuds(/cars/tvs/etc) all relying on a phone is a reasonably sensible model, rather than each of those devices having completely stand alone capabilities.
(And, the idea of Google tapping into my nervous system??? No thanks... I’ll proudly be a data center smashing neo-Luddite before that happens to me...)
I really enjoy the data add-ons for iPad and Apple Watch because they allow me to be connected without having a phone in my pocket. What I desire is connectivity (being able to see messages, make calls, etc.), but the iPhone often encourages disconnection via mindless scrolling. I enjoy the times I can get away from my iPhone, and I am not excited about this hub future you describe.
I actually think Apple is about integrating different hardware to create unique interface experiences which are cross-device. I suspect it's not going to be 'one technology' any time soon, it's going to be lots of devices working together seamlessly.
For example, if I am playing music on my Airpods from my iPhone and my phone is in my pocket, turning the crown on my apple watch is a really neat and intuitive way to change the volume. The first time I did it and it worked it felt like magic.
Similarly walking down a street and getting audio directions on AirPods almost works - but if that's combined with a small map on my watch it works much better - better than a phone.
But at the same time, neither a watch nor AirPods are going to be the right way to send a private text message on a quiet bus, and because a giant new unifying technology isn't with us yet, I suspect a hybrid approach is going to be with us for a while.
As far as I am aware, no one has been able to make a successful Voice First product. Some products, like Spotify or Audible, are greatly enhanced with voice, but voice is just another access point.
There's a successful Windows app called VoiceAttack that will let you use voice instead of controls in games.
They also offer voice packs spoken by famous actors that are popular. Having Kirk's voice update you on the status of your space ship is enticing.
If you're taking the piss I don't know where I've tripped up from your comment. Is there something infeasible about triple taps? If you jailbreak you can do it..
No, I'm serious, I wasn't reflecting on your comment. I'm not a doctor, so I'm not sure if air pressure in the ear canal could be detected and how much a human can control it. But it would be easier than voice or touch from the user's point of view.
Eh I think that interface stinks. It's just physically very uncomfortable to tap on the headphones in your ear because of the loud noise and the headphone pressing deeper into your ear. I have a knockoff bluetooth "Pods" and I absolutely hate the triple tap.
> The input mechanism I describe doesn’t have to be a physical button. In fact, gesture-based inputs might be even more convenient. If AirPods had built-in accelerometers, users could interact with audio content by nodding or shaking their heads. Radar-based sensors like Google’s Motion Sense could also create an interesting new interaction language for audio content.
> You could also think about the Apple Watch as the main input device. In contrast to the AirPods, Apple opened the Watch for developers from the start, but it hasn’t really seen much success as a platform. Perhaps a combination of Watch and AirPods has a better chance of creating an ecosystem with its own unique applications?
All AirPods actually do contain an accelerometer, and the Pro contain a gyroscope as well; however, I'm not aware of them being opened to developers, so they're still not of use for anything unless Apple decides to open them up or implement the feature themselves.
Apple has them in the pencil too and they kind of suck. For the AirPods it’s not too bad for the double tap action but on the pencil it’s horrible. It’s such a difficult action to make when using it and I trigger it randomly all the time. Would much prefer a Wacom style button bar.