
First, I’m really excited people are trying new things, but I won’t be buying this just based on the demo.

> The conversational interface has never worked before for many reasons, but that does not mean it cannot work in principle, …. I'm glad they're trying. Also, the laser display is neat!

So I’ve done a lot of work over the years researching voice UI/UX and I’m very skeptical about this, even with the LLM stuff. I think an LLM was the missing piece in the Siri/Alexa era that could have transformed it from an “audio CLI” into a “chat interface”, but there are a few other reasons it didn’t catch on.

The low information density and strict linearity of chat, voice especially, are a big problem.

When you look at a screen, your eyes can move in two dimensions. You can have sidebars, text fields organized into paragraphs, buttons, bars, etc. Not so with chat: because it is linear (you can only listen to or read one thing at a time, and a conversation can only present one list at a time), it becomes really slow to navigate any sort of decision or menu tree. Mobile-first design has simplified this, of course, but it’s not enough. Listening to TTS makes it even slower to find the info you care about.

Voice has found a place for simple controls (smart home, media, timers, etc.) and simple information retrieval (weather, announce doorbell, read last text). Then there’s the obvious problem of talking out loud in public, false activations, etc., which are necessary evils of a voice UI.

I think the best hope for a voice device like this is to (as they’ve done) focus on simple experiences like “what’d I miss recently?” and hope an AI can do a good enough job.

The laser display might help with presenting a full menu at once (media controls being an easy example), but it will probably end up being a pain to use (e.g., like a worse smartwatch).

Honestly though, my biggest hesitation (which could end up great) is the “pin” design. It’s novel, especially with the projector, but how heavy is it and how will that impact the comfort of my clothes? What about when wearing a jacket or scarf? Will this flop around while walking? Etc.



There is also a lack of serendipity or explorability with voice: how do you know what’s possible? There is a reason a GUI menu is called a menu. It not only gives you access to multiple options but also, at a glance, an overview of what options are there, like a restaurant menu.


Discoverability is the term; e.g., "What Can I Say? Effects of Discoverability in VUIs on Task Performance and User Experience" https://dl.acm.org/doi/10.1145/3405755.3406119


It'll flop everywhere, not just while walking. Boom boom.

But yeah I've been thinking that too. "Oh, put my coat on - better spend 30 seconds messing around with my pin" [...] "Ahhh back in the office. There goes another thirty seconds moving the pin so it can film me looking at a screen for four hours"

And yeah, I feel like the weight would definitely pull my jumper or t-shirt out of shape, and make things like my collar/neckline look out of whack. Maybe they'll bring out a range of clothes suitable for it, or suggest you wear a coat indoors like the woman in the video is doing.


Linear conversation is a big problem for anything beyond simple, casual usage. It is the reason that YouTube is a terrible research platform. Is the information you want inside that 3-hour video? Possibly, but with text I can search an article for content or skim sections to determine if it's worth a deeper read.

Let's not forget the value of non-linear input. Good search terms are often constructed rather than spilled forth. Sometimes I enter search terms, read them back, and realize they're likely to return unrelated results and need to be modified. By the time I realize this while speaking to an AI, it's already spitting out the wrong information.

This points to a need for altered interfaces that accommodate these scenarios. This is v1.0. Let's see where it goes.


> Will this flop around while walking?

If a science fiction author were writing it, the need for stiffer fabrics to support chest cameras would synergize with a neo-Victorianism in Generation Alpha (formal button-up shirts and higher necklines for enforced modesty).


IMO, with LLMs we won't really need information density except for certain classes of people.

Even now - clicking through some insurance company's website hierarchy to find something out is insanely painful.

But even for researching things that we should probably care about enough to do ourselves, correlating different sources of information, or working through abstract/ambiguous problems... the vast majority of ordinary people will 100% take the easy way out and let LLMs do most of the thinking for them. Even with free GPT-3, people are unflinchingly having LLMs solve problems they don't want to think about too deeply. The price they pay in occasional inaccuracy is more than offset by the convenience.


> IMO, with LLMs we won't really need information density except for certain classes of people.

Maybe, but I don’t know if that day is here yet. I think “most people” do actually consume dense information. Reading an insurance company’s website is pretty rare compared to things like using the Amazon app. And it’d be hard to consume a list of 5+ push notifications via voice if you had to listen to them one by one instead of skimming them in a list next to their icons.

Even simple things like scrolling through a list of songs become painful. I have something like 10k songs in my (streaming) library, and sometimes I randomly scroll through it to find old music. That sounds impossible on voice; I’d be stuck with “shuffle” mode.

Being able to summarize and search text conversations via voice queries, as in their demo, would be nice, but today that’s a task you need a screen for.

The demo video shows the man buying a book online via voice after holding it up to the camera. How often is that the online shopping experience? I can’t imagine shopping without a screen 95% of the time.


> we won't really need information density

We may not need it, but we certainly prefer it. People went completely voluntarily from voice calling to texting, and within texting to ever terser forms, to the point where an entire website was built around a short character limit.

Except for people with disabilities, I have not really seen a single case where that tendency towards compactness is reversed in communication.



