Curious how it compares with Mycroft (https://mycroft.ai/); I tried it last year, wanting to get rid of Alexa in my home, but the voice recognition was terrible (even with a specialized 4-mic array and running it on a PC instead of a Raspberry Pi).
I'm still looking for a privacy / offline voice assistant solution, so I'll dig a bit on this.
Mycroft got a recent update (maybe 3 months ago) and seems to work much better than before. I've got a Google AIY Voice Kit v1 running on a Pi 3 B+ and it works quite well to about 6ft away. False wakes prior to this latest update were 4 or 5 times a night watching Netflix. Now down to once a night or not at all on a good night. Slow-going but they're progressing.
Seems strange that this requires java. Feels like they could get better performance from a compiled language and it would run on lower profile computers.
But, still, an exciting project. I would love to rid my house of Google Home devices which are training my kids to think Google is god.
It's a tradeoff between agonizing over the bickering of my kids or agonizing over the music they play which stops the bickering. Google Home makes the latter choice simpler than handing them my phone. I'm aware I'm fully responsible for my own choices and I can still feel regret for where I'm at given those choices.
Could have a cheap/old tablet/phone that lives on the coffee table or whatever for them to select music, with app pinning perhaps, not even locked if nothing important is logged in, instead of handing them yours.
> It's a tradeoff between agonizing over the bickering of my kids or agonizing over the music they play which stops the bickering.
My parents would simply scream with a loud bang from a foot stomp or fist slam "Enough!" OR "shut up, the both of you!" which worked very well. Did not require purchasing anything, accounts or internet connection.
> Seems strange that this requires java. Feels like they could get better performance from a compiled language and it would run on lower profile computers.
I feel like the meme that Java is slow and bloated needs to die. Java can be very fast, and also runs on fricking SIM cards that have extremely limited processor and memory.
SIM Cards running Java run an extremely cut down version of Java that removes basically all features that go beyond what C can do with some sprinkles on top. The biggest issue with large Java applications is real and it usually relates to either GC churn or inefficiencies. Source: Running several large java applications as professional sysadmin.
Granted, often enough Java is fine, it doesn't use too much overhead, especially once the JVM is warmed up. But the bloat is real and it's partially related to the language itself but plenty of it is also systemic.
A lot of deployed deep learning is in Python so I doubt this will matter. Looking at the docs, it looks like the main speech to text is python heavy, it leverages elasticsearch and calls libraries that are bindings to code written in C++. As none of that overhead will ultimately matter for anything approaching the fidelity you get with cloud services, cost benefit trade off favors Java in this scenario.
> lower profile computers
When it comes to deep learning, this just isn't the direction things are headed. If you want the best results with low latency, your home server is going to have to be fairly beefy with a decent GPU and lots of memory.
Honest question, as i've never used Java.. why do i need to install various Java runtimes then?
As someone who never uses Java _and_ still dislikes it, it's because i've had to install that damn runtime a thousand times. At least Python has the simplicity in coming pre-installed on many systems. I can't count the number of times i've installed/uninstalled Java.
Also, back in the day i was used to seeing some Java systray icon. I imagine that built some dislike towards Java too.
You need runtime libraries for all compiled languages, including C.
At any rate, Java isn't a client language. It's very much geared toward server development. It's not impossible to write client software with it (see Minecraft), but it's really clunky and not really suitable. They've basically done everything they can to make it as awful as possible to use for desktop users, basically ceding the ground to C# almost more through self-sabotage than losing a fight.
I don't want to be conspiratorial, but it can be suspected Microsoft and Oracle had lunch together and came to some sort of an agreement.
This often takes a long time for people to understand. Java applications are not "native".
Think of it in terms of human languages.
C and C++ (and many - but not all - compiled languages) get compiled into native programs, so they get compiled into instructions that talk the computer's language, and when they get run they talk directly to your computer... albeit with a thick accent. So a Linux program can only talk to your machine via the Linux kernel, a Windows program talks with a Windows accent, Macs... you get the idea.
Because of this thick accent, programs cannot be shared between different operating systems (though there are some amazing projects out there that either do some accent translation, or more recently, do clever things to avoid any accent at all. There are almost always limitations to what can be done here though.)
That's not to say that sometimes, they don't need help. These "helpers" (or Shared Libraries - aka DLLs or .so files) might be absolutely necessary for a certain program to run, because the developers don't want to reinvent the wheel. So as someone else pointed out, even with native programs you sometimes need to download extra bits to get them running. But as you've noted, this is a lot more rare, and it doesn't usually change HOW you run the program, merely what needs to be installed first.
PHP, Ruby, Bash etc are Interpreted - so they need a special program that does the talking to the machine (again, via the kernel). This means they don't need to be compiled, which makes them quicker for developers to write, and easier to run on windows / linux / mac using the same codebase. But like with a human interpreter, it generally takes at least 2x as much energy to get the message across, which usually means a sometimes significant slowdown.
Note that this is not the same as having a Shared Library - and in fact these programs sometimes do use Shared Libraries directly. With interpreted languages, the interpreter starts first, it gathers together some core resources and ideas that it will need to be able to do its interpretation, and then it will load the human-readable program. Some shared libraries will be loaded by the interpreter, and some will be loaded later once the interpreter has seen the code.
Java is in a special category. It is both interpreted and compiled - that is to say, it compiles to something that looks a bit like a native program, but it still has this interpretive layer that gathers together core resources and ideas that the compiled program can then use - we call this part of the interpreter a VM, or Virtual Machine. It then loads a program that is very much not human-readable into that VM, which then does all the translating.
The theory is, Java could be the best of both worlds. It can have speed closer to a compiled language (and in fact, can sometimes be a little faster in places, as the VM contains some nifty tricks to move things around on certain machines) and yet the compiled programs can be run on any machine that has the Java Runtime installed (I believe they coined the phrase "Write once, run anywhere" which very soon got corrupted to "Write once, test everywhere" because the first few versions of the Java VM were spectacularly buggy).
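For anyone who wants to poke at the compile-then-run-in-a-VM idea without installing Java: CPython works along broadly similar lines (source gets compiled to bytecode, and a VM then executes that bytecode), so here is a minimal sketch in Python. It is only an analogy, not a Java example, and it does not show the JIT compilation that lets the JVM get close to native speed. `SOURCE` and `add` are made-up names for illustration.

```python
import dis

# Human-readable source, held as a plain string for the demo.
SOURCE = "def add(a, b):\n    return a + b\n"

# Step 1: compile the source into a code object (bytecode, not machine code).
module_code = compile(SOURCE, "<example>", "exec")

# Step 2: the Python VM executes that bytecode; here it defines add().
namespace = {}
exec(module_code, namespace)
add = namespace["add"]

# The function now carries compiled bytecode rather than source text.
bytecode = add.__code__.co_code
print(type(bytecode).__name__, len(bytecode) > 0)

# dis lists the VM instructions; the exact opcodes differ between versions.
dis.dis(add)

result = add(2, 3)
print(result)  # 5
```

The same bytecode runs wherever a matching Python VM is installed, which is roughly the portability trade the comment above describes for the Java Runtime.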
As for that dislike of Java, I will point out that I spent 5 years writing Java programs. Once I was no longer forced to write Java programs, I stopped, and haven't touched it for at least 10 years. Because while Java could be the best of both worlds, it wasn't then. And it might be much better now... maybe. I cannot forgive it for the years of my life that it stole; and am now very happy in the arms of pretty much any other language.
The JVM is still operationally a nightmare. Without knowledgeable tuning it requires a lot of memory even for the smallest of apps which is the most expensive resource right now in computing and vastly constrains deployment.
That's not even getting into the Java ecosystem which is vast, but it's full of nightmares like remote debugging ports that allow unauthenticated remote access to the JVM's memory space.
Not while I am driving and then by using my phone obviously.
Lots of these tasks are completely automatic (e.g. locking doors, announcements, turn things off when I am gone, motion sensors....) or simple inputs, e.g. volume buttons or a slider for oven temp.
> The little speakers can be put into every room, almost invisible, and for a fraction of the cost of a smart display.
My phone is in my pocket and thus in every room I am in.
So let me rephrase the question: If I use my phone, for which things is voice input so much better than just clicking on widgets?
Thus far I think hands-free while cooking and the UX for music/video are the standard examples. The question above was: Besides this, what are great use cases?
> My phone is in my pocket and thus in every room I am in.
Not everyone does this. Eg. My parents use Alexa but keep their cell phone by the door or on their nightstand.
> So let me rephrase the question: If I use my phone, for which things is voice input so much better than just clicking on widgets?
You already answered one: Cooking, etc. But also some people have other household members. The communal nature of the device matches the communal nature of a household. If I hear an announcement saying to put laundry in the dryer, I can act on it even if I didn't start the laundry. Or if there's a timer to take something out of the oven, I can act even if the cook is in the bathroom.
Voice needs no physical mode-switching. Ask and receive regardless of whether you've got your hands full doing something else, or looking at something else, or don't have your phone super-glued to your person.
There's also no tangential distraction when looking up information. If I'm going to travel I can ask for the forecast (daily or weekly) at places along the way without getting bogged down with unrelated web browsing or advertising.
Voice has the advantage that you don't need line of sight. If I think of the range of locations where a voice assistant can hear and respond, it's much larger than what could be easily covered by displays.
They're genuinely pretty great when your hands are busy, or you want to do something relatively far away.
95% of my usage is in the kitchen, and I think both amazon and google are hugely under-valuing that space.
My ideal voice assistant would be something like the Fire TV cube (streaming shows along with voice control) where the thing is able to show timers/recipes/unit-conversions picture-in-picture while the show is playing, as I ask.
Frankly - Amazon acted like a chicken with its head cut off with the Fire TV cube, it should be exactly what I wanted, but the device is almost laughably bad at basically everything it should excel at, and I still hate them a little for it.
I'm an Apple device user (iphone, homepod, carplay, etc, but no mac computers) so I use Siri. I will admit that Google Assistant is superior in a lot of ways, but I am no longer in that ecosystem.
- Reminders (managing, etc)
- Navigation (mostly in the car, but Siri kinda sucks at this)
- Music
- Lists (grocery, etc)
- Speech to text entry/messaging if typing is inconvenient (walking dogs, etc)
- Quick math/unit conversion
- Quick searches
- Seeing what entertaining things it can do
- Managing apple health stuff
I do have some home automation stuff (lights, thermostat, etc, but no locks), but Siri is not configured to manage any of it as of now.
I have one in my kitchen that I use exclusively for setting timers when cooking. It's the only one in the house and is always set to microphone off if not in active use.
If that's your only use, I'm curious why you don't just get a simple timer? I'm assuming because of the ability to set timers hands free when your hands might be dirty or otherwise occupied.
Long delay, but as for your first point I was thinking of a simple mechanical timer such as the one below. I think that could be set in around the same amount of time if not faster as long as it is kept within reasonable distance of where you stand while in the kitchen. Of course it doesn't solve the other problems.
It's nice to do basic computing (other comments already list the possibilities) with voice+ears instead of hands+eyes not only if hands are full/dirty, but also to mitigate fatigue/injury (if one uses conventional HCI quite a lot already and doesn't want more exposure), or to achieve some quick goal while in the middle of some longer phone/laptop activity without losing focus of the latter.
Turning lights on and off. "turn on basement" as I move towards the basement door. "Turn off basement" as I come up the stairs with my hands full of stuff preventing me from reaching the light switch.
For stuff like the lights in basement/hallway/bathroom/storage, and other locations where you always need lights (no windows, not bed/livingrooms), I just have lights with PIR-sensors...
Completely dumb, but turn on when you're there, turn off automatically a minute later. No privacy-implications, no need to yell to your assistant, and cheap/simple/effective.
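The logic inside one of these PIR lights is roughly a retriggerable timer. A minimal sketch of that idea, assuming a 60 second hold time; the class and method names are hypothetical, and real units do this in hardware or firmware rather than Python:

```python
class MotionLight:
    """Light that turns on with motion and off after a quiet period."""

    def __init__(self, hold_seconds=60.0):
        self.hold_seconds = hold_seconds
        self.is_on = False
        self._last_motion = None

    def motion_detected(self, now):
        # Every detection switches the light on and restarts the countdown.
        self._last_motion = now
        self.is_on = True

    def tick(self, now):
        # Called periodically; switches off once the room has been quiet
        # for at least hold_seconds.
        if self.is_on and now - self._last_motion >= self.hold_seconds:
            self.is_on = False
        return self.is_on


light = MotionLight(hold_seconds=60.0)
light.motion_detected(now=0.0)
state_at_30 = light.tick(now=30.0)    # 30 s since motion: still on
light.motion_detected(now=45.0)       # movement restarts the countdown
state_at_100 = light.tick(now=100.0)  # 55 s since last motion: still on
state_at_105 = light.tick(now=105.0)  # 60 s since last motion: off
print(state_at_30, state_at_100, state_at_105)  # True True False
```

Time is passed in explicitly so the behaviour is easy to check; nothing leaves the device, which is the privacy point being made.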
I should replace every single switch with a smart one to do that. Which kind of switches do you have? If they are Wi-Fi how do you avoid hitting the max number of devices your home router is able to handle?
Either the switches, or, usually easier, the bulbs. As sibling said, Zigbee, but there is also Z-Wave (Z-Wave is bigger in the USA and stricter; Zigbee is bigger in Europe and more open, but you can get devices that suck in some way if you do no research). There is some new thing that’s related, Thread/Matter something, but I didn’t look into that yet.
ZigBee is the contemporary standard du jour for this stuff. Wifi is fine for some things but doesn't offer the same benefits (e.g. implicit mesh network of devices)
My cooking-hob has a timer per place (so 4) ... and then the oven/microwave has its own timer as well. No need for some timer that doesn't automatically turn off the device it's timing.
These or variations of these are the most used commands at my house. Passive assistants can’t really figure out that I or my kids want to play a specific podcast or when to set a timer when cooking. Lighting is another tricky one. Sometimes I’m reading and want bright lights. Sometimes I want scenic lights. Sometimes I want very low lights while falling asleep.
I suspect most people don’t use their smart assistants for shopping. I do occasionally ask it a random question like “what year was Einstein born” when I don’t feel like whipping out my phone or play an occasional round of 20 questions with my kids. But that’s rare compared to the typical usage.
I have an egg timer I just turn the dial on. Replacing it with a billion transistors scratches a non-existent itch for me.
What I want is an assistant that will handle all my bookkeeping for me. Right now, I have to log in to the account, then download all the statements for the last tax year. Evidently, no bank has such an option. Nope. You gotta do it month by month, click, download, save, back, back, click, download, save, back, back, until you scream.
Then ya gotta download the transactions for the last year in a format that Quicken can use. You can download transactions for last week, last month, last 12 months. No option for "last tax year". So you have to manually enter 1/1/2021, then 12/31/2021, then confirm, then download, then save, then do it all over again for the CSV version.
Then you gotta poke around looking for the tax documents.
Then I gotta get the transactions for a credit card I canceled 6 months ago. But since I canceled it, I can't download the transactions.
It must be only me that has these problems. Why is that? Does nobody else need to pay taxes?
No, I doan need no help setting a $%^&# egg timer :-) Maybe if I become bedridden I'll need help flipping a light switch. Until then, nope.
Depends on how you cook. I like to cook big complicated meals that take hours. At any given time I will have 3-4 timers going concurrently. I need to know which timer is going off. Also my hands are often covered in food as I am cooking and washing them to set a timer slows me down.
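The "which timer is going off" part is the bit a single egg timer can't do. A rough sketch of several labelled timers running at once, using a heap ordered by deadline; `KitchenTimers` and the dish names are made up for illustration, and this is not how any particular assistant implements it:

```python
import heapq


class KitchenTimers:
    """Several labelled countdown timers, tracked by deadline."""

    def __init__(self):
        self._heap = []  # entries are (deadline_seconds, label)

    def start(self, label, now, minutes):
        heapq.heappush(self._heap, (now + minutes * 60, label))

    def due(self, now):
        # Pop every timer whose deadline has passed, earliest first.
        fired = []
        while self._heap and self._heap[0][0] <= now:
            fired.append(heapq.heappop(self._heap)[1])
        return fired

    def remaining(self, now):
        # Seconds left on each timer that is still running.
        return {label: deadline - now for deadline, label in self._heap}


timers = KitchenTimers()
timers.start("rice", now=0, minutes=18)
timers.start("eggs", now=0, minutes=12)
timers.start("roast", now=0, minutes=90)

fired_at_10 = timers.due(now=10 * 60)  # nothing is done yet
fired_at_20 = timers.due(now=20 * 60)  # eggs (12 min), then rice (18 min)
still_running = timers.remaining(now=20 * 60)
print(fired_at_10, fired_at_20, still_running)
```

Each alert comes back with its label, so an announcement can say "eggs" rather than just beeping.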
When walking into a dark room carrying a heavy box it is often easier to use a voice assistant to turn on lights. It’s also frankly more fun.
What you are looking for is not a home assistant. You want an accountant + QuickBooks. For $1000/year or so you can get both and be happy. Intelligence is often cheaper than artificial intelligence.
I'll wager you don't have kids. I'd have to spend half an hour a day traveling around the house turning off light switches countless times if it weren't for an easy "hey Google, turn off the kitchen lights" or "turn off all the lights".
We didn't have home automation when my grandparents were kids, when my parents were kids, when me and my brother were kids. We managed to turn off the lights. Have you considered just teaching your kids to turn off lights? It's not like flipping a light switch is an impossible task for a kid, they even managed to flip it when turning the light on!
My grandparents got along fine as kids with an outhouse and no computers at all. So I'm not sure what your point is.
For one, my grandparents didn't have nearly the number of light switches even with a comparable house size. And "just teaching" kids to turn the lights off is quite laughable. My kids know how to turn off lights and know they should turn them off when they leave the room. But knowing and always remembering to do it are two different things. I also forget to turn them off sometimes. I remember my dad grumbling quite a bit about having to go around turning off the lights or having to run back in the house after you get outside and see the lighted windows. Just one less little hassle.
Okay, but your parents had just as many light switches growing up.
> But knowing and always remembering to do it are two different things. I also forget to turn them off sometimes.
What is "sometimes"? You wrote "I'd have to spend half an hour a day traveling around the house turning off light switches countless times" which is definitely not normal and a clear case of your parenting not working. I also "sometimes" forget a light. About once a year, tops. I don't see how this isn't your brains failing to manage a very basic everyday task.
His kids have to run to the switch every thirty minutes after the lights went out on them? That sounds horrible. Like those toilets that leave you in the dark if you're not fast enough, only the timers don't even care if you move and also in the room they spent most of their time in. Abysmal.
It looks to me like the OP wants a personal assistant in a form factor. A flexible CRUD app with voice commands assisting you with finances, correspondence, legal affairs, scheduling, reminders.
It's much more difficult than home automation, but I think this is the direction we're heading into. The incentives are there, because access to such data is valuable.
Regarding the bookkeeping thing (this happens to be my specialty): with the prerequisite knowledge (how websites work and basic Python or JavaScript), you could have that automated in a weekend. You don't even have to deal with authentication - if you don't run it often, you can just log in manually, then run the script to collect data.
Alternatively, try calling your bank and asking if you can get the statements emailed to you automatically. Most banks offer this in one way or another, although you usually get PDFs so some programming might still be necessary to transform the data.
I actually did write a program to deal with PayPal. I used it once a year to process the PayPal information. Which worked fine, until PayPal changed the data format, so I had to recode the program. Then next year they changed the format. I had to recode the program. Then next year they changed the format. I had to recode the program.
See where this is going? Spending a weekend automating a program that you use only once, and have to recode every year, does not work.
Also, Quicken cannot import PDFs. As for getting the statements via email - it's just as much work getting them one by one out of my email and into the proper place in my accounts.
What I want:
Download in one operation ALL the data in one zip file.
Now that would save me significant time. I've never had an account that allowed this. They're all different, but all about 20 minutes or so of clickety-clickety-clickety.
BTW, at least they finally stopped downloading each statement with the same file name (named "download.pdf", naturally). Before that, I had to rename it as an additional step for each statement. It led me to wonder: didn't anyone who worked on the web site at that bank have an account at the bank?
BTW, I have submitted suggestions to these banks on making this easy. They're all nice and polite, but they act astonished that anyone would want to download all the last tax year's information.
But nothing ever changed, so I stopped bothering them with suggestions.
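Until a bank offers the one-zip download, the local half of the chore (working out the "last tax year" range, then renaming and bundling whatever got saved) can at least be scripted. A minimal sketch, assuming a calendar tax year and that the statements are already sitting in one folder as PDFs; the function names, the `checking` account label and the naming scheme are all made up, and it does not talk to any bank:

```python
import datetime
import tempfile
import zipfile
from pathlib import Path


def last_tax_year(today):
    """Return (first_day, last_day) of the previous calendar year."""
    year = today.year - 1
    return datetime.date(year, 1, 1), datetime.date(year, 12, 31)


def bundle_statements(folder, zip_path, account):
    """Zip every PDF in folder under a distinct, sortable name."""
    pdfs = sorted(Path(folder).glob("*.pdf"))  # assumption: name order is fine
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, pdf in enumerate(pdfs, start=1):
            arcname = f"{account}-statement-{index:02d}.pdf"
            archive.write(pdf, arcname=arcname)
            names.append(arcname)
    return names


start, end = last_tax_year(datetime.date(2022, 3, 1))
print(start, end)  # 2021-01-01 2021-12-31

# Demo with placeholder files standing in for real downloads.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("download.pdf", "download (1).pdf", "download (2).pdf"):
        (folder / name).write_bytes(b"%PDF- placeholder")
    zip_path = folder / "tax-year.zip"
    bundled = bundle_statements(folder, zip_path, account="checking")
    with zipfile.ZipFile(zip_path) as archive:
        archived = archive.namelist()
print(archived)
```

This only removes the renaming and filing steps; the month-by-month clicking on the bank's site stays exactly as annoying as described.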
I can’t remember the acronym right now (and am too lazy to search through my scripts to find it), but there is an older banking standard that can be used to download transactions.
Some regular banks support it, you can use your existing credentials, and there are Node.js libraries for it. Downside is the metadata is lean. I think there’s a ~10char limit on the description? Like you get “2021-11-01 14:35, LONGNAMECO, $500” and that’s it. Don’t quote me.
Totally agree. Voice assistants IMO are doing that thing that tech sometimes does: hinting at what could be great but failing to actually deliver anything useful.
One insanely simple thing that'd be useful to me: local tide times. Can Siri or Google do this for me? No, not a chance. Best I get is "here's a web page", ie just a higher friction version of the thing I could have done myself.
I'm ok turning my music up and down or choosing a song. Let's make these devices actually useful and then I might sit up and start using them.
Even Quicken finally added a "last tax year" on the menus.
BTW, Quicken is the most rube-goldberg-esque program I've ever encountered. There's no UI consistency anywhere, the screen constantly clears and flashes and redraws, some check boxes cannot be clicked on, you have to Alt-key them, some places Paste does not work, and it blocks you while it updates itself every week or so.
What a mess of a program. I bet it was written by squirrels and poodles.
fwiw that bothers me too. i don't know why bank websites suck so hard.
while we're at it, they should offer read only api keys for accessing such information.
ai assistants won't solve this.
> “Alexa, set lights to 40%”
> “Alexa, set an egg timer for 12 minutes”
> “Alexa, set an alarm for 6:45am”
> “Alexa, play Stories Podcast on Spotify”
It’s not that it looks useless. It sure looks pretty cool to be able to switch on the light with the sound of my voice.
But the privacy cost is extremely high for something that isn’t that revolutionary. I’ll just keep pressing the button. That’s less cool, I admit that, and I’m even somewhat jealous. But I’m never going to allow Amazon (or any company) to listen to me constantly just so I don’t have to press a button.
I respect and have no issue with your choice but it feels wrong justifying the utility without mentioning the drawbacks.
In the game Horizon: Zero Dawn (and now Horizon: Forbidden West) the main character Aloy uses a device called a Focus that lets her visualize - usually via a virtual heads-up-display - data about the world around her (things like detecting and tracking nearby objects, or reading data from and interacting with devices).
I don't think it's really comparable to a voice assistant. It's more "Google Glass" than "Google Assistant".
I would prefer a voice, like the one in Her. Would be nice if it can predict what I need next and plan ahead. Smart enough, probably above Siri/Alexa/Cortana and below a person (hopefully).
This webpage is confusing. There's no scroll bar, and the "Get started" link is hidden until you scroll down about 75%, then it sticks to the top, unless you scroll back up.
So, scroll down a ways, then click "getting started" at the top right of the page, then there's a "quick start" guide.
It's self hosted, so you probably won't find an online demo. If you don't want to try it without an online demo, then this probably won't be that interesting for you anyways, since custom voice assistants necessarily require customization.
I’m interested in what platform you’re using, because I wasn’t aware that any made a visual distinction of the error they’ve made. Specifically, they have the root element hiding overflow (well, it’s #site-window that does the actual `overflow: hidden`, but it and its ancestors control their height to 100% of the viewport), and an element inside that (#site-main-views) being the scrolling element. I noticed this because keyboard navigation (Up/Down/PageUp/PageDown/Home/End) failed to work, since the initial containing block doesn’t have any scrolling, and you have to focus the scrolling element (e.g. with Tab or clicking in it) before keyboard navigation will work.
My mistake. Looking again, there's a very skinny (8 pixels wide) custom scrollbar that has no discernable "track" (#0D on #00 background). I just didn't notice it. I'm using Safari.
Hmm, interesting that they’re styling a viewport and inner-but-equivalently-sized scrollbar differently. Think I might see a Mac in a few weeks’ time, I’ll have to look into the rendering and differences.
The key component that is missing from many of these digital voice assistants is multi-listening device coordination: which device should answer if there are many devices listening.
It's a hard problem. From my understanding, homepods use bluetooth to coordinate who is going to answer. Google Home, at least in my home, is terrible at this problem. Often devices in another room will answer.
Once someone has solved this problem, I'm on board!
Even homepods get this wrong, and I'd be surprised if they only used bluetooth. I know they use wifi for presence (being on the same wifi is required for things like managing lists). Homepods also support UWB (you can bring your phone close to a homepod to transfer music playback to it, and it's used for setup). But I still get my living room homepod responding to me while I'm in the kitchen mere feet from the kitchen homepod with my phone in my hand (working on grocery list, for example).
Depending on the phone's proximity is still annoying, though. It seems weird that they don't seem to use voice proximity to determine which one is closer.
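The "voice proximity" idea can be sketched as a simple arbitration step: every device that hears the wake word reports how loud and how confident the detection was, and only the best one answers. This is a guess at one possible approach, not a description of what HomePods or Google Home actually do; the field names, the 0.5 threshold and the assumption that devices can swap reports over the local network within a short window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WakeReport:
    device: str
    loudness_db: float  # hypothetical: wake-word level at this device's mic
    confidence: float   # hypothetical: wake-word detector score, 0 to 1


def choose_responder(reports, min_confidence=0.5):
    """Pick the device that heard the wake word loudest, or None."""
    # Ignore devices that probably only picked up noise or the TV.
    candidates = [r for r in reports if r.confidence >= min_confidence]
    if not candidates:
        return None
    # Loudest wins; confidence breaks ties between equally loud devices.
    best = max(candidates, key=lambda r: (r.loudness_db, r.confidence))
    return best.device


reports = [
    WakeReport("living room", loudness_db=-35.0, confidence=0.8),
    WakeReport("kitchen", loudness_db=-20.0, confidence=0.9),
    WakeReport("bedroom", loudness_db=-18.0, confidence=0.2),
]
responder = choose_responder(reports)
print(responder)  # kitchen
```

Loudness is a crude stand-in for distance (a device facing a wall or sitting next to a running tap would report misleading levels), which may be part of why this stays a hard problem in practice.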
It's amusing that few people selling chatbots offer one you can just talk to about buying their product.
Most of those things are about one step above "press 1 for sales, press 2 for tech support". You just get to say "sales" instead of pressing 1. So if you want to convince people yours is better, a good demo is essential.