I've been slowly working on my own simple home "Alexa", using CMUSphinx for most of the speech recognition. Honestly, my most successful methods have involved the least amount of complex NLP.
Simply treating the sentence as a bag of words, looking for "on", "off", or "change" (and their synonyms) plus the presence of known smart objects, works extremely well. I could say "Hey Marvin, turn on the lights and TV", or "Hey Marvin, turn the lights and TV on", or even "Hey Marvin, on make lights and TV."
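Not the commenter's actual code, but the bag-of-words idea can be sketched in a few lines of Python (the keyword lists here are invented for illustration):

```python
# Rough sketch of the bag-of-words parser (keyword lists are made up).
STATE_WORDS = {"on": "on", "start": "on", "enable": "on",
               "off": "off", "stop": "off", "disable": "off"}
DEVICES = {"lights", "tv", "fan"}

def parse_command(sentence):
    words = sentence.lower().replace(",", " ").split()
    state = next((STATE_WORDS[w] for w in words if w in STATE_WORDS), None)
    devices = [w for w in words if w in DEVICES]
    return state, devices

# Word order doesn't matter, so all three phrasings parse identically:
parse_command("turn on the lights and tv")  # -> ("on", ["lights", "tv"])
parse_command("turn the lights and tv on")  # -> ("on", ["lights", "tv"])
parse_command("on make lights and tv")      # -> ("on", ["lights", "tv"])
```

Because nothing depends on word order or grammar, garbled recognizer output still parses as long as the key words survive.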
(It's named Marvin after the android from The Hitchhiker's Guide; my eventual goal is to have it reply with snarky/depressed remarks.)
Adding 30 seconds of "memory" of the last requested state also made it seem a million times smarter and turned requests into a conversation rather than a string of commands. If it finds a mentioned smart object with no state mentioned, it assumes the previous one.
"Hey Marvin, turn on the lights." lights turn on "The TV too." tv turns on
The downside to this approach: when I'd show it off to friends, it could mis-trigger. "Marvin, turn off the lights." (lights turn off) "That's so cool, so it controls your TV, too?" (TV turns off) But it was mostly not an issue in real usage.
Ultimately I've put the project on hold for now because I can't find a decent, non-commercial way of converting voice to text. I'd really rather not send my audio out to Amazon/Google/MS/IBM. Not just because of privacy, but also cost and "coolness" factor (I want as much as possible processed locally and open source).
CMUSphinx's detection was mostly very bad. I couldn't do complex NLP even if I wanted to, because it picks up broken/garbled sentences. I currently build a "most likely" sentence by looping through Sphinx's 20 best interpretations and grabbing all the words that are likely to be commands or smart objects. I tried setting up Kaldi but didn't get it working after a weekend, and I haven't tried again since. I don't really know of any other options aside from CMUSphinx, Kaldi, or a cloud SaaS.
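The n-best trick could look something like this rough sketch (the keyword set is made up, and in practice the hypotheses would come from Sphinx's n-best output rather than a hard-coded list):

```python
# Hypothetical sketch of mining the recognizer's n-best list.
KEYWORDS = {"on", "off", "change", "lights", "tv"}

def merge_nbest(hypotheses):
    """Collect command/device words appearing in any hypothesis, in order."""
    found = []
    for hyp in hypotheses:
        for word in hyp.lower().split():
            if word in KEYWORDS and word not in found:
                found.append(word)
    return found

# Even when every individual hypothesis is garbled, the command survives:
nbest = ["turn on the light switch",
         "tour on the lights",
         "turn own the lights"]
merge_nbest(nbest)  # -> ["on", "lights"]
```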
I've wanted to add a text messaging UI layer to it. Maybe I'll use that as an excuse to try playing with ParseySaurus.
> I've got the project on hold for now because I can't find a decent, non-commercial way of converting voice to text. I'd really rather not send my audio out to Amazon/Google/MS/IBM
Same concern here... so my voice-to-text method is Android's Google voice recognition, forced into offline mode. The offline mode is surprisingly good.
Re mis-triggers... I also have OpenCV running on the same Android device. It only activates voice recognition when I am actually looking directly at the device (an old phone).
> text method is via android's google voice - forced to offline mode. The offline mode is surprisingly good.
I actually tried this at one point with a wall-mounted tablet, before trying Sphinx. It is surprisingly good for offline use, probably the best offline recognition I've tried outside of dedicated software like Dragon. But it doesn't meet my open-source criterion, so I'm hoping to find something better.
I'll most likely give up on the requirement that it be local and open, and use Sphinx for hotword detection while sending the audio out to AWS for processing.
> Re mis triggers... I also have opencv running on the same android. It only activates the voice recognition when I am actually looking directly at the android device (an old phone).
That's an awesome idea :) I haven't gotten around to playing with anything vision-based yet, but I've thought of 'simple' projects like that which would add a lot to the perceived intelligence. Figuring out the number of people in a room would be another useful one, I think. The AI could enter a guest mode when there is more than one person in the room, or when it detects faces that aren't mine, or something similar.
With the leaps and bounds being made in ML these days, it can't be long before magnitudes-better open-source voice recognition becomes available. I gave Sphinx a try, but it was horribly disappointing.
For me, the combination of Google voice (offline) and the Ivona voice (Amy) is pretty damn good for my Android/Python/Arduino-based home AI.
Sounds interesting, do you have a writeup or some other details somewhere? (How do you force android voice recognition to work offline? Just block the phone from the internet?)
Kaldi is not a point-and-click solution; it's a toolkit for developing your own speech recognition system. That said, it makes things incredibly easy if you know what you're doing, as it brings all the necessary tools and even provides some data to train your models (see the associated resources at http://openslr.org/). Its performance is state of the art.
This was recently mentioned on HN, but I haven't really looked into it (apparently requires training your own models, but provides prepared scripts to do that for some common datasets): https://github.com/mozilla/DeepSpeech
Must have slipped past me last time it was posted on HN. Thanks for sharing! I'm going to add this to my list of things to try next time I'm inspired to work on this project again.