Hacker News

I've been extremely happy with their generated voice!

I had GPT read something to me in the car. My wife was nonplussed with my "audiobook" until I told her that you can talk to it like Siri. Then she called it "scary" haha




The voices sound great! But the latency is too high, and it's clunky to use with voice alone: it only listens to you at specific times, and you can't interrupt it with your voice. I wanted something that felt more like having a casual conversation with a real person, instead of "Siri but smarter". And I know it's possible, because I built something closer to what I want.


I’m really curious how much of the speech-to-text is happening on-device; I’m fairly sure the answer is “none”. Moving it on-device would give a fairly immediate performance boost. Right now it’s cool as hell, just way too slow to be truly useful.


Yeah, I expect the speech recognition is happening in the cloud. But that doesn't mean it necessarily has to be slow: you can stream audio with very low latency on most connections. The problem is what they do with the audio once it reaches the server. I suspect it would be far too expensive to dedicate a GPU to each customer, so they need to run everything in batch mode, which increases latency.
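A toy model of why batching adds latency (purely illustrative; the batch window, arrival pattern, and function names here are assumptions, not anything from OpenAI's actual pipeline): if the server only dispatches queued requests at fixed batch boundaries, each request waits for the next boundary before any compute starts, adding up to a full window of queueing delay on top of inference time.

```python
# Toy model of queueing delay from batched inference (illustrative only).
# A request arriving at time t waits until the next batch boundary,
# i.e. batch_window - (t % batch_window) seconds, before processing begins.

def added_queue_latency(arrival_offsets, batch_window):
    """Per-request wait until the next batch dispatch, in seconds."""
    return [batch_window - (t % batch_window) for t in arrival_offsets]

# Ten requests arriving at 10 ms intervals into a 100 ms batch window:
arrivals = [i * 0.01 for i in range(10)]
waits = added_queue_latency(arrivals, 0.100)
avg_wait = sum(waits) / len(waits)
print(f"average added latency: {avg_wait * 1000:.0f} ms")
```

With these made-up numbers the average added wait is about half a batch window, before the model has even seen the audio, which is one way a perfectly fast network can still feel laggy end to end.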


Was this just ChatGPT or a custom implementation with access to some data?


Normal ChatGPT. I asked it a question about the Soviet Union, and the response was lengthy.




