I like this. I tried something similar ~10 years ago, but it didn't go very well. I'm sure an LLM can do much better than the nonsense I hacked together.
Very cool, just signed up. What advantages does this have over the one built into the ChatGPT app? Also, it would be great if I could see the text output in addition to the voice.
The main differences fundamentally come down to OpenAI treating it more like a party-trick demo rather than core functionality. I think it has a lot of potential if I can just fine-tune a couple of rough edges. (When you chat with someone in person, you don't pull out notebooks and write messages to each other. I see writing as a fallback medium.)
To answer your question more specifically,
Pro Bonamiko:
- Faster average first-response latency (though higher first-audio latency, since OpenAI plays a ding right away). This is the main focus currently: reducing latency as much as I can. I'd like to avoid adding a ding, but we'll see how low I can get it.
- Can be used anywhere with a browser, while OpenAI requires the mobile app to be installed (i.e., desktop support).
- In the future we can support deeper customization since we are focused on the audio medium. As soon as you have to run a function in the ChatGPT app there is a long response latency, which could easily be masked by something as simple as the AI saying "Let me perform a search to get the details" while the call runs (see the sketch after these lists).
Pro ChatGPT:
- Nice animation
- Already has built in tool support such as web search
- Supports automatic language switching between messages; Bonamiko requires manually changing the language.
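
To illustrate the latency-masking idea, here's a minimal sketch. The `speak`, `run_web_search`, and `answer_with_search` names are hypothetical placeholders, not any real API; the point is just to start the slow tool call and the filler phrase concurrently so the user never hears dead air.

```python
import asyncio

async def speak(text: str) -> None:
    # Placeholder: stream `text` through TTS to the user.
    print(f"[voice] {text}")

async def run_web_search(query: str) -> str:
    # Placeholder: the slow tool call (network round trip, ranking, etc.).
    await asyncio.sleep(2.0)
    return f"top results for {query!r}"

async def answer_with_search(query: str) -> None:
    # Kick off the slow tool call, then speak the filler phrase while it runs,
    # instead of waiting silently for the result.
    search_task = asyncio.create_task(run_web_search(query))
    await speak("Let me perform a search to get the details.")
    results = await search_task
    await speak(f"Here is what I found: {results}")

asyncio.run(answer_with_search("desktop voice assistants"))
```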
IP -> Intellectual Property, LUT -> Look-Up Table. IP is often used by Intel to describe a "unit" of proprietary technology, such as a chip or just an adder design. Look-up tables are, in essence, small ROMs, and they are the basis for how FPGAs work.
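
If it helps to see it concretely, here's a toy sketch (not vendor code, just an illustration): a 4-input LUT is a 16-entry truth table stored like a tiny ROM, with the four inputs forming the read address.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table for a 4-input boolean function."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
            for i in range(16)]

def lut4_read(table, a, b, c, d):
    """'Evaluate' the function by indexing the ROM with the input bits."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]

# Example: configure the LUT as a full-adder carry-out, (a & b) | ((a ^ b) & c).
carry_lut = make_lut4(lambda a, b, c, d: (a & b) | ((a ^ b) & c))
print(lut4_read(carry_lut, 1, 0, 1, 0))  # -> 1
```

Swapping in a different function fills the same table with different bits, which is essentially what reprogramming an FPGA's logic fabric does.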