Ask HN: Looking for a 24-7 Real-Time Voice Transcription Tool

tikkun · on Sept 17, 2023

I'm interested in the same thing and have spent quite a bit of time looking.

Rewind.ai is ok (transcription accuracy is meh)

Voice Memos.app is ok (though no native transcription, and requires stopping and starting)

Otter.ai is ok (though there's a 4 hour limit on recordings, and there's no paid plan that allows for enough recording minutes to do 24/7)

My ideal solution would be that Otter comes out with a Pro 24/7 plan with 60,000 minutes per month and no max recording length, for $60-80/mo.

I would pay for this and have paid for alternatives, though I'd prefer to use an existing company that I've used for a while and that has lots of users, due to privacy/trust, or perhaps a small startup that publishes security reports and does everything on device.

As an aside:

I use 24/7 voice transcription as a kind of "extended context window" (to use an LLM analogy). While I'm working, I talk out loud to myself about what I'm thinking through, which effectively gives me a much larger working memory than I would have otherwise. It's quite helpful.

8ta4 · on Sept 17, 2023

Do you think an open-source solution that uses only the Deepgram API and does not store any recordings would satisfy your privacy requirements?

How many hours per day or month do you actively use speech recognition?

60,000 minutes per month. I had to double-check my calculations. It seems you've found a 30th hour in your day.

Let me give you some context:

I saw your blog post about Deepgram. They charge $0.0059 per minute for pay-as-you-go.

- If you use it 24/7, it costs:

    - $8.496 per day

    - $254.88 per month

- If you use it 8 hours a day (with voice activity detection), it costs:

    - $2.832 per day

    - $84.96 per month
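
The figures above can be reproduced with a short Python sketch. It assumes the $0.0059 per-minute rate quoted in this comment and a 30-day month. The function names are made up for illustration, and the rate may not match current pricing.

```python
# Assumption: pay-as-you-go rate quoted in this thread, in USD per audio minute.
RATE_PER_MINUTE = 0.0059
# Assumption: a 30-day month, matching the figures in the list above.
DAYS_PER_MONTH = 30


def transcription_cost(hours_per_day, rate=RATE_PER_MINUTE, days=DAYS_PER_MONTH):
    """Return (cost per day, cost per month) in USD for the given daily usage."""
    per_day = hours_per_day * 60 * rate
    return per_day, per_day * days


def affordable_hours_per_day(monthly_budget, rate=RATE_PER_MINUTE, days=DAYS_PER_MONTH):
    """Return how many hours of audio per day a monthly budget covers."""
    return monthly_budget / (rate * 60 * days)


for hours in (24, 8):
    per_day, per_month = transcription_cost(hours)
    print(f"{hours:>2} h/day: ${per_day:.3f} per day, ${per_month:.2f} per month")

for budget in (60, 80):
    print(f"${budget}/month covers about {affordable_hours_per_day(budget):.1f} h/day")
```

Under these assumptions, a $60-80 monthly budget covers roughly 5.6 to 7.5 hours of transcribed audio per day.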

I know the 24/7 cost is too high for your budget ($60-80 a month). But voice activity detection can save you a lot of money.
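
To illustrate why voice activity detection cuts the bill, here is a minimal sketch of a naive energy-based detector in Python. It is not how any particular product does it: the 10 ms frame size, the RMS threshold of 500, and the function names are all assumptions chosen for illustration, and a dedicated VAD library would be far more robust to background noise. The idea is simply that only frames loud enough to count as speech get sent to the paid API.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of a list of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_frames(samples, frame_len=160, threshold=500.0):
    """Yield only the full frames whose RMS reaches the threshold.

    frame_len and threshold are illustrative assumptions, not tuned values.
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= threshold:
            yield frame


def speech_seconds(samples, sample_rate=16000, frame_len=160, threshold=500.0):
    """Seconds of audio that would actually be sent for transcription."""
    kept = sum(1 for _ in speech_frames(samples, frame_len, threshold))
    return kept * frame_len / sample_rate


# Synthetic input: one second of silence followed by one second of a 440 Hz tone
# standing in for speech (real speech is far less regular than this).
silence = [0] * 16000
tone = [int(3000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(16000)]

total = len(silence + tone) / 16000
kept = speech_seconds(silence + tone)
print(f"{kept:.2f} s of {total:.2f} s would be billed")
```

With input like this, only the non-silent half is counted, which is the same effect as the 8-hours-a-day estimate above: you pay for the minutes in which you are actually talking.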

About privacy and trust, open-sourcing the solution might give you some confidence. Deepgram is backed by YC and has many users, which might also make you feel better.

ginkoutest · on Sept 17, 2023

Out of curiosity, what do you then do with the transcriptions? You said you typically talk out loud while working, but do you continue working like normal after the transcription is recorded for later use, or do you interrupt your workflow to do something specific with the transcription immediately after?

8ta4 · on Sept 17, 2023

> Out of curiosity, what do you do with the transcriptions after you record them?

I use them as a dictation tool. I speak out what I want to write and then I use a language model to polish it later.

> You mentioned that you usually talk out loud while working, but do you keep working as usual after you save the transcription for future use?

Yes, I continue working as "normal". But you see, there's this slight concern that if I keep talking all day, every day, someone might reserve a spot for me in a mental asylum.

> Do you ever stop your work to do something with the transcription right away?

Sometimes, yes. If I'm writing a specific message, I might pause my work to polish it immediately. But if I'm just voicing my random thoughts, I would like to access them later to write my messages or posts.

wyldfire · on Sept 17, 2023

> Before I decide to build or modify a tool myself, I thought I'd ask here:

if you do decide to, start with ggml/whisper

its-summertime · on Sept 17, 2023

https://github.com/abb128/LiveCaptions comes to mind. Its libraries and models are readily available, so you can rework it to suit your needs, and it can run 24/7 if you don't mind the CPU usage.

8ta4 · on Sept 17, 2023

Thank you! With the CPU usage, it seems like every season will be summertime.

ginkoutest · on Sept 17, 2023

Do you care about what platform the solution is on? E.g., Linux/Windows desktop application, Chrome extension, mobile app, website in a separate tab, etc.?

8ta4 · on Sept 17, 2023

Not seeing Mac really caught my eye! You see, Mac is the apple of my eye.

I want to try a standalone Mac app for the trial.

After the trial, I'll use a stationary device that can record all the time. It could be a Raspberry Pi, an old phone, or something cheap. I don't want to miss anything or record other people's talk.

I'll edit on my main Mac. I can't use it for recording because it's not always on and I might take it out.

For syncing, I prefer a cloud service that can transcribe well. The device will send my audio to the cloud and the transcription API will do its magic. The transcript will be ready on my Mac right away.

solardev · on Sept 18, 2023

Can you just use the speech recognition built into your OS? I think macOS, Windows, Android, and iOS all have that built in these days?

8ta4 · on Sept 19, 2023

Can Windows, Android, or iOS dictation run continuously 24/7?

Actually, I'm using the built-in macOS speech recognition to answer your que

solardev · on Sept 19, 2023

I don't know if a built-in app does that, but if not, it should be possible to write one and use the OS APIs to do the actual recognition?

8ta4 · on Sept 20, 2023

"Why didn't I consider it earlier?" was my first thought. Then, "Maybe I talk too much and think too little?"

Your idea is definitely easy on the wallet, but I should've mentioned before that accuracy is crucial.

When it comes to speech recognition accuracy:

- Windows might work well, since Office 365 has good speech recognition features.

- The speech recognition on macOS isn't as accurate, and iOS might have the same problem.

- As for Chrome and Google Docs, their speech recognition quality is lower, and Android might be similar.

noman-land · on Sept 17, 2023

You can do this via command line with whisper.cpp.