Ask HN: Looking for a 24-7 Real-Time Voice Transcription Tool
11 points by 8ta4 on Sept 17, 2023 | 14 comments
I'm on the hunt for a voice transcription tool that can operate continuously, in real-time, 24/7, to enhance my workflow. I need something that doesn't require constant starting and stopping.

I've looked into a few options, but none of them seem to provide the non-stop, real-time transcription I'm after:

- Siri/Google Assistant: They're great for short dictations, but they don't offer continuous transcription.
- Otter.ai: This tool provides real-time transcription for meetings and interviews, but using it 24/7 could get expensive.
- Rev Voice Recorder: It lacks real-time capabilities and needs to be manually activated.
- The NSA: I won't be able to pass their security clearance because of that "classified" recording of my ex in bed... snoring like a freight train all night.

Before I decide to build or modify a tool myself, I thought I'd ask here:

Does anyone know of a tool that can provide 24/7 transcription?

I've started sketching out some initial ideas here:

https://github.com/8ta4/say

But my main goal is to find out if such a tool already exists so I don't end up reinventing the wheel.




I'm interested in the same thing and have spent quite a bit of time looking.

Rewind.ai is ok (transcription accuracy is meh)

Voice Memos.app is ok (though no native transcription, and requires stopping and starting)

Otter.ai is ok (though there's a 4 hour limit on recordings, and there's no paid plan that allows for enough recording minutes to do 24/7)

My ideal solution would be that Otter comes out with a Pro 24/7 plan with 60,000 minutes per month and no max recording length, for $60-80/mo.

I would pay for this and have paid for alternatives, though I'd prefer to use an existing company that I've used for a while and that has lots of users, due to privacy/trust, or perhaps a small startup that publishes security reports and does everything on device.

As an aside:

I use 24/7 voice transcription as a kind of "extended context window" (to use an LLM analogy). While I'm working, I talk out loud to myself about what I'm thinking through, which I find allows me to effectively increase my working memory size to be much larger than otherwise. It's quite helpful.


Do you think an open-source solution that only uses Deepgram API and does not store any recordings would satisfy your privacy requirements?

How many hours per day or month do you actively use speech recognition?

60,000 minutes per month. I had to double-check my calculations. It seems you've found a 30th hour in your day.

Let me give you some context:

I saw your blog post about Deepgram. They charge $0.0059 per minute for pay-as-you-go.

- If you use it 24/7, it costs:
    - $8.496 per day
    - $254.88 per month
- If you use it 8 hours a day (with voice activity detection), it costs:
    - $2.832 per day
    - $84.96 per month

I know the 24/7 cost is too high for your budget ($60-80 a month). But voice activity detection can save you a lot of money.
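
For anyone who wants to redo the arithmetic with their own usage, here is a minimal sketch. It assumes the $0.0059 per minute rate quoted above still applies and that a month has 30 days, which is what the figures above imply; the function names are made up for illustration.

```python
# Rate quoted in this thread; check current Deepgram pricing before relying on it.
RATE_PER_MINUTE = 0.0059  # USD per minute, pay-as-you-go
DAYS_PER_MONTH = 30       # assumption that matches the figures in the comment


def transcription_cost(hours_per_day):
    """Return (daily, monthly) cost in USD for a given number of audio hours per day."""
    daily = hours_per_day * 60 * RATE_PER_MINUTE
    return daily, daily * DAYS_PER_MONTH


def minutes_per_month(hours_per_day):
    """Total minutes of audio in a 30-day month at the given daily usage."""
    return hours_per_day * 60 * DAYS_PER_MONTH


if __name__ == "__main__":
    for hours in (24, 8):
        daily, monthly = transcription_cost(hours)
        print(f"{hours:>2} h/day: ${daily:.3f} per day, ${monthly:.2f} per month")
    # A full 30-day month of nonstop audio, for comparison with the 60,000 figure.
    print(f"Minutes in a 30-day month: {minutes_per_month(24):,}")
```

A 30-day month of nonstop audio comes to 43,200 minutes, which is why a 60,000-minute plan would need more than 24 hours in a day.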

About privacy and trust, open-sourcing the solution might give you some confidence. Deepgram is backed by YC and has many users, which might also make you feel better.


Out of curiosity, what do you then do with the transcriptions? You said you typically talk out loud while working, but do you continue working like normal after the transcription is recorded for later use, or do you interrupt your workflow to do something specific with the transcription immediately after?


> Out of curiosity, what do you do with the transcriptions after you record them?

I use them as a dictation tool. I speak out what I want to write and then I use a language model to polish it later.

> You mentioned that you usually talk out loud while working, but do you keep working as usual after you save the transcription for future use?

Yes, I continue working as "normal". But you see, there's this slight concern that if I keep talking all day, every day, someone might reserve a spot for me in a mental asylum.

> Do you ever stop your work to do something with the transcription right away?

Sometimes, yes. If I'm writing a specific message, I might pause my work to polish it immediately. But if I'm just voicing my random thoughts, I would like to access them later to write my messages or posts.


> Before I decide to build or modify a tool myself, I thought I'd ask here:

If you do decide to, start with ggml/whisper.


https://github.com/abb128/LiveCaptions comes to mind. The libraries and models behind it are easily available, so you can rework it to be how you want, and it can run 24/7 if you don't mind the CPU usage.


Thank you! With the CPU usage, it seems like every season will be summertime.


Do you care about what platform the solution is on? E.g., a Linux/Windows desktop application, Chrome extension, mobile app, website in a separate tab, etc.?


Not seeing Mac really caught my eye! You see, Mac is the apple of my eye.

I want to try a standalone Mac app for the trial.

After the trial, I'll use a stationary device that can record all the time. It could be a Raspberry Pi, an old phone, or something cheap. I don't want to miss anything, and I don't want to record other people's conversations.

I'll edit on my main Mac. I can't use it for recording because it's not always on and I might take it out.

For syncing, I prefer a cloud service that can transcribe well. The device will send my audio to the cloud and the transcription API will do its magic. The transcript will be ready on my Mac right away.
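
One way the recording device could avoid uploading silence is to gate the audio with the voice activity detection mentioned earlier in the thread. The following is only a sketch under stated assumptions: frames are 16-bit little-endian mono PCM, speech is detected with a plain energy threshold (real VADs are more robust), the threshold and hangover values are illustrative guesses, and `send_to_transcriber` is a hypothetical callback standing in for whatever cloud API client is used.

```python
import math
import struct

SAMPLE_WIDTH = 2          # bytes per sample (16-bit PCM, assumed)
ENERGY_THRESHOLD = 500.0  # RMS threshold; illustrative, needs tuning per microphone
HANGOVER_FRAMES = 10      # silent frames tolerated before a segment is closed


def rms(frame):
    """Root-mean-square amplitude of a little-endian 16-bit PCM frame."""
    count = len(frame) // SAMPLE_WIDTH
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * SAMPLE_WIDTH])
    return math.sqrt(sum(s * s for s in samples) / count)


def gate_speech(frames, send_to_transcriber):
    """Forward only voiced segments; return the number of segments sent."""
    segment = []
    silent_run = 0
    sent = 0
    for frame in frames:
        if rms(frame) >= ENERGY_THRESHOLD:
            # Loud enough to count as speech: keep it and reset the silence counter.
            segment.append(frame)
            silent_run = 0
        elif segment:
            # Silence inside an open segment: keep a short tail, then close it.
            segment.append(frame)
            silent_run += 1
            if silent_run >= HANGOVER_FRAMES:
                send_to_transcriber(b"".join(segment))
                segment, silent_run = [], 0
                sent += 1
    if segment:  # flush whatever is left when the stream ends
        send_to_transcriber(b"".join(segment))
        sent += 1
    return sent
```

In use, `frames` would come from the microphone in short chunks and `send_to_transcriber` would wrap the upload call, so billing only accrues for the segments that contain speech plus a short tail.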


Can you just use the speech recognition built into your OS? I think macOS, Windows, Android, and iOS all have that built in these days?


Can Windows, Android, or iOS dictation run continuously 24/7?

Actually, I'm using the built-in macOS speech recognition to answer your que


I don't know if a built-in app does that, but if not, it should be possible to write one and use the OS APIs to do the actual recognition?


"Why didn't I consider it earlier?" was my first thought. Then, "Maybe I talk too much and think too little?"

Your idea is definitely easy on the wallet, but I should've mentioned before that accuracy is crucial.

When it comes to speech recognition accuracy:

- Windows might work well, since Office 365 has good speech recognition features.

- The speech recognition on macOS isn't as accurate, and iOS might have the same problem.

- As for Chrome and Google Docs, their speech recognition quality is lower, and Android might be similar.


You can do this via command line with whisper.cpp.



