
Yeah, I use a cheap voice recorder that only saves .wav files, for ~10 memos per day, and the Whisper transcripts are good. "Tiny" model, laptop with 4 GB of RAM. The "base" model runs too, but it is slower and produces different inaccuracies.

But overall, if I were to suggest an ideal process: 1) transcribe the notes w/ Whisper, 2) play back the media in VLC alongside the transcripts and correct the errors. T = 16 hours of proofing/correction + ~8 hours of headless transcription of *.wav beforehand.
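For step 1, a minimal sketch of the headless batch run, assuming the openai-whisper Python package and ffmpeg are installed. The helper names (make_whisper_transcriber, transcribe_folder) and the one-.txt-per-.wav layout are illustrative choices, not anything from the original setup.

```python
from pathlib import Path
from typing import Callable, List


def make_whisper_transcriber(model_name: str = "tiny") -> Callable[[Path], str]:
    """Return a function that transcribes one audio file with Whisper."""
    # Assumption: `pip install openai-whisper` and ffmpeg on PATH.
    # Imported lazily so the folder logic below works without it.
    import whisper

    model = whisper.load_model(model_name)

    def transcribe(path: Path) -> str:
        # fp16=False keeps decoding in full precision on CPU-only laptops.
        return model.transcribe(str(path), fp16=False)["text"].strip()

    return transcribe


def transcribe_folder(folder: Path, transcribe: Callable[[Path], str]) -> List[Path]:
    """Write memo.txt next to every memo.wav that has no transcript yet."""
    written = []
    for wav in sorted(folder.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            continue  # do not overwrite transcripts that were already corrected
        txt.write_text(transcribe(wav) + "\n", encoding="utf-8")
        written.append(txt)
    return written
```

Something like transcribe_folder(Path("memos"), make_whisper_transcriber("tiny")) would then run over a (hypothetical) memos folder unattended. Skipping existing .txt files means a rerun never clobbers the transcripts you have already proofed in step 2.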



I’d add that I had better luck using smaller chunks (about 20 seconds) per wav file for accuracy. Whisper seems to go berserk if you pump in lengthy audio (30+ seconds).

I’d be tempted to at least try breaking the notes down into one-line images (about a sentence each) and give it a go with Gemini. I haven’t tested their OCR, but even if it has errors, I bet you could just ask Gemini again to fix the sentence.


Whisper works on 30s chunks iirc. If your input is longer, you need to use something that automatically splits it up.
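If you want to do the splitting yourself, here is a rough sketch using only Python's standard wave module. The function name split_wav and the 20-second default are my own assumptions (20 seconds just mirrors the chunk size mentioned upthread). It cuts at fixed times, so it can slice through the middle of a word; splitting on silence would be kinder to the transcript.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(src: Path, out_dir: Path, chunk_seconds: float = 20.0) -> List[Path]:
    """Cut src into consecutive chunks of at most chunk_seconds each."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = max(1, int(reader.getframerate() * chunk_seconds))
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of the recording
            dst = out_dir / f"{src.stem}_{index:03d}.wav"
            with wave.open(str(dst), "wb") as writer:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on write.
                writer.setparams(params)
                writer.writeframes(frames)
            chunks.append(dst)
            index += 1
    return chunks
```

The last chunk is simply whatever is left over, so a 50-second memo comes out as 20 + 20 + 10 seconds.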



