Ah, the good old "you're holding it wrong". What good is a speech recognition to...

zettabomb · 2025-07-22T10:27:23 1753180043

Considering that if you DO use VAD (voice activity detection), it's the best open weights voice recognition model by a very wide margin, it's quite good. I'd be willing to bet that commercial products that "don't have this problem" are using VAD as well, and that this is well known to them. But Whisper is just the weights, and I suppose a simple reference implementation, not a full product.
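To make the "run VAD first" idea concrete, here is a rough sketch of a plain energy gate that finds the non-silent stretches you would hand to the recognizer. It is only an illustration: real VADs are usually learned models, not a fixed RMS threshold, and every name and default here (`speech_regions`, `threshold`, `min_silence_ms`) is my own assumption, not anything Whisper or a commercial product ships. Samples are assumed to be floats in [-1, 1].

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS level of each non-overlapping frame."""
    levels = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def speech_regions(samples, sample_rate=16000, frame_ms=30,
                   threshold=0.02, min_silence_ms=300):
    """Return (start, end) sample offsets of stretches louder than threshold.

    A region is closed only after min_silence_ms of continuous quiet,
    so short pauses inside a sentence do not split it.
    """
    frame_len = sample_rate * frame_ms // 1000
    flags = [level >= threshold for level in frame_rms(samples, frame_len)]
    min_gap = max(1, min_silence_ms // frame_ms)  # quiet frames needed to close

    regions = []
    start = None   # frame index where the current region began
    silence = 0    # consecutive quiet frames seen inside the region
    for idx, voiced in enumerate(flags):
        if voiced:
            if start is None:
                start = idx
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap:
                end = idx - silence + 1  # first quiet frame after the speech
                regions.append((start * frame_len, end * frame_len))
                start, silence = None, 0
    if start is not None:
        # Audio ended while a region was still open.
        end = len(flags) - silence
        regions.append((start * frame_len, end * frame_len))
    return regions
```

Only the returned regions would be passed on for transcription; pure silence never reaches the model, which is the point being argued here.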

bmacho · 2025-07-22T10:46:58 1753181218

> What good is a speech recognition tool that literally hears imaginary voices?

Well, if it is supposed to work after silence detection, then it is good for speech recognition, I guess. It's like blaming a wheel for being circular because you can't sit on it. It's a part of a larger machine.

dumbfounder · 2025-07-22T13:05:06 1753189506

Just lay the wheel on its side and it makes a fine seat.

nhecker · 2025-07-22T16:50:26 1753203026

>imaginary voices

On the other hand, I can imagine that when things get quiet and the signal-to-noise ratio gets close to zero, random background audio (or randomness introduced in the transcription model) will be enough to tickle a critical number of neurons and elicit hallucinations.

The related thought exercise is this: Try scanning across the band with an AM or sideband radio, and after a while your brain will start to wonder "was that a voice I just heard, or music perhaps?" when in reality it was just environmental static.

wahnfrieden · 2025-07-22T11:28:06 1753183686

Yes, you are holding it wrong. The good of it is that it does not output imaginary voices when used with VAD.

Show us a technology with better results that does not use VAD. If you can’t, then I’m not sure what you’re arguing against except superficialities so inconsequential that I can’t comprehend the condescension. The results speak for themselves.

Xmd5a · 2025-07-22T09:05:46 1753175146

faster-whisper has a min_silence_duration_ms option
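For anyone who wants to try it, a minimal sketch of passing that option, assuming the `faster_whisper` package is installed. The helper names (`build_vad_parameters`, `transcribe_with_vad`), the model size, and the 500 ms default are placeholders of mine, not recommendations.

```python
def build_vad_parameters(min_silence_ms=500):
    """Options forwarded to faster-whisper's built-in VAD filter."""
    return {"min_silence_duration_ms": min_silence_ms}


def transcribe_with_vad(audio_path, model_size="base", min_silence_ms=500):
    """Transcribe a file with the VAD filter enabled.

    Returns a list of (start_seconds, end_seconds, text) tuples.
    """
    # Imported lazily so this file loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(
        audio_path,
        vad_filter=True,  # drop non-speech audio before decoding
        vad_parameters=build_vad_parameters(min_silence_ms),
    )
    # segments is a generator; consuming it is what runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Something like `transcribe_with_vad("meeting.wav", min_silence_ms=700)` would then be the call, with the file name being hypothetical.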

wahnfrieden · 2025-07-22T11:29:37 1753183777

There are much higher quality VAD solutions available

DANmode · 2025-07-23T23:30:26 1753313426

Please name a couple to get someone started who's hacking on webapps?

I'd really appreciate it.

DANmode · 2025-07-24T03:23:57 1753327437

(as would future readers, I'm sure)

DANmode · 2025-07-24T05:50:36 1753336236

https://github.com/ten-framework/ten-vad

wahnfrieden · 2025-07-24T08:15:10 1753344910

I last used Silero but haven’t kept up with the state of the art, so I didn’t mention it.

xandrius · 2025-07-22T11:52:39 1753185159

So if a tool has a process to have it perform at its best then it's a problem?

Do you also moan that you have to prepare a surface before applying glue or it won't stick? Or that you need to drill a guiding hole before making a larger one in wood? Or that you need to use truly prime numbers for a security key to actually be safe?