> Speech recognition is useful. Now try to mute a video on youtube and understan...

tkgally · 2024-08-21T04:29:44 1724214584

For some reason, YouTube is not using a very good STT system now. The lack of sentence punctuation is particularly annoying. Transcriptions by Whisper and Gemini 1.5 Pro are much better. From a couple of weeks ago:

https://news.ycombinator.com/item?id=41199567#41201773

I expect that YouTube will up their transcription game soon, too.

LtWorf · 2024-08-21T13:28:26 1724246906

I've tried Whisper too. I made this: https://codeberg.org/ltworf/srtgen

Basically it's kinda useful for placing time tags, but I need to manually fix each and every sentence. Sometimes I need to fix the time tags as well.

I only mentioned YouTube because it's more popular and easy to test.

Terr_ · 2024-08-21T01:50:44 1724205044

Sometimes speech-to-text machine learning models give very good results; however, I think the key points are:

1. It's overwhelmingly more useful than the [no text] it was replacing, particularly for the deaf or if you want to search for keywords in a video.

2. When it fails, it tends to do so in ways that trigger human suspicion and oversight.

Those aren't necessarily true of some of the things people are shoehorning LLMs into these days, which is why I'm a lot more pessimistic about that technology.

KoolKat23 · 2024-08-20T22:27:48 1724192868

Just today, I received a note from a gas technician, partly handwritten. For the life of me I couldn't make out what he was saying. I asked ChatGPT and, surprisingly, it understood; rereading the original note, I'm very sure it was correct.