For some reason, YouTube is not using a very good STT system now. The lack of sentence punctuation is particularly annoying. Transcriptions by Whisper and Gemini 1.5 Pro are much better. From a couple of weeks ago:
Sometimes speech-to-text machine learning models give very poor results; however, I think the key is that:
1. It's overwhelmingly more useful than the [no text] it was replacing, particularly for the deaf or if you want to search for keywords in a video.
2. When it fails, it tends to do so in ways that trigger human suspicion and oversight.
Those aren't necessarily true of some of the things people are shoehorning LLMs into these days, which is why I'm a lot more pessimistic about that technology.
Just today, I received a note from a gas technician, part handwritten. For the life of me I couldn't make out what he was saying. I asked ChatGPT and, surprisingly, it understood; rereading the original note, I'm very sure it was correct.
Now try to mute a video on YouTube and understand what's being said from the automatic subtitles.
If you do it in English, be aware that it's the best-performing language and all others are even worse.