Unless you fetch directly from your browser. It works by getting the YouTube JSON for the video, which includes the caption tracks, and then using a track's baseUrl to download the XML.
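Roughly, the two steps look like this. It's only a sketch: the nesting under `captions.playerCaptionsTracklistRenderer.captionTracks` and the `<text>` element format are assumptions about YouTube's undocumented player JSON and caption XML, which can change without notice, and the function names are made up for illustration.

```typescript
// Assumed shape of the relevant part of the player JSON (undocumented).
interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
}

interface PlayerResponse {
  captions?: {
    playerCaptionsTracklistRenderer?: { captionTracks?: CaptionTrack[] };
  };
}

// Step 1: pick the caption track for the wanted language and return its baseUrl.
// Falls back to the first track if that language is missing.
function pickCaptionUrl(player: PlayerResponse, lang: string = "en"): string | undefined {
  const tracks = player.captions?.playerCaptionsTracklistRenderer?.captionTracks ?? [];
  const track = tracks.filter((t) => t.languageCode === lang)[0] ?? tracks[0];
  return track ? track.baseUrl : undefined;
}

// Decode the handful of XML entities that show up in caption text.
// "&amp;" is decoded last so "&amp;lt;" becomes "&lt;", not "<".
function decodeEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Step 2: turn the downloaded caption XML into one raw transcript string.
// Assumes each caption line is a <text ...>...</text> element.
function captionXmlToText(xml: string): string {
  const lines: string[] = [];
  const re = /<text\b[^>]*>([\s\S]*?)<\/text>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    lines.push(decodeEntities(m[1]).trim());
  }
  return lines.join(" ");
}
```

In the browser you would `fetch` the URL returned by `pickCaptionUrl`, read the response as text, and pass it to `captionXmlToText`. The result is the raw, unpunctuated transcript.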
I wrote this webapp that uses this method: it calls Gemini in the background to polish the raw transcript and produce a much better version with punctuation and paragraphs.
Looks great. I made a similar app called Scribe where you can highlight passages of the transcript.
It works on the web and also as an iOS app.
https://www.appblit.com/scribe
To get around YouTube sometimes blocking the server's IP, the app fetches the transcripts in the browser.