
Recently, I was working on a similar project and found that grabbing transcripts in quick succession soon gets your IP blocked from fetching transcripts.

I ended up doing the same as this person: downloading the MP4s and transcribing them myself. I assumed it was some sort of anti-LLM-scraper measure they had put in place.

Has anyone used the --write-auto-subs flag and not been flagged after doing 20 or so videos?



--write-auto-subs gets your IP banned for 12 to 24 hours if you download video subtitles in bulk, but if the subtitles are downloaded with a sufficient time gap in between, the ban is not triggered.
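A rough sketch of the "time gap" idea, assuming yt-dlp is on your PATH. The 60-second default is a guess, not a known safe threshold, and `build_command` / `download_with_gap` are hypothetical helper names. The `run` and `sleep` parameters are only there so the loop can be tried without touching the network.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def build_command(url: str) -> List[str]:
    """Build a yt-dlp call that fetches auto-generated subtitles only (no video)."""
    return ["yt-dlp", "--write-auto-subs", "--skip-download", url]


def download_with_gap(
    urls: Iterable[str],
    gap_seconds: float = 60.0,  # assumed value; nobody in the thread gives a number
    run: Callable[[List[str]], object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run one yt-dlp call per URL, pausing between calls (not after the last one)."""
    count = 0
    for url in urls:
        if count > 0:
            sleep(gap_seconds)  # space out requests to avoid looking like a bulk download
        run(build_command(url))
        count += 1
    return count
```

Calling `download_with_gap(list_of_urls, gap_seconds=120)` would then work through the list one video at a time. Whether any particular gap actually avoids the ban is something you would have to find out empirically.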

My startup relies on YouTube transcriptions, so we just subscribe to a YouTube transcriptor API hosted on RapidAPI that downloads the subtitles. $1 per 1,000 requests. Pretty cheap.


Yep, this happened to me and I got IP-banned for a day.


    systemctl start tor
    yt-dlp --proxy socks5://127.0.0.1:9050 --write-subs --write-auto-subs --skip-download [URL]
See: https://github.com/noobpk/auto-change-tor-ip


Unless you fetch directly from your browser. It works by getting the YouTube JSON, which includes the caption tracks; from there you take the baseUrl and download the XML.
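To make the data flow concrete, here is a minimal sketch of the two parsing steps in Python (the webapp itself does this in the browser). The JSON path `captions.playerCaptionsTracklistRenderer.captionTracks` and the `<text start="...">` XML shape are assumptions about what YouTube currently returns and may change; `find_caption_url` and `parse_caption_xml` are hypothetical names, and the actual fetching is left out.

```python
import html
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def find_caption_url(player_response: dict, lang: str = "en") -> Optional[str]:
    """Return the baseUrl of the caption track for `lang`, else the first track, else None."""
    tracks = (
        player_response.get("captions", {})
        .get("playerCaptionsTracklistRenderer", {})  # assumed key names
        .get("captionTracks", [])
    )
    for track in tracks:
        if track.get("languageCode") == lang:
            return track.get("baseUrl")
    return tracks[0].get("baseUrl") if tracks else None


def parse_caption_xml(xml_text: str) -> List[Tuple[float, str]]:
    """Turn caption XML into (start_seconds, text) pairs, skipping empty cues."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        # The XML parser decodes entities once; unescape again in case the
        # caption text was double-encoded (e.g. &amp;#39; for an apostrophe).
        text = html.unescape("".join(node.itertext())).strip()
        if text:
            cues.append((start, text))
    return cues
```

The raw cues come out unpunctuated and split at arbitrary points, which is why a clean-up pass afterwards makes sense.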

I wrote this webapp that uses this method: it calls Gemini in the background to polish the raw transcript and produce a much better version with punctuation and paragraphs.

https://www.appblit.com/scribe

It's open source, so you can see in the code how to fetch from YouTube servers from the browser: https://ldenoue.github.io/readabletranscripts/



