A tool for capturing captions and transcripts from online videos

culi · on Oct 1, 2022

Wow wish it could integrate with this:

I'm constantly finding myself remembering a part of a video I saw a long time ago (e.g. Robert Sapolsky's hour-long lectures on human behavioral biology), and I've wasted more than a few days just trying to figure out which of his two dozen lectures it was. Filmot solved this for me, but I still have this problem with badly captioned videos or non-YouTube videos. This tool seems perfect for building a personal, searchable collection.
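For the searchable-collection idea, here is a minimal Python sketch, assuming each transcript has already been split into timed cues. The Cue class, the search_transcripts function and the lecture titles are hypothetical illustrations, not part of Filmot or of this tool. It does a plain case-insensitive substring scan, so a phrase that is split across two cues won't be found.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    start_seconds: float  # when this caption line starts in the video
    text: str


def search_transcripts(collection: dict[str, list[Cue]], phrase: str) -> list[tuple[str, float]]:
    """Return (video title, start time) for every cue containing `phrase`, ignoring case."""
    needle = phrase.lower()
    hits = []
    for title, cues in collection.items():
        for cue in cues:
            if needle in cue.text.lower():
                hits.append((title, cue.start_seconds))
    return hits


# Hypothetical example collection: titles and cue text are made up.
lectures = {
    "Lecture 1": [Cue(0.0, "Introduction"), Cue(754.0, "the amygdala and fear")],
    "Lecture 2": [Cue(30.0, "dopamine and anticipation of reward")],
}
print(search_transcripts(lectures, "Amygdala"))  # [('Lecture 1', 754.0)]
```

Jumping straight to 754 seconds into "Lecture 1" is exactly the "which of the two dozen was it?" question answered.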

jackconsidine · on Sept 30, 2022

Wow, that's cool. I've been using Whisper from a script I wrote which reads my Dropbox videos, transcribes them, and uploads both to Notion. If anyone's interested, feel free to reach out. [0]

I may pivot to this GitHub Action so my CPU doesn't explode.

[0] jack at koptional dot com

muratsu · on Sept 30, 2022

why not put it on a gist/GitHub repo and share it here?

mzs · on Sept 30, 2022

blog post with details: https://simonwillison.net/2022/Sep/30/action-transcription/

jamesblonde · on Sept 30, 2022

There is a serverless machine learning course that includes GH Actions to implement serverless feature pipelines and serverless batch inference pipelines.

https://github.com/featurestoreorg/serverless-ml-course

Disclaimer: I am involved in it.

genewitch · on Oct 1, 2022

youtube-dl has --embed-subs and --convert-subs (currently supported: ass, lrc, srt, vtt)
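To show what converting VTT to SRT boils down to, here is a hand-rolled Python sketch. It is not youtube-dl's own implementation, and vtt_to_srt is a hypothetical helper that only handles plain cues: it renumbers them, swaps the millisecond dot for a comma, pads a missing hours field, and drops cue settings and NOTE/STYLE blocks.

```python
import re

# WebVTT writes milliseconds after a dot and may omit the hours field;
# SRT wants "HH:MM:SS,mmm".
_VTT_TIME = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def _srt_time(match: re.Match) -> str:
    hours = int(match.group(1) or 0)
    return f"{hours:02d}:{match.group(2)}:{match.group(3)},{match.group(4)}"


def vtt_to_srt(vtt: str) -> str:
    """Convert simple WebVTT captions to SRT, renumbering the cues."""
    srt_cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        # The header and NOTE/STYLE blocks have no "-->" timing line; skip them.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        # Keep the start and end times; cue settings like "align:start" are dropped.
        stamps = [_srt_time(m) for m in _VTT_TIME.finditer(lines[timing])][:2]
        if len(stamps) < 2:
            continue  # malformed timing line
        text = "\n".join(lines[timing + 1:])
        srt_cues.append(f"{len(srt_cues) + 1}\n{stamps[0]} --> {stamps[1]}\n{text}")
    return "\n\n".join(srt_cues) + "\n"
```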

The automatic thing is interesting, and I'll have to check how to use Whisper. I have a ton of old DVD rips that either have no subtitles or have OpenSubtitles files with severe timing issues that make them unusable with a dumb player (like Roku).
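For the timing issues, here is a hypothetical retime_srt helper, sketched under the assumption that the error is either a constant offset (subtitles uniformly early or late) or a linear drift, such as subtitles timed against a different frame rate of the same film. It rewrites only the "-->" timing lines and leaves the subtitle text alone. Nonlinear problems (e.g. a different cut with extra scenes) need per-section fixes that this doesn't attempt.

```python
import re

_SRT_TIME = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def _from_ms(total: int) -> str:
    total = max(total, 0)  # clamp: nothing can be shown before the video starts
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def retime_srt(srt: str, offset_ms: int = 0, scale: float = 1.0) -> str:
    """Map every cue time t to t * scale + offset_ms.

    offset_ms fixes subtitles that are uniformly early (positive) or late (negative);
    scale fixes drift that grows over the film, e.g. the ratio of two frame rates.
    """
    def shift(match: re.Match) -> str:
        return _from_ms(round(_to_ms(*match.groups()) * scale) + offset_ms)

    out = []
    for line in srt.splitlines(keepends=True):
        # Only touch timing lines, never the subtitle text itself.
        out.append(_SRT_TIME.sub(shift, line) if "-->" in line else line)
    return "".join(out)
```

For subtitles that show up 2.5 seconds too early you would call retime_srt(text, offset_ms=2500) and write the result back out as a new .srt file.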

throwaway81523 · on Oct 2, 2022

I just use youtube-dl to get the YouTube ttf captions and run them through a Python script to clean up the timestamps and format them somewhat readably. There are tons of transcription errors, but it's still a lot faster than watching the stupid video.
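A guess at what such a cleanup script might look like, assuming the captions were saved as WebVTT and follow YouTube's auto-caption layout (inline word-timing tags, each line repeated across consecutive cues, no cue identifiers). The name readable_transcript is hypothetical; it keeps one [mm:ss] start time per new line and drops the repeats.

```python
import html
import re

_TIMING = re.compile(r"^(?:(\d+):)?(\d{2}):(\d{2})\.\d{3}\s+-->")
_TAG = re.compile(r"<[^>]+>")  # inline tags like <00:00:01.520>, <c> and </c>
_HEADER = ("WEBVTT", "Kind:", "Language:", "NOTE")


def readable_transcript(vtt: str) -> str:
    """Turn WebVTT captions into de-duplicated lines prefixed with [mm:ss]."""
    output = []
    previous = None
    start = "00:00"
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line or line.startswith(_HEADER):
            continue
        timing = _TIMING.match(line)
        if timing:
            hours, minutes, seconds = timing.groups()
            total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            start = f"{total // 60:02d}:{total % 60:02d}"
            continue
        text = html.unescape(_TAG.sub("", line)).strip()
        # Auto-captions repeat each line in the next cue; keep only the first copy.
        if text and text != previous:
            output.append(f"[{start}] {text}")
            previous = text
    return "\n".join(output)
```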

l8rlump · on Oct 3, 2022

Would this give you a similar but quicker result?

https://you-tldr.com/

(No affiliation).

IYasha · on Oct 3, 2022

If it was a DirectShow filter, we could just connect: container demuxer -> audio decoder -> transcriber -> translator -> subtitle renderer :)

sixhobbits · on Sept 30, 2022

> any activity that places a burden on our servers, where that burden is disproportionate to the benefits provided to users (for example, don't use Actions as a content delivery network or *as part of a serverless application*, but a low benefit Action could be ok if it’s also low burden); or

Not a lawyer, but I'm pretty sure that is a violation of their ToS.

simonw · on Sept 30, 2022

I'm very confident that what I've built here fits the set of things that you are allowed to do with Actions.

The workflow I've written here is a shortcut for writing content directly to the repository. You could go and run the commands on your laptop and copy-and-paste the extracted captions into a file and push them to the repo... but Actions are specifically designed to automate that kind of process.

(Also: I've shown this to GitHub people who have worked on Actions and they thought it was really cool.)

latchkey · on Sept 30, 2022

Being confident/cool is irrelevant if GH legal decides that this isn't a valid use under their ToS.

I would have reached out to GH to ask for permission instead of asking for forgiveness.

simonw · on Sept 30, 2022

I doubt GitHub have the support capacity to handle everyone pinging them to ask permission any time they want to do something interesting with Actions.

I'll take my chances. If they tell me it's not a supported use-case, I'll update the project to tell people they shouldn't use it.

fragmede · on Sept 30, 2022

Keep building and ignore the haters. I'm sure GitHub deals with actual abuse of GitHub Actions (like trying to mine crypto) on a regular basis. This is neat and interesting, and at most they'll rate limit it if it gets too popular. Plus you're connecting to a hosted paid service for the GPU backend, so it's not all CPU time.

latchkey · on Sept 30, 2022

I'm not a hater, I'm a realist. Services like this have a free tier to encourage paid accounts. When people abuse that free tier, everyone else suffers. It is not much effort to ask the support team for permission. I've also been on the devops team of having to run services like this and it really isn't fun when people abuse it. It is a lot of extra work.

latchkey · on Sept 30, 2022

Exactly, they would probably say no since that is the easiest answer.

Now that you're at the top of HN and they might see more abuse of their systems, that answer will just come more quickly.

Great work on the actions though, it is a pleasure to read the source code. Learning a few tricks in there.

striking · on Sept 30, 2022

I think

> if using GitHub-hosted runners, any other activity unrelated to the production, testing, deployment, or publication of the software project associated with the repository where GitHub Actions are used.

is far more pertinent, and can be solved by self-hosting a runner.

mmastrac · on Sept 30, 2022

Can you self-host a runner outside of GH Enterprise?

EDIT: TIL you can! That's wild.

lee101 · on Sept 30, 2022

Also check out https://text-generator.io which is over 8x cheaper than Google for speech-to-text, with 5.5 hours free every month too.

mtlynch · on Oct 1, 2022

You should disclose that this is your business, as the wording makes it seem like an unbiased recommendation.

jimmySixDOF · on Oct 1, 2022

In the repo he says $0.20/min, which seems quite high to me, even for a roll-your-own setup like this. But I noticed that Otter.ai downscaled their paid and free tier audio-to-text minutes/month allotments as of last week, so what used to get you 6000 minutes now gets you 1200. They also capped video file transcriptions, so I wonder if costs are going up for some reason?
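The arithmetic behind those numbers, as a trivial Python sketch. The $0.20/min rate and the 6000 and 1200 minute allotments are the figures quoted in this comment; the helper names are hypothetical.

```python
def transcription_cost(minutes: float, rate_per_minute: float = 0.20) -> float:
    """Dollar cost at a flat per-minute rate, rounded to cents."""
    return round(minutes * rate_per_minute, 2)


def percent_reduction(before: float, after: float) -> float:
    """How much smaller `after` is than `before`, as a percentage."""
    return (before - after) * 100 / before


print(transcription_cost(60))         # 12.0 -> an hour-long video costs $12
print(percent_reduction(6000, 1200))  # 80.0 -> Otter.ai's new allotment is an 80% cut
```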

mmastrac · on Sept 30, 2022

This is an amazing "misuse"/hack of GitHub Actions and probably something that will cause major headaches for us in the future if they decide to crack down on it. I love it.

swyx · on Sept 30, 2022

why is it a misuse?

mmastrac · on Sept 30, 2022

See the above comment: https://news.ycombinator.com/item?id=33037494

Kinda breaks the spirit of GHA, IMO. I like it, but I think it's a bad path to start down. Entirely IMO, keep in mind.

simonw · on Sept 30, 2022

This is the wrong link - this is just to a demo of the system.

https://simonwillison.net/2022/Sep/30/action-transcription/ is my full write-up of the project

https://github.com/simonw/action-transcription is the project repository.

dang · on Sept 30, 2022

Ok, we've changed to the first link from https://github.com/simonw/action-transcription-demo. Thanks!

simonw · on Sept 30, 2022

Thanks!

mzs · on Sept 30, 2022

Sorry, I can't seem to edit the submission anymore, but thankfully you edited the README.