I'm constantly finding myself remembering a part of a video I saw a long time ago (e.g. Robert Sapolsky's hour-long lectures on human behavioral biology) and I've wasted more than a few days just trying to figure out which of his two dozen lectures it was. Filmot solved this for me, but I still have this problem with badly captioned videos or non-YouTube videos. This tool seems perfect for building a personal, searchable collection.
Wow that's cool. I've been using Whisper from a script I wrote which reads my Dropbox videos, transcribes them, and uploads both to Notion. If anyone's interested feel free to reach out. [0]
I may pivot to this GitHub Action so my CPU doesn't explode.
There is a serverless machine learning course that includes GitHub Actions to implement serverless feature pipelines and serverless batch inference pipelines.
youtube-dl has --embed-subs and --convert-subs (currently supported: ass, lrc, srt, vtt)
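For anyone who wants to script those flags, here is a minimal sketch in Python. The helper name `build_subtitle_command` and the placeholder URL are made up for illustration. `--write-sub`, `--convert-subs` and `--embed-subs` are youtube-dl options; as far as I know, embedding only works for mp4, webm and mkv output.

```python
import subprocess

# Formats that --convert-subs accepts, per the comment above.
SUPPORTED_SUB_FORMATS = {"ass", "lrc", "srt", "vtt"}


def build_subtitle_command(url, sub_format="srt", embed=True):
    """Build (but do not run) a youtube-dl argument list for a video with subtitles.

    Hypothetical helper for illustration; only the flags themselves come from youtube-dl.
    """
    if sub_format not in SUPPORTED_SUB_FORMATS:
        raise ValueError(f"unsupported subtitle format: {sub_format}")
    cmd = ["youtube-dl", "--write-sub", "--convert-subs", sub_format]
    if embed:
        # Mux the converted subtitles into the downloaded video file.
        cmd.append("--embed-subs")
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Placeholder URL; replace VIDEO_ID with a real one before running.
    subprocess.run(
        build_subtitle_command("https://www.youtube.com/watch?v=VIDEO_ID"),
        check=True,
    )
```

Building the argument list separately from running it makes the command easy to print and check before any download starts.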
The automatic thing is interesting, and I'll have to check how to use Whisper. I have a ton of old DVD rips that either have no subtitles or have OpenSubtitles files with severe timing issues that make them unusable with a dumb player (like Roku).
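If the timing problem happens to be a constant offset (an assumption; drift or a different frame rate needs more than this), shifting every SRT timestamp by a fixed number of milliseconds may be enough. A rough sketch, with `shift_srt` being a made-up name:

```python
import re

# Matches one SRT timestamp such as 00:01:02,345
SRT_TIME = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text, offset_ms):
    """Shift every cue timestamp in an SRT document by offset_ms (may be negative)."""

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        # Clamp at zero so a negative offset never produces a negative time.
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    out = []
    for line in srt_text.splitlines():
        # Only touch cue timing lines, not dialogue that happens to contain a time.
        out.append(SRT_TIME.sub(shift, line) if "-->" in line else line)
    return "\n".join(out)
```

For example, `shift_srt(text, 1500)` delays every cue by a second and a half.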
I just use youtube-dl to get the YouTube ttf captions and run them through a Python script to clean up the timestamps and format them somewhat readably. There are tons of transcription errors but it's still a lot faster than watching the stupid video.
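A cleanup script along those lines could look something like the sketch below. It is not the commenter's actual script; it assumes the captions are in WebVTT form, and `clean_vtt` is a made-up name. It drops headers and cue timing lines, strips inline tags, and collapses the consecutive repeated lines that auto-generated captions tend to contain.

```python
import re

# A cue timing line, e.g. "00:00:01.000 --> 00:00:03.000 align:start" (hours optional).
TIMING_LINE = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Inline markup such as <c>, </c> or word-level <00:00:01.500> timestamps.
INLINE_TAG = re.compile(r"<[^>]+>")
HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:", "NOTE")


def clean_vtt(vtt_text):
    """Return the caption text of a WebVTT document as one readable string."""
    kept = []
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(HEADER_PREFIXES) or TIMING_LINE.match(line):
            continue
        line = INLINE_TAG.sub("", line).strip()
        # Skip a line that just repeats the previous one (rolling captions do this).
        if line and (not kept or kept[-1] != line):
            kept.append(line)
    return " ".join(kept)
```

The result is a single block of text, so searching for a half-remembered phrase becomes a plain substring match.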
> any activity that places a burden on our servers, where that burden is disproportionate to the benefits provided to users (for example, don't use Actions as a content delivery network or *as part of a serverless application*, but a low benefit Action could be ok if it’s also low burden); or
Not a lawyer, but I'm pretty sure that is a violation of their ToS.
I'm very confident that what I've built here fits the set of things that you are allowed to do with Actions.
The workflow I've written here is a shortcut for writing content directly to the repository. You could go and run the commands on your laptop and copy-and-paste the extracted captions into a file and push them to the repo... but Actions are specifically designed to automate that kind of process.
(Also: I've shown this to GitHub people who have worked on Actions and they thought it was really cool.)
I doubt GitHub have the support capacity to handle everyone pinging them to ask permission any time they want to do something interesting with Actions.
I'll take my chances. If they tell me it's not a supported use-case, I'll update the project to tell people they shouldn't use it.
Keep building and ignore the haters. I'm sure GitHub deals with actual abuse issues with GitHub Actions (like trying to mine crypto) on a regular basis. This is neat and interesting and at most they'll rate limit it if it gets too popular. Plus you're connecting to a hosted paid service for the GPU backend side so it's not all CPU time.
I'm not a hater, I'm a realist. Services like this have a free tier to encourage paid accounts. When people abuse that free tier, everyone else suffers. It is not much effort to ask the support team for permission. I've also been on the devops team that has to run services like this, and it really isn't fun when people abuse them. It is a lot of extra work.
> if using GitHub-hosted runners, any other activity unrelated to the production, testing, deployment, or publication of the software project associated with the repository where GitHub Actions are used.
is far more pertinent, and can be solved by self-hosting a runner.
In the repo he says $0.20/min, which seems quite high to me even for a roll-your-own setup like this. But I noticed that Otter.ai downscaled the audio-to-text minutes per month on both their paid and free tiers as of last week, so what used to get you 6000 now gets you 1200. They also capped video file transcriptions, so I wonder if costs are going up for some reason?
This is an amazing "misuse"/hack of GitHub Actions and probably something that will cause major headaches for us in the future if they decide to crack down on it. I love it.
https://filmot.com/