I don't understand why this doesn't actually do the transcription / translation locally. Sending the data to openAI for paid conversion makes no sense. Whisper can be legally run on your computer, for free.
Running it locally makes way more sense for an open source project, because why would you pay for and be dependent upon a third party if you don't have to be?
It also makes way more sense for a service, because then _you_ don't have to hand most of your revenue to OpenAI and live off whatever's left.
This is just... bewildering. I really wanted to use it, but I'm not going to pay OpenAI to transcribe podcasts for me when I can use the exact same model locally with free open source code.
I'm hoping someone will fork this and teach it to run whisper locally.
[edit: getting exactly the right versions of Python, PyTorch, and the other dependencies to make Whisper run was a pain, but now I've got it set up and it's a trivial command to transcribe every mp3 I feel like transcribing]
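For anyone curious, here's roughly what that looks like with the openai-whisper Python package once it's installed (the file name and model size are just examples):

    # pip install openai-whisper (it also needs ffmpeg on your PATH)
    import whisper

    # "large" is the slowest but most accurate; "base" or "small" run much faster
    model = whisper.load_model("large")

    # transcribe() decodes the mp3 via ffmpeg internally
    result = model.transcribe("episode.mp3")

    print(result["text"])           # full transcript
    for seg in result["segments"]:  # timestamped segments
        print(f'{seg["start"]:7.2f}s  {seg["text"]}')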
To do Whisper transcription for free locally, you can use AirCaption (www.aircaption.com). It's an Electron desktop app running Whisper.cpp (https://github.com/ggerganov/whisper.cpp). Just released a few days ago.
I dunno about OpenAI as a service, but on my M1 Mac I think Whisper took something on the order of 8x realtime to process with the "large" model. That is to say, 8 minutes of processing for every 1 minute of audio. It was surprisingly slow. I assume OpenAI's servers have more GPU at their disposal to make this go faster.
Are you using whisper.cpp? You really want to be using that if you care about speed. You should be able to get better than real-time transcription on an M1.
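For what it's worth, a rough sketch of driving whisper.cpp from a script (the ./main binary and model path are assumptions based on a default build; whisper.cpp wants 16 kHz mono WAV input, hence the ffmpeg step):

    import subprocess

    # whisper.cpp expects 16 kHz mono WAV, so convert the mp3 first
    subprocess.run(
        ["ffmpeg", "-i", "episode.mp3", "-ar", "16000", "-ac", "1", "episode.wav"],
        check=True,
    )

    # ./main and ggml-large.bin are what a default build plus the model
    # download script give you; -otxt writes episode.wav.txt next to the input
    subprocess.run(
        ["./main", "-m", "models/ggml-large.bin", "-f", "episode.wav", "-otxt"],
        check=True,
    )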
"cost and convenience":
cost: $57.60/month vs. $0
Why would you want to pay nearly $700 a year just to avoid running a program in the background on whatever computer you already have open?
convenience:
Yes, it's a nicer interface, but the current state of the "geeky" version is: type a command on the command line, with the path to the file. The end. Unless you're really afraid of the command line, it's not that much more convenient.
The text line being highlighted while you listen is nice, but a) we wrote something that did it at the word level (as opposed to the sentence-ish level) nearly 20 years ago, and b) in this context it's not actually that useful. With video, sure: you can click the text and jump to the right place in the video. With spoken audio (what this is best at), you click and go to the point... where they're saying what you just read. Unless you really want to hear what you just read, there's not a lot of added value.
Would it be good for podcasts to use an interface like this for playback? Absolutely. It'd be a massive upgrade, but that's not what this is offering.
Maybe someone will extract that code and let us combine the MP3 and a timestamped text file in a web page (if that doesn't already exist). That'd be cool.
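A minimal sketch of what that could look like, assuming you've saved Whisper's result["segments"] to a JSON file: an <audio> element plus a transcript where clicking a line seeks the player to that timestamp.

    import json, html

    # segments.json: a list of {"start": float, "text": str, ...} dicts
    with open("segments.json") as f:
        segments = json.load(f)

    lines = "\n".join(
        f"""<p style="cursor:pointer" """
        f"""onclick="document.getElementById('player').currentTime={s['start']}">"""
        f"""[{int(s['start']) // 60}:{int(s['start']) % 60:02d}] """
        f"""{html.escape(s['text'])}</p>"""
        for s in segments
    )

    with open("transcript.html", "w") as f:
        f.write(f"""<!doctype html>
    <audio id="player" src="episode.mp3" controls></audio>
    {lines}
    """)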
But the cost you propose is way too much for most people, especially in countries that aren't rich. In many places $400 a month is a really good salary. So yeah, if you're rich, $700 a year is not a big deal, but...
First, this $57.60 is a VERY pessimistic upper limit. Remember, that's based on having to transcribe every working hour of every day. The number of hours per month requiring transcription is probably Pareto-distributed among the workforce. I'd bet 90% of people would need to transcribe up to ~4 hours a month (one important 1-hour meeting per week), corresponding to an API cost of USD $1.44 per month.
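The arithmetic behind both numbers, assuming OpenAI's published Whisper API rate of $0.006 per audio minute:

    RATE_PER_MIN = 0.006                 # USD per audio minute (OpenAI's Whisper API rate)

    full_time = 160 * 60 * RATE_PER_MIN  # every working hour of the month -> $57.60
    light_use = 4 * 60 * RATE_PER_MIN    # ~1 hour per week                -> $1.44
    print(full_time, light_use)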
Second, don't underestimate the business value of a nice interface. IMO, the value of excellent UI/UX is part of why ChatGPT took off the way it did. The number of people willing to pay a few dollars per month in order to never have to see a command line is quite a bit larger than the number of people willing to host their own `whisper-large` inference.
Speaking of hosting, do you already own hardware that supports sufficiently fast inference? If not, how much would a good enough cloud instance cost you per month? It depends on how fast is fast enough, but more than $0, that's for sure.