This is off topic, but in regard to all the latest OpenAI news, including the ChatGPT and Whisper API releases: I came across Gladia.io and saw you made a comment regarding it:
"Why not use Whisper directly? All that seems to be happening is gladia.io is running 120 concurrent calls to openAI using 120 30s chunks of an hour long audio.
So yeah, you do get a speedup! Chop audio and stitch transcripts. But OP is vaguely (and briefly) promising a breakthrough of some sorts."
How did you figure out that is what they are doing? Or is this hypothetical?
You're referring to a comment I made? It was hypothetical, based on the whisper.cpp notes about the 30s max chunk limit and how long that takes to transcribe, plus the observation that the claimed latency speedup (x120) corresponds exactly to 120 concurrent 30s chunks vs. serially transcribing 1 hour of audio.
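For concreteness, a minimal sketch of what that chop-and-stitch approach could look like. It uses the hosted whisper-1 API purely for illustration (the same idea applies to many parallel local model instances), and the file name, chunk size, and worker count are my own assumptions, not anything gladia.io has published:

```python
from concurrent.futures import ThreadPoolExecutor
import os, tempfile

from openai import OpenAI          # pip install openai
from pydub import AudioSegment     # pip install pydub (requires ffmpeg)

client = OpenAI()
CHUNK_MS = 30 * 1000               # Whisper's 30 s context window

def transcribe(path: str) -> str:
    """Send one 30 s chunk to the hosted whisper-1 model and return its text."""
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1", file=f).text

# Chop the hour-long file into ~120 chunks of 30 s each.
audio = AudioSegment.from_file("hour_long_audio.mp3")   # hypothetical input
tmpdir = tempfile.mkdtemp()
paths = []
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    path = os.path.join(tmpdir, f"chunk_{i:03d}.mp3")
    audio[start:start + CHUNK_MS].export(path, format="mp3")
    paths.append(path)

# Transcribe all chunks concurrently; map() preserves input order,
# so stitching the transcripts back together is just a join.
with ThreadPoolExecutor(max_workers=len(paths)) as pool:
    transcript = " ".join(pool.map(transcribe, paths))

print(transcript)
```

Naively stitching chunk transcripts like this is also where accuracy can suffer, since words cut at chunk boundaries lose context.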
Yeah, I was referring to the comment you made. I was just curious about them and wanted to know whether they were just making concurrent calls or actually doing some novel optimization under the hood.
I do not think they were sending concurrent chunks to OpenAI, because the API wasn't out when they launched. That being said, there is some reduction in their accuracy compared to the original Whisper, which I imagine was the trade-off for such performance gains.
Obviously it's just concurrent calls to a model that has a 30s window: a x120 performance "breakthrough" in voice recognition that is exactly the ratio of 1 hr to 30s.
I did not say anything about OpenAI API calls. Neither did they in their post. They mention the OpenAI Whisper "model".
"Why not use Whisper directly? All that seems to be happening is gladia.io is running 120 concurrent calls to openAI using 120 30s chunks of an hour long audio. So yeah, you do get a speedup! Chop audio and stitch transcripts. But OP is vaguely (and briefly) promising a breakthrough of some sorts."
How did you figure out that is what they are doing? Or is this hypothetical?