There was Adobe Voco, which seemed like a kind of forerunner: https://www.youtube.com/watch?v=I3l4XLZ59iw, https://en.wikipedia.org/wiki/Adobe_Voco. It could purportedly edit speech the way an audio editor edits sound, and it looked like a destroyer of authenticity. And then nothing more was heard of it.

(Edit: Wikipedia says that VoCo takes “approximately 20 minutes of the desired target's speech”, and that it was a research prototype.)

There was a thing called Tacotron from a team at Google, in 2018: https://google.github.io/tacotron/publications/speaker_adapt... (In fact, the OP repo and the original CorentinJ/Real-Time-Voice-Cloning apparently rely on Tacotron.)

And there was something from 2019: https://www.ohadf.com/projects/text-based-editing/, https://news.stanford.edu/2019/06/05/edit-video-editing-text...

The latter two seem to need more speech samples than a purely real-time approach would.

Overall, to me as a layman, this space seems quieter than video deepfakes, which makes me wonder whether I've missed something.




I imagine it's inherently easier to create a fake voice that's convincing over a grainy phone line than to make a convincing deepfake video.

Maybe big tech orgs (including Adobe) don't want to risk the liability or PR fallout.