AudioGen: Textually Guided Audio Generation

solardev · on Sept 30, 2022

The last thing you'll hear before the AI eats you: https://felixkreuk.github.io/text2audio_arxiv_samples/large_...

iamthemonster · on Oct 1, 2022

It would be very interesting indeed to have an ebook reader paired with bluetooth earphones, and it simultaneously feeds the words into this to make an ambient soundtrack, perhaps also choosing music appropriate to the word-choice on the page.

nudpiedo · on Sept 30, 2022

That could be another missing piece to videogame generational art, sfx sounds and soon soundtracks.

kevmo314 · on Sept 30, 2022

The speech samples are really funny. Very Sims-esque.

jnovek · on Oct 1, 2022

I found them very unsettling. My brain is trying so hard to resolve words from that mess. This is the first time I’ve really thought about how the uncanny valley applies to spoken words.

thfuran · on Oct 1, 2022

The valley has no end https://youtu.be/Vt4Dfa4fOEY

karmasimida · on Sept 30, 2022

It will be more useful if it can narrate text along with those background effects.

simonw · on Sept 30, 2022

You can already achieve that by combining models - use a dedicated speech synthesis model for the narration, then layer that over background effects from AudioGen.

Given that, I don't think AudioGen particularly needs to add full narration. That seems like a very different problem to me, likely requiring a completely different architecture.

godmode2019 · on Oct 1, 2022

What is the current state of the art speech synthesis model?

ricopags · on Oct 3, 2022

It was Nvidia's Tacotron2[0] but now I believe it's NaturalSpeech[1]

[0]https://paperswithcode.com/method/tacotron-2 [1]https://speechresearch.github.io/naturalspeech/

godmode2019 · on Oct 16, 2022

Thank you

youssefabdelm · on Oct 1, 2022

-__- I wish researchers would train a stereo 44.1kHz version...why always 16kHz? I know I know 16kHz saves more compute but come ooooon you're Meta

fragmede · on Oct 3, 2022

Text2audio is impressive, but I wanna see dance2audio. Just need a million dollars in funding to pay for cameras and dancers.

fuzzythinker · on Sept 30, 2022

[code] redirects to the same page

ggerganov · on Sept 30, 2022

According to one of the authors, the code and the models will be available soon [0]

[0] - https://twitter.com/FelixKreuk/status/1575846953333579776

uwagar · on Oct 1, 2022

s/textually/sexually

i giggled :)