
My reading of the generator diagram (figure 6) is that it generates phoneme probabilities, not waveforms.

You could train a similar system to produce audio from the output of wav2vec, though it probably won't sound like the input audio (accent, voice) unless you expose more features of the input than just phonemes.
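To make the "phoneme probabilities, not waveforms" distinction concrete, here is a rough sketch of the output shape I have in mind. It is not the paper's generator: the linear projection, the dimensions, and the name `phoneme_posteriors` are all assumptions for illustration. The point is only that the output is one probability distribution over phonemes per frame, not a sequence of audio samples.

```python
import numpy as np

def phoneme_posteriors(features, weights):
    """Map (T, D) frame features to (T, P) per-frame phoneme probabilities.

    Hypothetical stand-in for a generator: a single linear projection
    followed by a softmax over the phoneme inventory.
    """
    logits = features @ weights
    # Subtract the per-frame max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
T, D, P = 50, 512, 40  # frames, feature dim, phoneme count (illustrative values)
features = rng.standard_normal((T, D))       # stand-in for speech representations
weights = rng.standard_normal((D, P)) * 0.01  # untrained, random projection
probs = phoneme_posteriors(features, weights)
print(probs.shape)  # one distribution per frame, no audio samples anywhere
```

Nothing in that output carries speaker identity or accent, which is why a synthesizer trained only on it would have to invent a voice of its own.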



