lunixbochs on July 3, 2021 | on: Wav2vec Overview: Semi and Unsupervised Speech Rec...

My reading of the generator diagram (figure 6) is that it generates phoneme probabilities, not waveforms.

You can train a similar system to produce audio from the output of wav2vec, though it probably won't sound like the input audio (same accent or voice) unless you expose more features of the input than just phonemes.
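
To make that concrete, here is a toy sketch in plain Python. It is not wav2vec's actual generator: the phoneme inventory, the identity-like weight matrix, and the `make_frames` helper are all made up for illustration. It only shows the shape of the idea: each frame maps to a probability distribution over phonemes, and two different "voices" saying the same thing collapse to the same phoneme sequence, so anything built on phonemes alone has no voice information to work from.

```python
import math
import random

# Hypothetical toy phoneme inventory (not the one wav2vec uses).
PHONEMES = ["AH", "B", "K", "S", "T"]

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_generator(frames, weights):
    """Map each feature frame to a distribution over PHONEMES."""
    probs = []
    for frame in frames:
        logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
        probs.append(softmax(logits))
    return probs

def decode(probs):
    """Pick the most likely phoneme for each frame."""
    return [PHONEMES[max(range(len(p)), key=p.__getitem__)] for p in probs]

def make_frames(phoneme_ids, seed):
    """Hypothetical features: a one-hot phoneme cue plus per-speaker noise."""
    rng = random.Random(seed)
    frames = []
    for pid in phoneme_ids:
        frame = [rng.uniform(-0.2, 0.2) for _ in PHONEMES]
        frame[pid] += 1.0
        frames.append(frame)
    return frames

# Assumed weights: a scaled identity, standing in for a trained projection.
WEIGHTS = [[5.0 if i == j else 0.0 for j in range(len(PHONEMES))]
           for i in range(len(PHONEMES))]

utterance = [3, 0, 4, 1, 0, 2]            # indices into PHONEMES
frames_a = make_frames(utterance, seed=1)  # "speaker A"
frames_b = make_frames(utterance, seed=2)  # "speaker B"

probs_a = toy_generator(frames_a, WEIGHTS)
probs_b = toy_generator(frames_b, WEIGHTS)

# Different input features, identical phoneme sequences.
print(decode(probs_a))
print(decode(probs_b))
```

The per-speaker noise is still faintly visible in the probabilities themselves, but once you reduce the output to a phoneme sequence it is gone, which is why a synthesizer trained on that output would need extra features from the input to recover the voice.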