The original WaveNet repeated a lot of computation during sample-by-sample generation: the receptive fields of consecutive samples overlap almost entirely, so with caching/dynamic programming (storing each layer's past activations in per-layer queues instead of recomputing them) it became much faster. Other optimizations were possible too. In any case, that was eventually made moot by probability density distillation, which trained a parallel feed-forward student network (Parallel WaveNet) that generates samples in parallel rather than one at a time, running at roughly 20x realtime: https://deepmind.com/blog/high-fidelity-speech-synthesis-wav... (That was necessary to make it cost-effective to deploy on Google Assistant.)
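
To illustrate the caching trick, here's a minimal sketch (toy scalar channels, kernel size 2, made-up weights; not DeepMind's actual implementation): each dilated causal conv layer keeps a FIFO of its last `dilation` inputs, so each new sample costs O(1) work per layer instead of re-running the whole receptive field.

    import numpy as np

    class CachedDilatedConv:
        """One dilated causal conv layer (kernel size 2) with a FIFO cache.

        Rather than recomputing the full receptive field for every new
        sample, we keep the last `dilation` inputs in a queue, so each
        incremental step reuses the input seen `dilation` steps ago.
        """
        def __init__(self, dilation, w_cur, w_past):
            self.dilation = dilation
            self.w_cur = w_cur                 # weight on the current input
            self.w_past = w_past               # weight on the input `dilation` steps back
            self.queue = [0.0] * dilation      # cached past inputs (zero-padded start)

        def step(self, x):
            past = self.queue.pop(0)           # input from `dilation` steps ago
            self.queue.append(x)
            return np.tanh(self.w_cur * x + self.w_past * past)

    # Toy stack with exponentially growing dilations (1, 2, 4, 8), random weights.
    rng = np.random.default_rng(0)
    layers = [CachedDilatedConv(2 ** i, rng.normal(), rng.normal()) for i in range(4)]

    sample = 0.0
    for t in range(16):
        h = sample
        for layer in layers:                   # O(num_layers) per sample,
            h = layer.step(h)                  # not O(receptive_field)
        sample = h                             # feed the output back autoregressively
        print(f"t={t:2d} sample={sample:+.4f}")

The queues here play the role of the dynamic-programming table: total work per generated sample is proportional to the number of layers, while the naive approach redoes work proportional to the receptive field (which grows exponentially with depth for dilated stacks).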