It is interesting how malleable neural networks are (rough code sketches of each idea follow the list):
- you can drop half the connections during training and the network still works; in fact it can work even better
- you can represent the weights with as little as one bit each, while still using real numbers to compute the activations
- you can insert layers and extend the network
- you can "distill" a network into a smaller, almost as efficient network or an ensemble of heavy networks into a single one with higher accuracy
- you can add a layer with fixed random weights and sometimes it works even better
- you can enforce sparsity of activations and then precompute a hash function so that only the neurons that will respond to the input signal are evaluated, making the network much faster
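
Taking the points in order: the first one is essentially dropout. A minimal NumPy sketch of inverted dropout applied to a layer's activations (the function name and default keep probability are mine, not from any particular library):

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: randomly zero activations during training and rescale
    the survivors, so no rescaling is needed at inference time."""
    if not training or p_drop == 0.0:
        return x                           # the full network is used at test time
    keep = 1.0 - p_drop
    mask = rng.random(x.shape) < keep      # with p_drop=0.5, roughly half survive
    return x * mask / keep                 # rescale so expectations match

# usage on a hidden layer, assuming x and W1 are NumPy arrays:
# h = dropout_forward(np.maximum(0.0, x @ W1), p_drop=0.5, training=True)
```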
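The one-bit-weights point is in the spirit of BinaryConnect: keep a real-valued copy of the weights for the optimizer, but collapse them to ±1 for the forward pass, while the activations stay in full precision. A rough sketch, not a faithful reimplementation of any specific paper:

```python
import numpy as np

def binarize(w_real):
    """Each weight collapses to one bit: +1 or -1 (its sign)."""
    return np.where(w_real >= 0, 1.0, -1.0)

def binary_linear_forward(x, w_real, b):
    """Forward pass with 1-bit weights but full-precision activations.
    The optimizer keeps updating the real-valued w_real; only the
    matrix multiplications see the binary copy."""
    return x @ binarize(w_real) + b
```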
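Inserting a layer without wrecking what the network has already learned can be done Net2Net-style, by initializing the new layer near the identity. A toy sketch, assuming the network is just a list of (W, b) pairs; the identity trick only preserves the function exactly when the activation passes the previous layer's outputs through unchanged (e.g. ReLU on non-negative values):

```python
import numpy as np

def insert_identity_layer(layers, position, width):
    """Insert a new (W, b) pair initialized to the identity, so the extended
    network computes (roughly) the same function and training can refine it."""
    layers.insert(position, (np.eye(width), np.zeros(width)))
    return layers

# usage: layers is e.g. [(W1, b1), (W2, b2)];
# insert a new width-256 layer after the first one:
# layers = insert_identity_layer(layers, position=1, width=256)
```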
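Distillation in the Hinton et al. sense: train the small student to match the teacher's temperature-softened outputs, mixed with the usual loss on the true labels. For the ensemble case, the teacher logits could simply be the average over the ensemble members. The constants T and alpha below are illustrative, not canonical:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Cross-entropy against the teacher's softened distribution, mixed with
    the ordinary cross-entropy on the true labels."""
    p_teacher = softmax(teacher_logits, T)                      # soft targets
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    soft = -np.mean(np.sum(p_teacher * log_p_student, axis=-1)) * T * T
    hard_probs = softmax(student_logits)
    hard = -np.mean(np.log(hard_probs[np.arange(len(labels)), labels] + 1e-12))
    return alpha * soft + (1.0 - alpha) * hard
```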
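The fixed-random-layer point is close to random-feature / extreme-learning-machine setups: the hidden layer is random and frozen, and only the linear readout is fitted, here in closed form by ridge regression. A sketch under those assumptions:

```python
import numpy as np

def fit_random_hidden_net(X, Y, hidden=512, reg=1e-3, seed=0):
    """A hidden layer with fixed random weights that is never trained;
    only the readout W_out is fitted (ridge regression)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                                    # random features
    W_out = np.linalg.solve(H.T @ H + reg * np.eye(hidden), H.T @ Y)
    return W, b, W_out

def predict(W, b, W_out, X):
    return np.tanh(X @ W + b) @ W_out
```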
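The hashing point is roughly what SLIDE does: index the neurons' weight vectors in a locality-sensitive hash table, hash the incoming activation vector, and only evaluate the neurons in the matching bucket, since those are the ones likely to have a large dot product with the input. The sketch below uses a single SimHash table for brevity; a real system would combine several tables to avoid missing active neurons:

```python
import numpy as np
from collections import defaultdict

def simhash(v, planes):
    """Signature = which side of each random hyperplane the vector falls on."""
    return tuple((planes @ v > 0).astype(int))

def build_neuron_index(W, planes):
    """Precompute: bucket each neuron by the SimHash of its weight vector.
    W has shape (in_dim, n_neurons); planes has shape (n_planes, in_dim)."""
    table = defaultdict(list)
    for j in range(W.shape[1]):
        table[simhash(W[:, j], planes)].append(j)
    return table

def sparse_forward(x, W, b, table, planes):
    """Evaluate only the neurons in the input's bucket; treat the rest as zero."""
    active = table.get(simhash(x, planes), [])
    out = np.zeros(W.shape[1])
    if active:
        out[active] = np.maximum(0.0, x @ W[:, active] + b[active])  # ReLU on the few
    return out
```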
It seems the neural network is a malleable entity with great potential for speedups on the algorithmic side. They got a 10x speedup mainly by exploiting a few of these ideas, rather than by making the hardware 10x faster. Otherwise they wouldn't have made it the size of an HDD - they would have needed much more ventilation to dissipate the heat. It's just specialized hardware taking advantage of the latest algorithmic optimizations.