What capabilities does a piece of hardware like this need in order to be suitable for machine learning in general (and not just one specific machine learning problem)?
It is interesting how malleable neural networks are:
- you can drop half the connections during training and it still works; in fact, it works even better
- you can represent the weights with as little as one bit each, but still use real numbers for computing activations
- you can insert layers and extend the network
- you can "distill" a network into a smaller, almost as accurate network, or an ensemble of heavy networks into a single one with higher accuracy
- you can add a random layer with fixed weights and sometimes it works even better
- you can enforce sparsity of activations and then precompute a hash function to only activate those neurons that will respond to the input signal, thus making the network much faster
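To make the first two bullets concrete, here is a rough sketch in plain Python. It is my own toy code, not anything taken from the hardware discussed here: the helper names (`drop_connections`, `binarize`, `forward`) are made up for illustration, and rescaling the surviving weights by 1/(1-p) is an assumption of the sketch.

```python
import random

def drop_connections(weights, p, rng):
    """Zero each weight independently with probability p (training time only).

    Survivors are scaled by 1/(1-p) so the expected sum is unchanged;
    that scaling is an assumption of this sketch.
    """
    keep = 1.0 - p
    return [[w / keep if rng.random() >= p else 0.0 for w in row] for row in weights]

def binarize(weights):
    """Store only the sign of each weight: +1.0 or -1.0, i.e. one bit."""
    return [[1.0 if w >= 0.0 else -1.0 for w in row] for row in weights]

def forward(weights, x):
    """Real-valued activations: ReLU(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
x = [0.5, -1.25, 2.0, 0.1]

train_out = forward(drop_connections(W, 0.5, rng), x)  # about half the connections dropped
binary_out = forward(binarize(W), x)                   # one-bit weights, real activations
print(train_out, binary_out)
```

In both cases the forward pass is still the same multiply-accumulate loop; only the weights it sees have changed.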
It seems the neural network is a malleable entity with great potential for being made faster on the algorithmic side. They got the 10x speedup mainly by exploiting a few of these ideas rather than by making the hardware 10x faster. Otherwise it couldn't be the size of an HDD, because it would need much more ventilation to dissipate the heat. It's just specialized hardware taking advantage of the latest algorithmic optimizations.
Sacrifice generality, accuracy and the ability to randomly access a lot of memory so that you can implement fast, power-efficient matrix operations with a single low-precision datatype, thus requiring less memory, bandwidth and transistors.
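As an illustration of what "a single low-precision datatype" can look like, here is a minimal sketch of an 8-bit dot product in plain Python. The choice of int8, the fixed scale of 1/64 and the function names are all my assumptions for the example; nothing here is known about the actual chip.

```python
def quantize(values, scale):
    """Round values/scale to the nearest integer and clamp to the int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int8_dot(qa, qb, scale_a, scale_b):
    """Integer multiply-accumulate, followed by a single rescale to a real number."""
    acc = sum(a * b for a, b in zip(qa, qb))  # wide integer accumulator
    return acc * scale_a * scale_b

SCALE = 1.0 / 64.0  # hypothetical fixed scale shared by both vectors

a = [0.5, -1.0, 0.25]
b = [1.0, 0.5, -2.0]

qa = quantize(a, SCALE)
qb = quantize(b, SCALE)
approx = int8_dot(qa, qb, SCALE, SCALE)
exact = sum(u * v for u, v in zip(a, b))
print(qa, qb, approx, exact)
```

Each stored value takes one byte instead of four, and the inner loop needs only small integer multipliers, which is where the savings in memory, bandwidth and transistors would come from. Values outside the representable range are simply clamped, which is the accuracy being given up.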