It is interesting how malleable neural networks are (rough code sketches of each idea follow the list):
- you can drop half the connections during training and the network still works; in fact it can work even better
- you can represent the weights with as little as one bit each, while still using real numbers to compute the activations
- you can insert layers and extend the network
- you can "distill" a network into a smaller, almost as efficient network or an ensemble of heavy networks into a single one with higher accuracy
- you can add a layer with fixed random weights and sometimes it works even better
- you can enforce sparsity of activations and then precompute a hash function so that only the neurons that will respond to the input signal are evaluated, making the network much faster
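
Taking the points in order: the first one is essentially dropout. A minimal NumPy sketch of inverted dropout applied to a layer's activations (the function name and default keep probability are mine, not from any particular library):

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: randomly zero activations during training and rescale
    the survivors, so no rescaling is needed at inference time."""
    if not training or p_drop == 0.0:
        return x                           # the full network is used at test time
    keep = 1.0 - p_drop
    mask = rng.random(x.shape) < keep      # with p_drop=0.5, roughly half survive
    return x * mask / keep                 # rescale so expectations match

# usage on a hidden layer, assuming x and W1 are NumPy arrays:
# h = dropout_forward(np.maximum(0.0, x @ W1), p_drop=0.5, training=True)
```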
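The one-bit-weights point is in the spirit of BinaryConnect: keep a real-valued copy of the weights for the optimizer, but collapse them to ±1 for the forward pass, while the activations stay in full precision. A rough sketch, not a faithful reimplementation of any specific paper:

```python
import numpy as np

def binarize(w_real):
    """Each weight collapses to one bit: +1 or -1 (its sign)."""
    return np.where(w_real >= 0, 1.0, -1.0)

def binary_linear_forward(x, w_real, b):
    """Forward pass with 1-bit weights but full-precision activations.
    The optimizer keeps updating the real-valued w_real; only the
    matrix multiplications see the binary copy."""
    return x @ binarize(w_real) + b
```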
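Inserting a layer without wrecking what the network has already learned can be done Net2Net-style, by initializing the new layer near the identity. A toy sketch, assuming the network is just a list of (W, b) pairs; the identity trick only preserves the function exactly when the activation passes the previous layer's outputs through unchanged (e.g. ReLU on non-negative values):

```python
import numpy as np

def insert_identity_layer(layers, position, width):
    """Insert a new (W, b) pair initialized to the identity, so the extended
    network computes (roughly) the same function and training can refine it."""
    layers.insert(position, (np.eye(width), np.zeros(width)))
    return layers

# usage: layers is e.g. [(W1, b1), (W2, b2)];
# insert a new width-256 layer after the first one:
# layers = insert_identity_layer(layers, position=1, width=256)
```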
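Distillation in the Hinton et al. sense: train the small student to match the teacher's temperature-softened outputs, mixed with the usual loss on the true labels. For the ensemble case, the teacher logits could simply be the average over the ensemble members. The constants T and alpha below are illustrative, not canonical:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Cross-entropy against the teacher's softened distribution, mixed with
    the ordinary cross-entropy on the true labels."""
    p_teacher = softmax(teacher_logits, T)                      # soft targets
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    soft = -np.mean(np.sum(p_teacher * log_p_student, axis=-1)) * T * T
    hard_probs = softmax(student_logits)
    hard = -np.mean(np.log(hard_probs[np.arange(len(labels)), labels] + 1e-12))
    return alpha * soft + (1.0 - alpha) * hard
```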
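The fixed-random-layer point is close to random-feature / extreme-learning-machine setups: the hidden layer is random and frozen, and only the linear readout is fitted, here in closed form by ridge regression. A sketch under those assumptions:

```python
import numpy as np

def fit_random_hidden_net(X, Y, hidden=512, reg=1e-3, seed=0):
    """A hidden layer with fixed random weights that is never trained;
    only the readout W_out is fitted (ridge regression)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                                    # random features
    W_out = np.linalg.solve(H.T @ H + reg * np.eye(hidden), H.T @ Y)
    return W, b, W_out

def predict(W, b, W_out, X):
    return np.tanh(X @ W + b) @ W_out
```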
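The hashing point is roughly what SLIDE does: index the neurons' weight vectors in a locality-sensitive hash table, hash the incoming activation vector, and only evaluate the neurons in the matching bucket, since those are the ones likely to have a large dot product with the input. The sketch below uses a single SimHash table for brevity; a real system would combine several tables to avoid missing active neurons:

```python
import numpy as np
from collections import defaultdict

def simhash(v, planes):
    """Signature = which side of each random hyperplane the vector falls on."""
    return tuple((planes @ v > 0).astype(int))

def build_neuron_index(W, planes):
    """Precompute: bucket each neuron by the SimHash of its weight vector.
    W has shape (in_dim, n_neurons); planes has shape (n_planes, in_dim)."""
    table = defaultdict(list)
    for j in range(W.shape[1]):
        table[simhash(W[:, j], planes)].append(j)
    return table

def sparse_forward(x, W, b, table, planes):
    """Evaluate only the neurons in the input's bucket; treat the rest as zero."""
    active = table.get(simhash(x, planes), [])
    out = np.zeros(W.shape[1])
    if active:
        out[active] = np.maximum(0.0, x @ W[:, active] + b[active])  # ReLU on the few
    return out
```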
It seems the neural network is a malleable entity with great potential for speedups on the algorithmic side. They got a 10x speedup mainly by exploiting a few of these ideas, rather than by making the hardware 10x faster. Otherwise they wouldn't have made it the size of an HDD - they would have needed much more ventilation to dissipate the heat. It's just specialized hardware taking advantage of the latest algorithmic optimizations.