What are the capabilities that a piece of hardware like this needs to have to be ...

wmf · on May 18, 2016

AFAIK 16-bit "half-precision" floating point.

HappyTypist · on May 18, 2016

8 bit is enough and I suspect it's what the TPU is using: https://petewarden.com/2016/05/03/how-to-quantize-neural-net...
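To make the 8-bit idea concrete, here is a rough sketch of linear min/max quantization in plain Python. The function names and the sample weights are made up for illustration, and this is only one simple scheme, not necessarily what the TPU or any particular framework does.

```python
def quantize(values, num_bits=8):
    """Map floats onto integer codes in [0, 2**num_bits - 1] over the min/max range."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    # Guard against a zero range so we never divide by zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [-1.27, -0.5, 0.0, 0.33, 0.9, 1.28]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

# The round-trip error is bounded by about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte plus a shared offset and step size, and the worst-case round-trip error is about half a step, which is the kind of noise trained networks tend to tolerate.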

visarga · on May 19, 2016

It is interesting how malleable neural networks are.

- you can drop half the connections during training and it still works; in fact, it works even better

- you can represent the weights with as little as one bit, but still use real numbers for computing activations

- you can insert layers and extend the network

- you can "distill" a network into a smaller, almost as efficient network, or distill an ensemble of heavy networks into a single one with higher accuracy

- you can add a random layer with fixed weights and sometimes it works even better

- you can enforce sparsity of activations and then precompute a hash function to only activate those neurons that will respond to the input signal, thus making the network much faster

It seems the neural network is a malleable entity with great potential for being made faster on the algorithmic side. They got the 10x speedup mainly by exploiting a few of these ideas, instead of making the hardware 10x faster. Otherwise, they wouldn't have made it the size of an HDD, because they would need much more ventilation to dissipate the heat. It's just specialized hardware taking advantage of the latest algorithmic optimizations.
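As a small illustration of the one-bit-weights point, here is a minimal sketch in plain Python. It keeps only the sign of each weight plus one real-valued scale (the mean absolute weight, which is one common choice and an assumption here) while the activations stay as ordinary floats. The names and numbers are hypothetical.

```python
def binarize(weights):
    """Keep only the sign of each weight, plus one real-valued scale for the group."""
    signs = [1 if w >= 0 else -1 for w in weights]
    # Assumed scale: the mean absolute value of the original weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    return signs, scale


def binary_dot(signs, scale, activations):
    """Dot product with +1/-1 weights: only additions and subtractions, then one multiply."""
    total = sum(a if s > 0 else -a for s, a in zip(signs, activations))
    return scale * total


# Hypothetical weights and real-valued activations.
weights = [0.8, -0.6, 0.7, -0.9]
activations = [0.5, 1.5, -2.0, 0.25]

signs, scale = binarize(weights)
approx = binary_dot(signs, scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
```

The approximate result lands near the full-precision dot product even though each weight carries a single bit, and the inner loop needs no multiplications, which is what makes this attractive for cheap hardware.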

PeterisP · on May 19, 2016

Sacrifice generality, accuracy, and the ability to randomly access a lot of memory, so that you can implement fast and power-efficient matrix operations with a single low-accuracy datatype, thus requiring less memory, bandwidth, and transistors.