The rule of thumb I was taught was that going from a DSP/GPU to a custom ASIC gives you roughly a 10X advantage in performance per watt, which is pretty close to this. And look at how thoroughly Bitcoin mining ASICs outcompete GPUs.
Generally speaking, GPUs are already very good at working with float32s; in fact, they're usually much better with float32s than with float64s. The big advantages of an ASIC are mostly on the storage side, but an ASIC also lets you get away with non-IEEE floating-point formats that don't necessarily implement subnormals, NaN, etc.
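To make that concrete, here's a rough sketch in C of decoding a hypothetical non-IEEE 8-bit float (1 sign / 4 exponent / 3 mantissa bits, bias 7, all made up for illustration, not any real chip's format): there are no NaN/Inf encodings to special-case, and anything IEEE would treat as subnormal just flushes to zero. Skipping those cases is what makes the FP units simpler:

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    /* Hypothetical 8-bit float: 1 sign, 4 exponent (bias 7), 3 mantissa bits.
     * No subnormals, no NaN, no Inf -- assumptions for illustration only. */
    static float decode_fp8(uint8_t bits) {
        int sign = (bits >> 7) & 0x1;
        int exp  = (bits >> 3) & 0xF;   /* 4-bit exponent field */
        int man  = bits & 0x7;          /* 3-bit mantissa field */
        if (exp == 0)                   /* no subnormals: flush to zero */
            return sign ? -0.0f : 0.0f;
        /* implicit leading 1; every other exponent pattern is an ordinary
         * value, so there's no NaN/Inf branch at all */
        float frac = 1.0f + man / 8.0f;
        return (sign ? -1.0f : 1.0f) * ldexpf(frac, exp - 7);
    }

    int main(void) {
        printf("%f\n", decode_fp8(0x40)); /* exp=8, man=0 -> 2.0 */
        printf("%f\n", decode_fp8(0x07)); /* IEEE subnormal -> flushed to 0.0 */
        return 0;
    }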
I doubt FP hardware size is the limiting factor in their implementation (it isn't in GPUs and definitely isn't in high-perf CPUs). More likely they came up with higher-level architectural tricks that let them specialize for machine learning (e.g. taking better advantage of locality in the application).
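For a toy illustration of the locality point, here's a tiled matrix multiply in C: each block of A and B gets loaded once and reused many times, which is the same data-reuse pattern an ML ASIC can bake directly into silicon (e.g. a systolic array). N, TILE, and the function name are just placeholders, not anyone's actual design:

    #include <stddef.h>

    #define N    512
    #define TILE 32

    /* C must be zero-initialized by the caller before accumulating. */
    void matmul_tiled(const float A[N][N], const float B[N][N], float C[N][N]) {
        for (size_t i0 = 0; i0 < N; i0 += TILE)
            for (size_t k0 = 0; k0 < N; k0 += TILE)
                for (size_t j0 = 0; j0 < N; j0 += TILE)
                    /* each TILE x TILE block now stays hot in cache (or
                     * on-chip SRAM) for TILE*TILE*TILE multiply-adds,
                     * instead of streaming from memory every time */
                    for (size_t i = i0; i < i0 + TILE; i++)
                        for (size_t k = k0; k < k0 + TILE; k++)
                            for (size_t j = j0; j < j0 + TILE; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }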