
But isn't 3 generations ahead just 8x? That doesn't sound at all unreasonable for custom hardware.



The rule of thumb I was taught was that going from a DSP/GPU to a custom ASIC would give you a 10x advantage in performance per watt, which is pretty close to this. And look at how much Bitcoin-mining ASICs outcompete GPUs.


This is about right! 64-bit IEEE FP -> 16-bit IEEE-style FP[0] is a 4x reduction in bit width, and multiplication is O(n^2) in silicon transistor count.
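A back-of-envelope sketch of that scaling argument (my arithmetic, not from the comment; the quadratic area model for an array multiplier is a crude approximation):

```python
# Crude O(n^2) area model for an n-bit array multiplier: transistor count
# grows roughly with the square of operand width, so shrinking operands
# from 64-bit to 16-bit floats cuts multiplier area by about (64/16)^2.
def relative_multiplier_area(wide_bits, narrow_bits):
    """Approximate area ratio between two multiplier widths."""
    return (wide_bits / narrow_bits) ** 2

print(relative_multiplier_area(64, 16))  # 16.0
```

That 16x is comfortably above the ~8x "three generations ahead" figure from the parent comment.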

[0] If Google is smart, they'd ditch +/- infinity, and if they were ballsy, they'd ditch zero in their FP implementation.


Generally speaking, GPUs are already very good at running with float32s, usually much better than they are with float64s, in fact. The big advantages of an ASIC are mostly on the storage side, but it also lets you get away with non-IEEE floating-point numbers that don't necessarily implement subnormals, NaN, etc.
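To make the "cheap non-IEEE-ish 16-bit float" idea concrete, here's a sketch (mine, not anything the thread attributes to Google) of the bfloat16-style approach: keep float32's sign and 8 exponent bits but only the top 7 mantissa bits, which is just a bit-truncation in hardware:

```python
import struct

def float32_to_bfloat16_bits(x):
    """Truncate an IEEE-754 float32 to a bfloat16 bit pattern by keeping
    the top 16 bits (sign, 8 exponent bits, 7 mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b):
    """Re-expand a bfloat16 bit pattern to float32 by zero-filling the
    discarded mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# Truncation (no rounding) is the crudest possible scheme; real hardware
# would typically round to nearest even. ~3.14159 survives as 3.140625.
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159)))
```

Because the exponent field is unchanged, this keeps float32's dynamic range while shrinking storage and multiplier width, which is exactly the trade-off that matters for neural-net workloads.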


I doubt FP hardware size is the limiting factor in their implementation (it isn't in GPUs, and definitely not in high-performance CPUs). More likely they came up with higher-level architectural tricks that let them specialize for machine learning (e.g., taking better advantage of locality in the application).


Actually, for many neural-network applications, 8-bit, 4-bit, or even 1-bit precision is sufficient for representing weights.
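A minimal sketch of the simplest version of this, linear per-tensor 8-bit quantization (my illustration; real schemes add per-channel scales, rounding tweaks, retraining, etc.):

```python
# Hedged sketch: map float weights to int8 levels with one shared scale.
# The network can then run on cheap integer multiplies; 4-bit or 1-bit
# (sign-only) variants shrink storage further at some accuracy cost.
def quantize_int8(weights):
    """Return (int8-range levels, scale) for a list of float weights."""
    scale = max(abs(w) for w in weights) / 127.0
    levels = [max(-127, min(127, round(w / scale))) for w in weights]
    return levels, scale

def dequantize(levels, scale):
    """Reconstruct approximate float weights from quantized levels."""
    return [v * scale for v in levels]

w = [0.5, -1.0, 0.25, 0.0]
q, s = quantize_int8(w)
# Reconstruction error is bounded by half a quantization step (scale / 2).
print(dequantize(q, s))
```

Each reconstructed weight is within half a step of the original, which for many trained networks is well inside the noise the model already tolerates.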



