
Just as going from scalar to vector instructions provides a speedup, so does going from vector to matrix instructions. If you've already got wide vectors, the extra parallelism exposed per additional hardware execution resource isn't huge, but the reduction in register file read port usage is pretty significant.
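A rough back-of-the-envelope sketch of that register read argument (illustrative assumptions, not any specific ISA): an N-wide vector FMA reads three registers per N multiply-accumulates, while an NxN matrix-multiply-accumulate instruction reads 3*N^2 values for N^3 multiply-accumulates.

    # Register values read per multiply-accumulate (MAC), illustrative only.

    def vector_fma_reads_per_mac(n):
        # d = a * b + c on N-wide vectors: 3 N-element register reads, N MACs.
        return (3 * n) / n

    def matrix_mma_reads_per_mac(n):
        # D = A @ B + C on NxN tiles: 3 * N^2 register values read, N^3 MACs.
        return (3 * n * n) / (n ** 3)

    for n in (4, 8, 16):
        print(f"N={n:2d}: vector FMA {vector_fma_reads_per_mac(n):.2f} reads/MAC, "
              f"matrix MMA {matrix_mma_reads_per_mac(n):.3f} reads/MAC")

So the vector case stays at 3 reads per MAC no matter how wide the vector gets, while the matrix case drops as 3/N.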

Also, inference is usually happy with int8s whereas graphics workloads are mostly float32s. So you can save a lot of hardware that way too.
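A minimal sketch of why int8 is usually enough for inference, assuming a simple symmetric per-tensor quantization scheme (not any particular framework's method): the multiply-accumulates run on 8-bit integer values, and float32 only reappears for the final rescale.

    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: map the float range onto [-127, 127].
        scale = np.abs(x).max() / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64)).astype(np.float32)   # "weights"
    a = rng.standard_normal(64).astype(np.float32)          # "activations"

    qw, sw = quantize_int8(w)
    qa, sa = quantize_int8(a)

    ref = w @ a                                              # float32 reference
    approx = (qw.astype(np.int32) @ qa.astype(np.int32)) * (sw * sa)  # int8 MACs, int32 accumulate

    print("max abs error:", np.abs(ref - approx).max())
    print("reference range:", ref.min(), ref.max())

The error stays small relative to the output range, and 8-bit multipliers are far cheaper in silicon than float32 ones.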




Why are graphics workloads float32? 32-bit "true color" (about 16.7 million colors, which is higher color resolution than most eyes can distinguish) is just three 8-bit ints plus an 8-bit alpha channel (sometimes).


GPUs are not (only) about representing pixels; they are (mostly) about geometric computation.


Because before you can see a color, you have to compute it. For example, you need to calculate what color would result from the interaction of a light source of a given color / intensity and a surface of a given color / reflectivity / glossiness etc. There's no way to reasonably compute that using just 8-bit ints without getting terrible banding / quantization artifacts.
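A minimal numeric sketch of that banding (the light intensity, albedo, and gamma-encoding step are illustrative assumptions, not from the comment): shade a dim surface in linear space, then encode for display, once with float32 intermediates and once with every intermediate forced into 8 bits.

    import numpy as np

    n_dot_l = np.linspace(0.0, 1.0, 4096, dtype=np.float32)  # angle falloff across the surface
    light_intensity = 0.1                                      # dim light source
    albedo = 0.5                                               # surface reflectivity

    linear = n_dot_l * light_intensity * albedo                # linear-light shading result

    def encode(x):
        # Gamma-encode for an 8-bit display.
        return np.round(np.clip(x, 0, 1) ** (1 / 2.2) * 255).astype(np.uint8)

    out_float = encode(linear)                                 # float32 intermediate
    out_int8  = encode(np.round(linear * 255) / 255)           # intermediate rounded to 8 bits

    print("distinct shades, float32 intermediate:", len(np.unique(out_float)))
    print("distinct shades, 8-bit intermediate:  ", len(np.unique(out_int8)))

The float path keeps dozens of distinct shades across the gradient; the 8-bit path collapses to a handful, which is exactly the banding you'd see on screen.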



