This paper feels a few years behind, I think. bfloat16 and fp16 are both natively supported in hardware already.
We're down to FP8 now with NVIDIA's latest hardware. This conversation is way behind where it is in a few other places. Even FP8 shouldn't be a huge issue (at least for mixed precision at first); it's the 4-bit datatypes and the like where things really get spicy, IMO.