This paper feels a few years behind, I think. bfloat16 and fp16 are both already natively supported in hardware.

We're down to FP8 now with NVIDIA's latest hardware. This conversation is way behind where it is in a few other places. Even FP8 shouldn't be a huge issue (at least for mixed precision at first); it's the 4-bit datatypes and the like where things really and truly get spicy, IMO.
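To put rough numbers on how much headroom each step down gives up, here's a quick sketch that derives the largest finite value and the mantissa step from the bit layouts. The function names are made up for illustration, and I'm assuming the E4M3 and E5M2 layouts from the OCP FP8 spec (E4M3 has no infinities and reserves only one mantissa pattern at the top exponent for NaN), so treat it as back-of-the-envelope arithmetic rather than anything hardware-specific.

```python
# Rough range/precision comparison from (exponent bits, mantissa bits).
# All names here are hypothetical helpers, not part of any library.

def ieee_like_max(exp_bits: int, man_bits: int) -> float:
    """Largest finite value when the top exponent code is reserved for inf/NaN."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias
    return (2 - 2.0 ** -man_bits) * 2.0 ** max_exp

def e4m3fn_max() -> float:
    """Largest finite value for FP8 E4M3, assuming the OCP layout:
    the top exponent code is usable, and only mantissa 111 there is NaN."""
    bias = 7
    max_exp = 15 - bias               # 8
    max_mantissa = 2 - 2 * 2.0 ** -3  # 1.110 in binary = 1.75
    return max_mantissa * 2.0 ** max_exp

def epsilon(man_bits: int) -> float:
    """Gap between 1.0 and the next representable value."""
    return 2.0 ** -man_bits

formats = {
    "fp16":     (ieee_like_max(5, 10), epsilon(10)),  # max 65504
    "bfloat16": (ieee_like_max(8, 7),  epsilon(7)),
    "fp8_e5m2": (ieee_like_max(5, 2),  epsilon(2)),   # max 57344
    "fp8_e4m3": (e4m3fn_max(),         epsilon(3)),   # max 448
}

for name, (max_val, eps) in formats.items():
    print(f"{name:9s} max={max_val:.6g} eps={eps:.6g}")
```

With only 2 or 3 mantissa bits, the FP8 step size is already 1/4 or 1/8 at 1.0, which is why it tends to get paired with higher-precision accumulation at first.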