This might be rendered moot by native microscaling support in Blackwell (MXFP). They've manually implemented a coarser-grained version of the same idea on Hopper, but with full FP32 scaling factors.
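To make the contrast concrete, here's a rough NumPy sketch of blockwise scaling. The block sizes, the FP8 clipping, and the power-of-two rounding are illustrative assumptions on my part, not any vendor's exact recipe; the point is just "one FP32 scale per big block" vs. "one shared power-of-two scale per small block, done in hardware":

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # max representable magnitude in FP8 E4M3

def quantize_blockwise(x, block_size, pow2_scales=False):
    """Quantize a 1-D tensor to FP8-range values with one scale per block.

    Each block of `block_size` elements shares a single scaling factor.
    pow2_scales=False: full FP32 scales (the manual coarse-grained approach).
    pow2_scales=True: scales rounded to a power of two, roughly mimicking
    MX-style shared E8M0 exponents.
    """
    x = x.astype(np.float32)
    pad = (-len(x)) % block_size
    x = np.pad(x, (0, pad))
    blocks = x.reshape(-1, block_size)

    # One scale per block, chosen so the block max maps onto the FP8 max.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = amax / FP8_E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)
    if pow2_scales:
        scale = 2.0 ** np.ceil(np.log2(scale))  # E8M0-like power-of-two scale

    # Simulate FP8 by clipping; a real kernel would also round to FP8 here.
    q = np.clip(blocks / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

x = np.random.randn(1024).astype(np.float32) * np.logspace(-3, 3, 1024)

# Coarse-grained: one FP32 scale per 128 elements (the manual Hopper-style approach).
q_coarse, s_coarse = quantize_blockwise(x, block_size=128, pow2_scales=False)

# Microscaling: one power-of-two scale per 32 elements (MXFP-style, native on Blackwell).
q_mx, s_mx = quantize_blockwise(x, block_size=32, pow2_scales=True)

print(s_coarse.shape, s_mx.shape)  # (8, 1) vs (32, 1): much finer-grained scaling
```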
These are very good, high-profile public demonstrations of where $NVDA's moat is: GPGPU is very flexible, and you can program it to do a lot of things that make perfect sense but weren't on the hardware vendors' minds.
Now, if you expect the future to converge on more and more dedicated hardware support, to the point that there are no software optimizations like these left, then the so-called "CUDA moat" breaks.
To stay in this game, NVIDIA is breaking down their own moat :p