It can be fun to explore the interactions of unorm and float bit representations even when you have float instructions. E.g. if you bit-or a unorm8 into 0x47000000 (32768.0f) then subtract 32768.0f, you'll get a number very close to right, just a float multiply of (256/255.0f) away. Reordering the math so that the subtraction and multiply can become a single FMA is a fun homework exercise.
union {
    int bits;
    float f;
} pun = {x}, scale = {0x47000000}; // 0x47000000 is the bit pattern of 32768.0f
pun.bits |= scale.bits;   // pun.f is now 32768.0f + x/256.0f
pun.f    -= scale.f;      // exact: pun.f is now x/256.0f
pun.f    *= (256/255.0f); // rescale to x/255.0f
This basically amounts to a software implementation of int->float conversion instructions; sadly I have never found a spot where it's actually worth doing when you have those int->float instructions available already, even with the FMA as a single instruction.
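For what it's worth, here's roughly what that FMA reordering might look like (my own sketch, not from the snippet above; unorm8_to_float_fma is a made-up name, and -32768.0f * (256/255.0f) folds to a compile-time constant):
#include <math.h>
#include <stdint.h>

// (f - 32768) * (256/255) == f * (256/255) - 32768 * (256/255),
// so the subtract and multiply collapse into a single fmaf.
static float unorm8_to_float_fma(uint8_t x) {
    union { uint32_t bits; float f; } pun = { 0x47000000u | x }; // 32768.0f + x/256.0f
    return fmaf(pun.f, 256.0f / 255.0f, -32768.0f * (256.0f / 255.0f));
}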
It's also worth considering whether your application can handle approximate conversion. If you have a [0,255] unorm in x, x + (x>>7) or equivalently x + (x>0x7f) will round it to a [0,256] fixed-point value. Crucially, this rounding does handle 0x00 and 0xff inputs correctly. Once in fixed-point with a nice power-of-two divisor, you can play all sorts of tricks: again making use of the bit representation of floats, using ARM fixed-point instructions, and so on. If you've ever looked longingly at the pmulhrsw family of instructions, this is a ripe area to explore.
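To make the fixed-point idea concrete, a minimal sketch (my own illustration, not from the comment above; unorm8_to_float_approx is a made-up name):
#include <stdint.h>

// x + (x >> 7) maps [0,255] onto [0,256] (0x00 -> 0, 0xff -> 0x100),
// and the divide by 256 is an exact power-of-two scale in float.
static float unorm8_to_float_approx(uint8_t x) {
    uint32_t fixed = x + (x >> 7);
    return (float)fixed * (1.0f / 256.0f);
}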
For those who, like me, had no idea what UNORM/SNORM are: it's a base-2 fixed-point representation apparently common in the latest GPU APIs (DirectX 10/11, Vulkan).
The gist is that they store uniformly distributed values in 0..1 or -1..1, which can be more space-efficient than floats if you know your data will always fall in one of those ranges. The conversion to/from floats isn't too hard to do in software, but it's common enough in graphics that GPUs usually have dedicated hardware support for it.
It's also used implicitly in the typical representation of images fed to neural nets.
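For anyone curious, the straightforward software conversion is roughly this (my own sketch; exact clamping and rounding rules vary slightly between APIs):
#include <stdint.h>

// unorm8 stores value = x / 255, so conversion is just a scale plus clamp/round.
static float unorm8_to_float(uint8_t x) {
    return (float)x * (1.0f / 255.0f);
}

static uint8_t float_to_unorm8(float f) {
    f = f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f); // clamp to [0, 1]
    return (uint8_t)(f * 255.0f + 0.5f);         // round to nearest
}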
For funsies I asked chatgpt 4o to try:
uint16_t product = x * 257; // Multiply by 257
return product >> 16; // Shift right by 16 bits
A valiant effort, and confidently presented. When pressed, and given the methods from the article to compare against, it admitted its version might suffer from precision issues but maintained it was ultimately the best of the three.
The Achilles heel of this approach (and of blindly trusting LLM code in general) is left as an exercise.
I was about to debate the “drum of oil” quip, but thinking about it: the o1 model can cost tens of dollars for a long back-and-forth conversation.
A barrel of oil is currently $70, so if you’ve spent more than that on your chat, you’ve gone through an “energy slave for a year” in a matter of minutes! (A barrel of oil contains the same energy as the work done by a human labourer over a year.)
Note: You might argue that most of the cost is due to the rental on the chips and not the electrical bill. Sure, but a silicon chip has a minuscule material cost and is mostly expensive because of the energy-intensive processes required to make it.