It can be fun to explore the interactions of unorm and float bit representations even when you have float instructions. E.g. if you bit-or a unorm8 into 0x47000000 (32768.0f) then subtract 32768.0f, you'll get a number very close to right, just a float multiply of (256/255.0f) away. Reordering the math so that the subtraction and multiply can become a single FMA is a fun homework exercise.
union {
    int bits;
    float f;
} pun = {x}, scale = {0x47000000}; // 0x47000000 is the bit pattern of 32768.0f
pun.bits |= scale.bits;   // pun.f is now 32768.0f + x/256.0f
pun.f    -= scale.f;      // exact: pun.f is now x/256.0f
pun.f    *= (256/255.0f); // rescale to x/255.0f
This basically amounts to a software implementation of int->float conversion instructions; sadly I have never found a spot where it's actually worth doing when you have those int->float instructions available already, even with the FMA as a single instruction.
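For what it's worth, here's roughly what that FMA reordering might look like (my own sketch, not from the snippet above; unorm8_to_float_fma is a made-up name, and -32768.0f * (256/255.0f) folds to a compile-time constant):
#include <math.h>
#include <stdint.h>

// (f - 32768) * (256/255) == f * (256/255) - 32768 * (256/255),
// so the subtract and multiply collapse into a single fmaf.
static float unorm8_to_float_fma(uint8_t x) {
    union { uint32_t bits; float f; } pun = { 0x47000000u | x }; // 32768.0f + x/256.0f
    return fmaf(pun.f, 256.0f / 255.0f, -32768.0f * (256.0f / 255.0f));
}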
It's also worth considering whether your application can handle approximate conversion. If you have a [0,255] unorm in x, x + (x>>7) or equivalently x + (x>0x7f) will round it to a [0,256] fixed-point value. Crucially, this rounding does handle 0x00 and 0xff inputs correctly. Once in fixed-point with a nice power-of-two divisor, you can play all sorts of tricks: again making use of the bit representation of floats, using ARM fixed-point instructions, and so on. If you've ever looked longingly at the pmulhrsw family of instructions, this is a ripe area to explore.
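To make the fixed-point idea concrete, a minimal sketch (my own illustration, not from the comment above; unorm8_to_float_approx is a made-up name):
#include <stdint.h>

// x + (x >> 7) maps [0,255] onto [0,256] (0x00 -> 0, 0xff -> 0x100),
// and the divide by 256 is an exact power-of-two scale in float.
static float unorm8_to_float_approx(uint8_t x) {
    uint32_t fixed = x + (x >> 7);
    return (float)fixed * (1.0f / 256.0f);
}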
For those who, like me, had no idea what UNORM/SNORM are: it's a base-2 fixed-point representation apparently common in the latest GPU APIs (DirectX 10/11, Vulkan).
The gist is that they store uniformly distributed values in 0..1 or -1..1, which can be more space-efficient than floats if you know your data will always fall in one of those ranges. The conversion to/from floats isn't too hard to do in software, but it's common enough in graphics that GPUs usually have dedicated hardware support for it.
It's also used implicitly in the typical representation of images fed to neural nets.
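For anyone curious, the straightforward software conversion is roughly this (my own sketch; exact clamping and rounding rules vary slightly between APIs):
#include <stdint.h>

// unorm8 stores value = x / 255, so conversion is just a scale plus clamp/round.
static float unorm8_to_float(uint8_t x) {
    return (float)x * (1.0f / 255.0f);
}

static uint8_t float_to_unorm8(float f) {
    f = f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f); // clamp to [0, 1]
    return (uint8_t)(f * 255.0f + 0.5f);         // round to nearest
}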
For funsies I asked chatgpt 4o to try:
uint16_t product = x * 257; // Multiply by 257
return product >> 16; // Shift right by 16 bits
A valiant effort, and confidently presented. When pressed, and given the methods from the article to compare against, it admitted its version might suffer from precision issues but maintained it was ultimately the best of the three.
The Achilles heel of this approach (and of blindly trusting LLM code in general) is left as an exercise.
I was about to debate the “drum of oil” quip, but thinking about it: the o1 model can cost tens of dollars for a long back-and-forth conversation.
A barrel of oil is currently $70, so if you’ve spent more than that on your chat, you’ve gone through an “energy slave for a year” in a matter of minutes! (A barrel of oil contains the same energy as the work done by a human labourer over a year.)
Note: You might argue that most of the cost is due to the rental on the chips and not the electrical bill. Sure, but a silicon chip has a minuscule material cost and is mostly expensive because of the energy-intensive processes required to make it.