Well, the claim is also that it isn't generally an issue for the M1's architecture until you scale up to 32 GPU cores or more [1], which is beefier than any other current GPU outside of AMD's and NVIDIA's offerings. So it's completely believable that the GPUs you worked on could have used virtual memory for everything without it ever becoming a performance issue.

(alternatively: it looks like the M1 Pro only has 24 MB L3, so it's likely that actual cache misses become an issue before TLB misses, but the Max/Ultra have more cache so that could invert)

[1] https://twitter.com/VadimYuryev/status/1514295707481501700
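
To make the cache-vs-TLB tradeoff above concrete, here's a back-of-the-envelope sketch. Every constant in it is a placeholder (Apple doesn't publish its GPU TLB geometry; the page size and entry count are hypothetical), only the 24 MB figure comes from the comment above:

    #include <cstdio>

    int main() {
        // All hypothetical placeholders except the LLC size.
        const double page_size_kb = 16.0;   // assumed GPU page size
        const int    tlb_entries  = 2048;   // assumed TLB entry count
        const double llc_mb       = 24.0;   // M1 Pro last-level cache, per the comment above

        const double tlb_reach_mb = tlb_entries * page_size_kb / 1024.0;

        // If the TLB can map more memory than the last-level cache holds,
        // you'd expect to start missing in the cache before you start
        // missing in the TLB, and vice versa.
        std::printf("TLB reach %.1f MB vs LLC %.1f MB -> %s misses hit first\n",
                    tlb_reach_mb, llc_mb,
                    tlb_reach_mb > llc_mb ? "cache" : "TLB");
        return 0;
    }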




The claim is that if you don't "optimize" for TBDR then you are gated by the TLB and that software written for immediate mode rendering isn't optimized this way.

However, draw call batching and minimizing render target (and state!) switching are bread-and-butter performance optimizations you'd make for an immediate-mode renderer as well. You can get away with more draw calls on some hardware, and there are specific things you can do on a per-architecture basis, but to the best of my knowledge none of them involve the TLB. If someone wants to spill the beans on how the M1 differs here, I'm all ears, but the claim just doesn't line up with most GPU architectures I've seen.
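
For what the bread-and-butter part looks like in practice, here's a minimal sketch; the struct and field names are made up rather than any real engine's API. The idea is just to sort submissions so render target switches (the expensive ones on both IMR and TBR) only happen at bucket boundaries:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical draw-call record; real engines pack this into a sort key.
    struct DrawCall {
        uint32_t renderTarget;   // which render target / pass it writes to
        uint32_t pipelineState;  // shader + blend + depth state bundle
        uint32_t mesh;
    };

    // Sort so all draws sharing a render target are contiguous, and within a
    // target all draws sharing a pipeline state are contiguous. Render-target
    // switches are costliest, so they get the most significant position.
    void sortForSubmission(std::vector<DrawCall>& draws) {
        std::sort(draws.begin(), draws.end(),
                  [](const DrawCall& a, const DrawCall& b) {
            if (a.renderTarget != b.renderTarget)
                return a.renderTarget < b.renderTarget;
            return a.pipelineState < b.pipelineState;
        });
    }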

The big wins on a TBR versus an IMR usually come from using fewer (and narrower) render targets [1]: the less attachment data each pixel needs, the larger each tile can be, and larger tiles make dispatching the tiles more efficient.

[1] https://developer.qualcomm.com/sites/default/files/docs/adre...
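
As a rough illustration of why attachment count drives tile size (the tile-memory size and pixel formats below are placeholders, not any particular GPU's numbers):

    #include <cstdio>

    // On a tiler, the tile has to hold every attachment for every pixel it
    // covers, so tile area is roughly on-chip tile memory divided by total
    // bytes per pixel across attachments. All sizes here are hypothetical.
    int main() {
        const int tileMemoryBytes = 32 * 1024;  // assumed on-chip tile memory

        // One RGBA8 color target + D24S8 depth: 4 + 4 = 8 bytes/pixel.
        const int lightBpp = 8;
        // A fat G-buffer: four RGBA16F targets + depth: 4*8 + 4 = 36 bytes/pixel.
        const int heavyBpp = 36;

        std::printf("light pass: ~%d pixels per tile\n", tileMemoryBytes / lightBpp);
        std::printf("heavy pass: ~%d pixels per tile\n", tileMemoryBytes / heavyBpp);
        // Fewer / narrower attachments -> more pixels per tile -> fewer tiles
        // to dispatch for the same resolution.
        return 0;
    }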



