Hacker News

I agree with most of what you say, but as for

> why would a GPU have its own TLB in the first place?

the answer is that modern GPUs have their own MMUs, and have had them for quite a while. More or less ubiquitous MMUs on GPUs are half the reason Mantle/Vulkan/DX12/Metal came into being.




Thanks for this! Can you elaborate a bit more, and maybe give me some pointers to where I can read about these things in more depth? I of course know about the MMU on traditional dGPUs, but I was not aware that "integrated" GPUs also have a separate MMU. How does it work, and why is it necessary given that the GPU shares the last-level cache and the memory controller with the rest of the system?


Because the TLBs and other MMU hardware aren't in the last-level cache or the memory controller on the vast majority of systems; they normally sit at about the L1 level on each core. This is because you want at least L2 to speak entirely in physical addresses, so the coherency protocol isn't confused by the same page being mapped in different ways.

L1 is more complex, typically being virtually indexed but physically tagged (VIPT). When an access is issued with a virtual address, the cache set lookup and the TLB translation proceed in parallel: the virtual address selects the set, the TLB produces the physical address, and that physical address is then compared against the tags of the ways in the selected set in order to pick the actual cache line for the op.
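The VIPT lookup described above can be sketched in a few lines. This is a toy model, not any real core's design; the sizes are chosen so the index bits fall entirely within the page offset (64 sets x 64-byte lines = 4 KiB per way, matching the page size), which is exactly what lets the set lookup start before translation finishes:

```python
# Toy sketch of a virtually indexed, physically tagged (VIPT) L1 lookup.
# All sizes and names here are illustrative assumptions.

PAGE_SIZE = 4096   # 4 KiB pages
LINE_SIZE = 64     # 64-byte cache lines
NUM_SETS  = 64     # 64 sets * 64 B = 4 KiB per way -> index bits fit in the page offset

def set_index(addr):
    # Index bits lie within the page offset, so the virtual and physical
    # addresses select the same set -- the set lookup needs no translation.
    return (addr // LINE_SIZE) % NUM_SETS

def phys_tag(paddr):
    # Everything above the index + offset bits is the tag.
    return paddr // (LINE_SIZE * NUM_SETS)

tlb = {}                                   # virtual page number -> physical page number
cache = {s: {} for s in range(NUM_SETS)}   # set -> {physical tag: cached line}

def load(vaddr):
    s = set_index(vaddr)                   # 1) select the set with virtual bits
    vpn, off = divmod(vaddr, PAGE_SIZE)
    ppn = tlb[vpn]                         # 2) in parallel, the TLB translates the page
    paddr = ppn * PAGE_SIZE + off
    tag = phys_tag(paddr)                  # 3) compare the physical tag against the ways
    ways = cache[s]
    if tag in ways:
        return ways[tag]                   # hit: physical tag matched a way in the set
    data = f"line@{paddr & ~(LINE_SIZE - 1):#x}"   # miss: pretend to fill from memory
    ways[tag] = data
    return data

# Map virtual page 0x1000 to physical page 0x9000, then access it twice.
tlb[0x1000 // PAGE_SIZE] = 0x9000 // PAGE_SIZE
first = load(0x1040)    # miss, fills the line backing physical address 0x9040
second = load(0x1040)   # hit on the physically tagged line
```

Note that because the index comes only from page-offset bits, `set_index(0x1040)` and `set_index(0x9040)` agree; grow the cache beyond page size per way and that property breaks, which is why larger VIPT caches need more associativity or aliasing tricks.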




