> GPU directly read from a SSD, bypassing all of the steps in between Ehh... unl...

vardump · on June 13, 2020

Better be at least 2 MB pages then. With 4 kB pages, 5.5M page faults per second (at the PS5's max. 22 GB/s decompressed output) might otherwise ruin this scheme a bit...

2 MB page size would reduce this to "just" 11k page faults per second.
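The arithmetic behind those two figures can be checked with a few lines of Python. This is only a sketch: it assumes every page of output costs exactly one fault and uses decimal units (4 kB = 4,000 bytes, 2 MB = 2,000,000 bytes) to match the rounded numbers above; `faults_per_second` is a name made up for illustration. Binary page sizes (4 KiB, 2 MiB) give slightly lower rates.

```python
def faults_per_second(throughput_bytes_per_s: float, page_bytes: int) -> float:
    """Page faults per second if each page of output triggers exactly one fault."""
    return throughput_bytes_per_s / page_bytes


THROUGHPUT = 22e9  # 22 GB/s, the peak figure quoted in the comment above

# Decimal sizes reproduce the rounded figures; binary sizes are what hardware uses.
PAGE_SIZES = {
    "4 kB": 4_000,
    "4 KiB": 4 * 1024,
    "2 MB": 2_000_000,
    "2 MiB": 2 * 1024 * 1024,
}

for label, size in PAGE_SIZES.items():
    print(f"{label:>6}: {faults_per_second(THROUGHPUT, size):,.0f} faults/s")
```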

throwaway_pdp09 · on June 13, 2020

I know nothing about this, but wouldn't most textures be quite small, and with a large page size you might end up reading in much more than you need, making it often worse than useless?
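To put a rough number on that worry, here is a small sketch of the read amplification. It assumes the worst case, where each texture starts on its own page and nothing else useful shares that page; the 64 KiB texture size and the function names are made up for illustration, not measured data.

```python
import math


def bytes_read(texture_bytes: int, page_bytes: int) -> int:
    """Bytes pulled from storage when a texture is read in whole pages."""
    return math.ceil(texture_bytes / page_bytes) * page_bytes


def amplification(texture_bytes: int, page_bytes: int) -> float:
    """Ratio of bytes actually read to bytes actually wanted."""
    return bytes_read(texture_bytes, page_bytes) / texture_bytes


TEXTURE = 64 * 1024  # hypothetical 64 KiB texture

for label, page in [("4 KiB", 4 * 1024), ("2 MiB", 2 * 1024 * 1024)]:
    print(f"{label} pages: {amplification(TEXTURE, page):.0f}x the data needed")
```

If small assets are packed together so that neighbours on the same page are also wanted, the overhead shrinks accordingly.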

67868018 · on June 13, 2020

Transparent superpages solve this

http://u.cs.biu.ac.il/~wiseman/2os/superpage/navaroo1.pdf

throwaway_pdp09 · on June 13, 2020

That's 113 pages. Does this really cover the issue of reading too much? If so, I'll try to find the time.

I mean, I'm not complaining about TLB issues, which large pages handle better for obvious reasons, but thrashing the TLB is likely to be a lot cheaper than reading too much from the disk.

67868018 · on June 13, 2020

FreeBSD superpage max size on amd64 is 1GB

edit: and there have been SPARC CPUs with memory controllers that can do 256GB pages

CountSessine · on June 13, 2020

> the reasonable assumption is that the CPU still mmap()es the file into the GPU’s address space, and then the CPU pages in data from the SSD as the GPU generates page misses.

You seem to know quite a bit about this - just wondering - how does this work? The GPU can generate PCIe bus traffic to do RAM reads or writes, but how does it cause a page fault in the CPU? Is this some kind of IOMMU? Is there any place I can read more about this?
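For reference, the CPU-only half of that picture (map a file, let the OS page data in on first touch) looks like the sketch below. It says nothing about how a GPU would raise the fault, which is the open question here; it only shows ordinary demand paging using Python's standard `mmap` module, and the helper names and the demo file are made up for illustration.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # the OS page size, commonly 4096 bytes


def make_demo_file(pages: int = 8) -> str:
    """Write a throwaway file of `pages` pages filled with a repeating 0..255 pattern."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * (PAGE * pages // 256))
        return tmp.name


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the whole file read-only and copy out one slice.

    No data is read when the mapping is created; touching the slice is what
    makes the kernel fault in the page(s) that back it.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


path = make_demo_file()
try:
    # Only the page containing this offset needs to be resident.
    print(read_via_mmap(path, 3 * PAGE, 4))
finally:
    os.remove(path)
```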

wtallis · on June 14, 2020

I don't think the GPU can generate page faults in the traditional sense (which core would #PF be delivered to?). The CPU has to pre-fetch data before the GPU tries to use it. On the PS5, the GPU may be able to issue read requests directly to the IO coprocessor and have it load data off the SSD, run it through the decompressor, place the data into the requested DRAM addresses, and invalidate the necessary cache lines in any GPU or CPU cores. But I'm not sure that the PS5 can actually do that with zero CPU involvement.

rasz · on June 13, 2020

APU, shared memory controller, unified memory address space.

DudeInBasement · on June 13, 2020

You do realize the GPU can just read it from the physical address space and not via the CPU's MMU? It's not an innovation. There are still a lot of level3 bus transactions. If the SSD had two buses, with one going directly to the GPU (read-only type), there would be less pressure on the level3 bus and that would speed things up.