
> GPU directly read from a SSD, bypassing all of the steps in between

Ehh... unless you have more specific knowledge that’s still under NDA, the reasonable assumption is that the CPU still mmap()es the file into the GPU’s address space, and then the CPU pages in data from the SSD as the GPU generates page misses. Which is technically possible on PCs today, but isn’t done because you can’t assume fast SSDs and you definitely can’t assume shared memory. (actually I’m pretty sure discrete GPUs can generate page misses for mapped CPU memory, but I’m not certain which graphics APIs really let that happen)
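For anyone who hasn't watched demand paging happen, here is a minimal CPU-only sketch of the mmap() half of that description. It uses only Python's standard `mmap` module; the `touch_pages` helper and the temporary file are made up for illustration, and nothing here says anything about how a GPU or a graphics API would take part.

```python
import mmap
import os
import tempfile

def touch_pages(path):
    """Map a file read-only and read one byte from each page.

    Returns the number of pages touched. Hypothetical helper for illustration.
    """
    size = os.path.getsize(path)
    touched = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for offset in range(0, size, mmap.PAGESIZE):
            # The first access to a page that is not yet mapped takes a page
            # fault; the kernel fills it from the page cache or from storage.
            _ = m[offset]
            touched += 1
    return touched

# Demo on a throwaway file spanning 8 pages.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (8 * mmap.PAGESIZE))
    demo_path = tmp.name
try:
    print(touch_pages(demo_path), "pages touched")
finally:
    os.remove(demo_path)
```

The file was just written, so in this demo the faults are most likely served from the page cache rather than the SSD; the mechanism is the same either way.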




Better be at least 2 MB pages then. With 4 kB pages, the PS5's peak decompression output of 22 GB/s would mean 5.5M page faults per second, which might ruin this scheme a bit...

2 MB page size would reduce this to "just" 11k page faults per second.
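The arithmetic behind those two numbers, as a rough sketch. It assumes one fault per page streamed and decimal units (1 kB = 1000 B), which is what makes the figures come out round; `faults_per_second` is just an illustrative name.

```python
# Back-of-the-envelope fault rates, assuming one page fault per page streamed.
THROUGHPUT_BYTES_PER_S = 22e9  # 22 GB/s peak figure quoted above

def faults_per_second(page_bytes, throughput=THROUGHPUT_BYTES_PER_S):
    """Pages (and so faults) needed per second to sustain the throughput."""
    return throughput / page_bytes

small = faults_per_second(4_000)      # 4 kB pages, decimal units
large = faults_per_second(2_000_000)  # 2 MB pages, decimal units
print(f"4 kB pages: {small:,.0f} faults/s")
print(f"2 MB pages: {large:,.0f} faults/s")

# With binary units (4096 B and 2 MiB) the rates are slightly lower.
print(f"4 KiB pages: {faults_per_second(4096):,.0f} faults/s")
print(f"2 MiB pages: {faults_per_second(2 * 1024 * 1024):,.0f} faults/s")
```

With binary page sizes the results are about 5.37M and 10.5k faults per second, so the conclusion doesn't change.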


I know nothing about this, but wouldn't most textures be quite small, and with a large page size you might end up reading in much more than you need, making it often worse than useless?



That's 113 pages. Does this really cover the issue of reading too much? If so, I'll try to find the time.

I mean, I'm not complaining about TLB issues, which large pages obviously handle better, but thrashing the TLB is likely to be a lot cheaper than reading too much from the disk.
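To put a rough number on the "reading too much" worry, here is a small sketch. The 64 KiB texture size is a made-up example, and it assumes each asset starts on a page boundary and that nothing else useful shares its page, which is the worst case for large pages.

```python
def bytes_read(asset_bytes, page_bytes):
    """Bytes pulled in if whole pages are faulted in for a single asset.

    Assumes the asset starts on a page boundary (a simplification).
    """
    pages = -(-asset_bytes // page_bytes)  # integer ceiling division
    return pages * page_bytes

def amplification(asset_bytes, page_bytes):
    """Ratio of bytes read to bytes actually wanted."""
    return bytes_read(asset_bytes, page_bytes) / asset_bytes

texture = 64 * 1024  # hypothetical 64 KiB texture
for page in (4 * 1024, 2 * 1024 * 1024):
    print(f"page={page:>8} B  read={bytes_read(texture, page):>8} B  "
          f"amplification={amplification(texture, page):.0f}x")
```

Under these assumptions a 4 KiB page reads exactly the texture, while a 2 MiB page reads 32 times as much. If assets that are used together are packed into the same large page, the extra bytes are not wasted and the ratio drops accordingly.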


FreeBSD superpage max size on amd64 is 1GB

edit: and there have been SPARC CPUs with memory controllers that can do 256GB pages


> the reasonable assumption is that the CPU still mmap()es the file into the GPU’s address space, and then the CPU pages in data from the SSD as the GPU generates page misses.

You seem to know quite a bit about this - just wondering - how does this work? The GPU can generate PCIe bus traffic to do RAM reads or writes, but how does it cause a page fault in the CPU? Is this some kind of IOMMU? Is there any place I can read more about this?


I don't think the GPU can generate page faults in the traditional sense (which core would #PF be delivered to?). The CPU has to pre-fetch data before the GPU tries to use it. On the PS5, the GPU may be able to issue read requests directly to the IO coprocessor and have it load data off the SSD, run it through the decompressor, place the data into the requested DRAM addresses, and invalidate the necessary cache lines in any GPU or CPU cores. But I'm not sure that the PS5 can actually do that with zero CPU involvement.


APU, shared memory controller, unified memory address space.


You do realize the GPU can just read it from the physical address space and not via the CPU's MMU? It's not an innovation. There are still a lot of level3 bus transactions. If the SSD had two buses, with one going directly to the GPU (read-only), there would be less pressure on the level3 bus, and that would speed things up.



