> or 1 channel (128 bits wide) The DDR4 bus is 64-bit, how can you have a 128-bi...

sliken · on Nov 25, 2020

> The DDR4 bus is 64-bit, how can you have a 128-bit channel??

I'm less familiar with what's normal on laptops, but most desktop chips from AMD and Intel have two 64 bit channels.

> Which might have downsides of its own.

Typically for each channel you send an address (a row and column, actually), wait for the DRAM latency, and then get a burst of transfers (one per bus cycle) with the result. So for a 16-bit-wide channel @ 3.2 GHz with a 128-byte cache line you get 64 transfers, one every 0.3125 ns, for a total of 20 ns.
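To make that arithmetic easy to check, here is a minimal sketch. The function name and parameters are made up for illustration, and it only models the burst itself, not the DRAM latency you wait for before the burst starts.

```python
def burst_time_ns(channel_bits: int, transfers_per_ns: float, line_bytes: int):
    """Return (number of transfers, burst duration in ns) for one cache line.

    Hypothetical helper: assumes one transfer per bus cycle and ignores
    the row/column latency that precedes the burst.
    """
    transfers = (line_bytes * 8) // channel_bits   # bits in the line / bits per transfer
    ns_per_transfer = 1.0 / transfers_per_ns       # 3.2 transfers per ns -> 0.3125 ns each
    return transfers, transfers * ns_per_transfer

# The example from the comment: 16-bit channel, 3.2 GHz, 128-byte cache line.
transfers, total_ns = burst_time_ns(16, 3.2, 128)
print(transfers, round(total_ns, 4))   # 64 transfers, roughly 20 ns
```

Plugging in a 64-bit channel instead gives 16 transfers for the same line, so the burst is a quarter as long, but the latency before the burst is unchanged.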

Each channel operates independently, so multiple channels can each have a cache miss in flight. Otherwise nobody would bother with independent channels; they would just stripe them all together.

Here's a graph of cache line throughput vs number of threads.

https://github.com/spikebike/pstream/blob/master/png/apple-m...

So going from 1 to 2 threads you see an increase in throughput, which means the multiple channels are helping. 4 threads is the same as 2; maybe the L2 cache is a bottleneck. But 8 threads is clearly better than 4.

floatboth · on Nov 25, 2020

> two 64 bit channels

Yeah, I'm saying you can't magically unify them into a single 128-bit one. If you only use a single channel, the other one is unused.

sliken · on Nov 25, 2020

It's pretty common for hardware to support both. On the Zen 1 Epycs, for instance, some software preferred the consistent latency of striped memory over the NUMA-aware latency of separate channels, where the closer DIMMs have lower latency and the farther DIMMs have higher latency.

I've seen similar on Intel servers, but not recently. However, this isn't typically something you can do at runtime, only at boot time, at least as far as I've seen.
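As a rough illustration of the difference between striped and separate channels, here is a toy address-to-channel mapping. Everything in it is an assumption for the sake of the example (two channels, 64-byte interleave granularity, 1 GiB per channel); real memory controllers use more elaborate mappings.

```python
LINE_BYTES = 64           # assumed interleave granularity
NUM_CHANNELS = 2          # assumed channel count
CHANNEL_BYTES = 1 << 30   # assumed capacity per channel (1 GiB)

def channel_striped(addr: int) -> int:
    """Striped: consecutive lines alternate between channels."""
    return (addr // LINE_BYTES) % NUM_CHANNELS

def channel_separate(addr: int) -> int:
    """Separate: each channel owns one contiguous address range."""
    return addr // CHANNEL_BYTES

# Four consecutive cache lines starting at address 0.
addrs = [i * LINE_BYTES for i in range(4)]
striped = [channel_striped(a) for a in addrs]
separate = [channel_separate(a) for a in addrs]
print(striped)    # [0, 1, 0, 1]
print(separate)   # [0, 0, 0, 0]
```

In the striped case every access pattern sees the same average latency, while in the separate case the latency depends on which range an address falls in, which is the trade-off described above.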