Here's the mempipe benchmark latency for core<->core: https://github.com/MarginResearch/cannoli/blob/main/mempipe/...

https://raw.githubusercontent.com/MarginResearch/cannoli/mai...

You can see a massive improvement for shared hyperthreads and for messages that stay on the same CPU socket.

In this case, roughly 350 cycles for a local core, 700 cycles for a remote core, and 90 cycles for the same hyperthread. As long as you're on Skylake or newer, divide these by your clock rate to get the latency in seconds. E.g. local-core IPC on a 4 GHz processor comes to about 87.5 nanoseconds.
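To make the conversion concrete, here's a quick sketch that applies it to the three figures quoted above. The 4 GHz clock is just the example value from this comment, and `cycles_to_ns` is a made-up helper name, not something from mempipe.

```python
def cycles_to_ns(cycles: float, clock_hz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock rate (Hz)."""
    return cycles * 1e9 / clock_hz


# Approximate cycle counts quoted in this comment.
LATENCY_CYCLES = {
    "same hyperthread": 90,
    "local core": 350,
    "remote core": 700,
}

# Assumed example clock rate; substitute your own CPU's frequency.
CLOCK_HZ = 4.0e9

for name, cycles in LATENCY_CYCLES.items():
    print(f"{name}: ~{cycles_to_ns(cycles, CLOCK_HZ):.1f} ns")
```

At 4 GHz that works out to about 22.5 ns, 87.5 ns, and 175 ns respectively.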