You can see a massive improvement for shared hyperthreads and for on-CPU-socket messages.
In this case, roughly 350 cycles for a local core, 700 cycles for a remote core, and 90 cycles for the same hyperthread. As long as you're on Skylake or newer, divide these by your clock rate to get the time in seconds. E.g. about 87.5 nanoseconds for local-core IPC on a 4 GHz processor.
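To make the conversion concrete, here is a minimal sketch of the arithmetic in Python. The helper name `cycles_to_ns` and the dictionary are made up for illustration, the cycle counts are the approximate figures quoted above, and 4 GHz is just the example clock rate. It assumes the cycle counts tick at the clock rate you plug in (the Skylake+ caveat).

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds.

    A 1 GHz clock completes one cycle per nanosecond, so dividing
    cycles by the rate in GHz gives nanoseconds directly.
    """
    return cycles / clock_ghz


# Approximate latencies quoted in the comment (hypothetical container name).
LATENCY_CYCLES = {
    "same hyperthread": 90,
    "local core": 350,
    "remote core": 700,
}

CLOCK_GHZ = 4.0  # example clock rate, substitute your own

for name, cycles in LATENCY_CYCLES.items():
    print(f"{name}: ~{cycles_to_ns(cycles, CLOCK_GHZ):.1f} ns")
```

At 4 GHz this works out to about 22.5 ns for the same hyperthread, 87.5 ns for a local core, and 175 ns for a remote core.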
https://raw.githubusercontent.com/MarginResearch/cannoli/mai...