Modern Out-of-order CPUs (like the M1), they can't see branches until far too la...

aengelke · 2024-07-13T07:20:14 1720855214

Side note: Intel CPUs since Skylake and also recent AMD CPUs (since Zen 3 or so?) store a history for indirect branches. On such processors, using threaded jumps does not really improve performance anymore (I've even seen 1-2% slowdowns on some cores).
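
For anyone unfamiliar with the term, here is a minimal sketch of the two dispatch styles being compared. It assumes GCC or Clang (the `&&label` / `goto *` syntax is a compiler extension, not standard C), and the opcodes and function names are made up for illustration. With a `switch`, all opcodes typically share one indirect branch; with threaded jumps, each handler ends in its own indirect branch, which gives a history-less predictor more to work with. Compilers are free to transform either form, so the generated code may differ from what the source suggests.

```c
#include <stddef.h>

/* Hypothetical toy bytecode, for illustration only. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Switch dispatch: every opcode is routed through the same switch,
 * which usually compiles to a single shared indirect branch. */
long run_switch(const unsigned char *code) {
    long acc = 0;
    for (size_t pc = 0;; pc++) {
        switch (code[pc]) {
        case OP_INC:    acc += 1; break;
        case OP_DEC:    acc -= 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return acc; /* unknown opcode: stop */
        }
    }
}

/* Threaded dispatch: each handler jumps directly to the next handler,
 * so every handler has its own indirect branch.
 * Assumes every byte in `code` is a valid opcode. */
long run_threaded(const unsigned char *code) {
    /* Label addresses, indexed by opcode (GCC/Clang extension). */
    static void *const targets[] = { &&do_inc, &&do_dec, &&do_double, &&do_halt };
    long acc = 0;
    size_t pc = 0;

#define DISPATCH() goto *targets[code[pc++]]

    DISPATCH();
do_inc:    acc += 1; DISPATCH();
do_dec:    acc -= 1; DISPATCH();
do_double: acc *= 2; DISPATCH();
do_halt:   return acc;

#undef DISPATCH
}
```

Both functions compute the same result for a valid program; only the shape of the dispatch differs, which is the part the branch predictor cares about.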

phire · 2024-07-13T09:44:51 1720863891

Pretty sure it's Haswell and Zen 2. They both implement IT-TAGE based branch predictors.

I just assumed the M1 branch predictor would also be in the same class, but I guess not. In another comment (https://news.ycombinator.com/item?id=40952404), I did some tests to confirm that it was actually the threaded jumps responsible for the speedup.

I'm tempted to dig deeper and see what the M1's branch predictor can and can't do.

phire · 2024-07-13T12:57:03 1720875423

too late to edit

Turns out that the M1 can track the history of indirect branches just fine, but it takes 3 cycles for a correct prediction. With threaded jumps, the M1 gets a slightly higher hit rate for the initial 1-cycle prediction.

https://news.ycombinator.com/item?id=40953764