> However, the details of modern branch predictors are proprietary, so we don’t have authoritative sources on them.
I focused on Computer Architecture for a masters degree and now I work on a CPU design team. While I cannot say what we use due to NDA, I will say that it is not proprietary. Very nearly everything in modern CPUs, including the branch predictors, can be found in academic research.
Many of these secrets are easily found in the reading list for a graduate-level computer architecture course. Implementation details vary but usually not by too much.
I’m not affiliated with academia. I don’t design CPUs, I don’t write operating systems, and I don’t care about these side-channel attacks. I simply write user-mode software, and I want my code to be fast.
The fact that the academic research CPU designers use or write is public doesn’t help me, because I only care about the implementation details of modern CPUs: Intel Skylake and newer, AMD Zen 2 and newer. These details have non-trivial performance consequences for branchy code, but they vary a lot between different processors. For example, AMD even mentions neural networks in the press release: https://www.amd.com/en/technologies/sense-mi
What the GP is saying is that all the details of how modern processors work are out there in books and academic papers, and that the material covered in graduate-level computer architecture courses is very relevant and helpful, and they include all (or nearly all) the techniques used in industry.
From the GP's perspective, it doesn't matter at all if the course taught branch predictors on a MIPS processor, even though MIPS isn't really used anywhere anymore (well, that's not strictly true, MIPS cores are still all over networking gear, but y'know, for the sake of the argument). The course still goes over the various techniques used, their consequences, etc., so the processor chosen as an example is unimportant.
You're saying that all this information is unhelpful for you, because what you want is a detailed optimization guide for a particular CPU with its own particular implementation of branch prediction. And yeah, university courses don't cover that, but note that they're not "outdated" because it's not as if at some point what they taught was "current" in this respect.
So yeah, in this sense you're right, academia does not directly tackle optimization for a given processor in teaching or research, and if it did it would be basically instantly outdated. Your best resource for doing that is the manufacturer's optimization guide, and those can be light on details, especially on exactly how the branch predictor works.
But "how a processor works" is a different topic from "how this specific processor works", and the work being done in academia is not outdated compared to what the industry is doing.
PS: Never believe the marketing in the press release, yeah? "Neural network" as used here is pure marketing bullshit. They're usually not directly lying, but you can bet that they're stretching the definition of what a "neural network" is and the role it plays.
> They still go over the various techniques used, their consequences, etc., so the processor chosen as an example is unimportant.
They also include various techniques not used anymore, without mentioning that’s the case. I did a search for “branch predictor static forward not taken site:.edu” and found many documents which discuss that particular static heuristic, BTFN (backward taken, forward not taken). In modern CPUs the predictor works before fetch or decode, so a heuristic that needs the decoded branch direction can't even apply.
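To be concrete, the heuristic those documents teach fits in one line. A minimal sketch in C (the addresses are hypothetical instruction pointers; real hardware would apply this only after decoding the branch and its target, which is exactly why it doesn't fit a modern front end):

```c
#include <stdbool.h>
#include <stdint.h>

/* Static BTFN heuristic: a backward branch (target address below the
 * branch address, i.e. a loop-closing branch) is predicted taken;
 * a forward branch (skipping ahead, e.g. over an if-body) is
 * predicted not taken. */
static bool btfn_predict(uintptr_t branch_pc, uintptr_t target_pc) {
    return target_pc < branch_pc;  /* backward => taken */
}
```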
> university courses don't cover that
Here’s a link to one: https://course.ece.cmu.edu/~ece740/f15/lib/exe/fetch.php?med... According to the first slide, the document was written in fall 2015. It has dedicated slides discussing particular implementations of branch predictors in Pentium Pro, Alpha 21264, Pentium M, and Pentium 4.
The processors being covered were released between 1995 and 2003. At the time that course was written, people were already programming Skylake and Excavator, and Zen 1 was just around the corner.
I’m not saying the professor failed to deliver. Quite the opposite: information about old CPUs is better than pure theory without anything practically useful. Still, I’m pretty sure they would have been happy to include slides about contemporary CPUs, if only that information were public.
> They also include various techniques not used anymore, without mentioning that’s the case.
Definitely. Sometimes it's for comparative reasons, and sometimes it's easier to understand the newer technique in the context of the older one.
> discussing particular implementations of branch predictors in Pentium Pro, Alpha 21264, Pentium M, and Pentium 4.
Yeah, but the course is still not the optimization guide you wanted. The slides pick and choose features from each branch predictor to make the point the professor wanted to make. They're not really useful for optimizing code for that particular processor; they're useful for understanding how branch predictors work in general.
> I’m pretty sure they would have been happy to include slides about contemporary CPUs, if only that information were public.
Only if they served as a good example for some concept, or helped make a point that the professor wanted to make. There's no point in changing the examples to a newer processor if the old one is a cleaner implementation of the concept being discussed (and older examples tend to be simpler and therefore cleaner). The point isn't to supply information about specific processors, it's to teach the techniques used in branch predictors.
P.S. See those 3 slides about a "Perceptron Branch Predictor"? Based on a paper from 2001? I'm betting AMD's "neural network" is really just something like that...
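For what it's worth, the core of that 2001 perceptron predictor (Jiménez & Lin) is only a few lines. A minimal sketch in C: the table of perceptrons indexed by branch address and the saturating weight widths are omitted, HIST_LEN is illustrative, and THRESHOLD follows the paper's floor(1.93·h + 14) formula:

```c
#include <stdbool.h>
#include <stdlib.h>

#define HIST_LEN  16   /* global history bits consulted */
#define THRESHOLD 44   /* floor(1.93 * HIST_LEN + 14), per the 2001 paper */

typedef struct {
    int w[HIST_LEN + 1];  /* w[0] is the bias weight */
} perceptron;

/* history[i] is +1 if the i-th most recent branch was taken, -1 if not. */
static int perceptron_output(const perceptron *p, const int history[HIST_LEN]) {
    int y = p->w[0];
    for (int i = 0; i < HIST_LEN; i++)
        y += p->w[i + 1] * history[i];
    return y;
}

static bool perceptron_predict(const perceptron *p, const int history[HIST_LEN]) {
    return perceptron_output(p, history) >= 0;
}

/* Train once the branch resolves: t is +1 if it was taken, -1 if not.
 * Weights are adjusted only on a misprediction or when the output
 * magnitude is still below the training threshold. */
static void perceptron_train(perceptron *p, const int history[HIST_LEN], int t) {
    int y = perceptron_output(p, history);
    bool mispredicted = (y >= 0) != (t == 1);
    if (mispredicted || abs(y) <= THRESHOLD) {
        p->w[0] += t;
        for (int i = 0; i < HIST_LEN; i++)
            p->w[i + 1] += t * history[i];
    }
}
```

It's a single-layer perceptron per branch, trained online. Calling that a "neural network" in a press release isn't lying, exactly, but it's doing a lot of work.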
Practically, the only thing that matters is that branch prediction assumes history repeats itself: past patterns of a branch being taken under certain conditions are used to predict whether it will be taken again.
So that means that conditions that are deterministic and relatively constant throughout the lifetime of the program will most likely be predicted correctly, and that rare events will most likely not be predicted correctly. That's all you need to know to write reasonably optimized code.
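To make that concrete, here's a small C experiment in the spirit of the classic sorted-vs-unsorted demo: the same branch, fed predictable then unpredictable data. The 128 threshold and array size are arbitrary, and at higher optimization levels the compiler may replace the branch with a conditional move, which hides the effect:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

/* Sum elements above a threshold. With sorted input the branch settles
 * into long runs of not-taken then taken, which any history-based
 * predictor learns; with random input it mispredicts ~50% of the time. */
static long sum_big(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        if (a[i] >= 128)   /* the branch under test */
            s += a[i];
    return s;
}

static int cmp_int(const void *x, const void *y) {
    return *(const int *)x - *(const int *)y;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    for (int i = 0; i < N; i++) a[i] = rand() % 256;

    clock_t t0 = clock();
    long s1 = sum_big(a, N);            /* random: unpredictable branch */
    clock_t t1 = clock();

    qsort(a, N, sizeof *a, cmp_int);
    clock_t t2 = clock();
    long s2 = sum_big(a, N);            /* sorted: predictable branch */
    clock_t t3 = clock();

    printf("random: %ld (%ld ticks), sorted: %ld (%ld ticks)\n",
           s1, (long)(t1 - t0), s2, (long)(t3 - t2));
    free(a);
    return 0;
}
```

Same data, same sums, same instruction stream; the only thing that changed is whether the branch's history has a pattern.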