
I haven't done the research, but I can't believe most of the information could be that hard to find or wrong, at least if you know where to look.

These sorts of topics are usually well covered in the corresponding undergraduate-level computer science courses (computer architecture and operating systems, which these days are mostly optional since they've fallen out of fashion).

Several universities have free courses available, and some of the professors in the field have also written books.

You'll also find pretty informative presentations at the relevant tech conferences if you'd rather stay closer to the state of the art.

Still, teaching is an art that is undervalued, and making the information more available or more fun to interact with is certainly very valuable.




I share the OP’s opinion that a lot of available information is incorrect.

It seems the industry is moving faster than the academics who write books and university courses can update those sources. Big-endian CPUs, CPU architectures other than AMD64 and ARM, and the x87 FPU are examples of topics which are no longer relevant. However, these topics are well covered because they were still relevant a couple of decades ago when people wrote these sources.

Some details of modern hardware are secret. An example from low-level programming: many sources claim CPUs have two kinds of branch predictors, a static one which predicts forward branches as not taken, and a dynamic one which queries and updates branch target buffer (BTB) entries. This is incorrect, because mainstream CPUs made in the last 15 years no longer have the static one. However, the details of modern branch predictors are proprietary, so we don’t have authoritative sources on them. We only have speculation based on micro-benchmarks.
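
For instance, here’s the kind of micro-benchmark I mean: a minimal C++ sketch that times the same loop over random versus sorted data. The ratio you observe tells you something about the predictor, but nothing authoritative about how it’s built:

    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    static volatile int64_t sink;  // keep the compiler from optimizing the loop away

    // Time a loop whose only interesting feature is one data-dependent branch.
    static int64_t time_sum(const std::vector<int>& v) {
        auto start = std::chrono::steady_clock::now();
        int64_t sum = 0;
        for (int x : v)
            if (x >= 128)          // the branch under study
                sum += x;
        auto end = std::chrono::steady_clock::now();
        sink = sum;
        return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    }

    int main() {
        std::vector<int> data(1 << 20);
        std::mt19937 rng(42);
        for (int& x : data) x = static_cast<int>(rng() % 256);

        // Random data: the branch outcome is essentially a coin flip.
        std::printf("random: %lld us\n", (long long)time_sum(data));
        // Sorted data: the branch is almost perfectly predictable.
        std::sort(data.begin(), data.end());
        std::printf("sorted: %lld us\n", (long long)time_sum(data));
    }

The gap between the two runs varies by CPU, which is exactly the point: you can measure it, but you can’t look up why.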


> However, the details of modern branch predictors are proprietary, so we don’t have authoritative sources on them.

I focused on Computer Architecture for a master's degree and now I work on a CPU design team. While I cannot say what we use due to NDA, I will say that it is not proprietary. Very nearly everything in modern CPUs, including the branch predictors, can be found in academic research.

Many of these secrets are easily found in the reading list for a graduate-level computer architecture course. Implementation details vary but usually not by too much.


I’m not related to academia. I don’t design CPUs. I don’t write operating systems and I don’t care about these side channel attacks. I simply write user-mode software, and I want my code to be fast.

The fact that the academic research used or written by CPU designers is public doesn’t help me, because I only care about the implementation details of modern CPUs: Intel Skylake and newer, AMD Zen 2 and newer. These details have non-trivial performance consequences for branchy code, but they vary a lot between different processors. For example, AMD even mentions neural networks in the press release: https://www.amd.com/en/technologies/sense-mi


You're both right.

What the GP is saying is that all the details of how modern processors work are out there in books and academic papers, and that the material covered in graduate-level computer architecture courses is very relevant and helpful, covering all (or nearly all) the techniques used in industry.

From the GP's perspective, it doesn't matter at all if the course taught branch predictors on a MIPS processor, even though MIPS isn't really used anywhere anymore (well, that's wrong, they're used extensively in networking gear, but y'know, for the argument). They still go over the various techniques used, their consequences, etc., so the processor chosen as an example is unimportant.

You're saying that all this information is unhelpful for you, because what you want is a detailed optimization guide for a particular CPU with its own particular implementation of branch prediction. And yeah, university courses don't cover that, but note that they're not "outdated" because it's not as if at some point what they taught was "current" in this respect.

So yeah, in this sense you're right, academia does not directly tackle optimization for a given processor in teaching or research, and if it did it would be basically instantly outdated. Your best resource for doing that is the manufacturer's optimization guide, and those can be light on details, especially on exactly how the branch predictor works.

But "how a processor works" is a different topic from "how this specific processor works", and the work being done in academia is not outdated compared to what the industry is doing.

PS: Never believe the marketing in the press release, yeah? "Neural network" as used here is pure marketing bullshit. They're usually not directly lying, but you can bet that they're stretching the definition of what a "neural network" is and the role it plays.


> They still go over the various techniques used, their consequences, etc., so the processor chosen as an example is unimportant.

They also include various techniques that are not used anymore, without mentioning that’s the case. I did a search for “branch predictor static forward not taken site:.edu” and found many documents which discuss that particular BTFN technique. In modern CPUs the predictor works before fetch or decode, so that static fallback no longer applies.

> university courses don't cover that

Here’s a link to one: https://course.ece.cmu.edu/~ece740/f15/lib/exe/fetch.php?med... According to the first slide, the document was written in fall 2015. It has dedicated slides discussing particular implementations of branch predictors in Pentium Pro, Alpha 21264, Pentium M, and Pentium 4.

The processors being covered were released between 1995 and 2003. At the time that course was written, people were already programming Skylake and Excavator, and Zen 1 was just around the corner.

I’m not saying the professor failed to deliver. Quite the opposite: information about old CPUs is better than pure theory without anything practically useful. Still, I’m pretty sure they would be happy to include slides about contemporary CPUs, if only that information were public.


> They also include various techniques not used anymore, without mentioning that’s the case.

Definitely. Sometimes it's for comparative reasons, and sometimes it's easier to understand the newer technique in the context of the older one.

> discussing particular implementations of branch predictors in Pentium Pro, Alpha 21264, Pentium M, and Pentium 4.

Yeah, but the course is still not the optimization guide you wanted. The slides pick and choose features from each branch predictor to make the point the professor wanted to make and to present the idea he wanted to present. It's not really useful for optimizing code for that particular processor; it's useful for understanding how branch predictors work in general.

> I’m pretty sure they would be happy to included slides about contemporary CPUs, if only that information was public.

Only if they served as a good example for some concept, or helped make a point that the professor wanted to make. There's no point in changing the examples to a newer processor if the old one is a cleaner implementation of the concept being discussed (and older examples tend to be simpler and therefore cleaner). The point isn't to supply information about specific processors, it's to teach the techniques used in branch predictors.

P.S. See those 3 slides about a "Perceptron Branch Predictor"? Based on a paper from 2001? I'm betting AMD's "neural network" is really just something like that...
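
If you're curious what a perceptron predictor actually computes, here's a toy C++ sketch of the idea from that 2001 paper. The table size, history length, and training threshold below are invented for illustration; real designs differ and, as discussed, aren't public:

    #include <array>
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    // Toy perceptron predictor in the spirit of the 2001 paper:
    // one weight vector per table entry, indexed by branch address;
    // the prediction is the sign of a dot product with recent branch history.
    constexpr int HIST  = 16;    // global history length (made up)
    constexpr int TABLE = 1024;  // number of perceptrons (made up)
    constexpr int THETA = 30;    // training threshold (made up)

    struct Perceptron { std::array<int16_t, HIST + 1> w{}; };

    std::array<Perceptron, TABLE> table;
    std::array<int, HIST> history{};  // +1 = taken, -1 = not taken

    bool predict(uint64_t pc, int& out) {
        Perceptron& p = table[pc % TABLE];
        out = p.w[0];                        // bias weight
        for (int i = 0; i < HIST; ++i)
            out += p.w[i + 1] * history[i];
        return out >= 0;                     // predict taken if non-negative
    }

    void train(uint64_t pc, int out, bool taken) {
        Perceptron& p = table[pc % TABLE];
        int t = taken ? 1 : -1;
        // Update only on a misprediction or a low-confidence output
        // (real hardware also saturates the weights).
        if ((out >= 0) != taken || std::abs(out) <= THETA) {
            p.w[0] += t;
            for (int i = 0; i < HIST; ++i)
                p.w[i + 1] += t * history[i];
        }
        // Shift the new outcome into the global history.
        for (int i = HIST - 1; i > 0; --i) history[i] = history[i - 1];
        history[0] = t;
    }

    int main() {
        // Feed an alternating pattern and watch the predictor learn it.
        int correct = 0, out;
        for (int i = 0; i < 1000; ++i) {
            bool taken = (i % 2 == 0);
            bool guess = predict(0x400123, out);
            correct += (guess == taken);
            train(0x400123, out, taken);
        }
        std::printf("correct: %d / 1000\n", correct);
    }

Nothing magical about it, which is why I'd call the "neural network" branding a stretch.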


"Neural networks" just mean perceptrons.

Practically, the only thing that matters is that branch prediction assumes history repeats itself: past patterns of a branch being taken under certain conditions inform the prediction of whether it will be taken again.

So that means that conditions that are deterministic and relatively constant throughout the lifetime of the program will most likely be predicted correctly, and that rare events will most likely not be predicted correctly. That's all you need to know to write reasonably optimized code.
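
And when you do hit a branch that's genuinely unpredictable, the usual move is to rewrite the code so there's nothing to predict. A rough C++ sketch; whether it actually wins depends on the data and on whether the compiler was already doing this for you, so measure:

    #include <cstdint>
    #include <vector>

    // Branchy version: the predictor must guess (x >= 128) for every element.
    // Cheap if the outcome is stable or follows a pattern, costly if it's random.
    int64_t sum_branchy(const std::vector<int>& v) {
        int64_t sum = 0;
        for (int x : v)
            if (x >= 128) sum += x;
        return sum;
    }

    // Branchless version: turn the condition into a 0/1 factor,
    // so there is no branch for the predictor to get wrong.
    int64_t sum_branchless(const std::vector<int>& v) {
        int64_t sum = 0;
        for (int x : v)
            sum += static_cast<int64_t>(x) * (x >= 128);
        return sum;
    }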


> CPU architectures other than AMD64 and ARM [..] no longer relevant

cough RISC-V cough


"Wrong" is perhaps not the most accurate word. I most often found information to be either extremely oversimplified such as to be unhelpful, or outdated and no longer relevant for current systems. Although, yes, some things were just wrong.

There are courses and presentations and books, but there aren't many websites or articles, and that's the learning style that works best for me. Undergrad programs will teach a lot of what I covered (though certainly not all, and it really depends on the program), but I believe that knowledge shouldn't be gatekept behind going to college.


Ultimately, diving deeper with only websites and articles can be quite challenging. I experienced this myself trying to learn more about the continuation-passing-style (CPS) transformation in a compiler. No websites or articles discussed the topic with any real depth.

In the end I read the classic book "Compiling with Continuations", and it basically cleared up all my confusion.
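
For anyone who hasn't run into CPS: the core idea is that every function takes an explicit continuation representing the rest of the computation, so control flow becomes ordinary values. A toy before/after sketch in C++, nothing like what a real compiler emits:

    #include <cstdio>
    #include <functional>

    // Direct style: the "rest of the computation" is implicit in the call stack.
    int fact(int n) {
        return n <= 1 ? 1 : n * fact(n - 1);
    }

    // CPS: the rest of the computation is an explicit continuation `k`,
    // and every result is passed forward instead of returned.
    void fact_cps(int n, std::function<void(int)> k) {
        if (n <= 1) k(1);
        else fact_cps(n - 1, [n, k](int r) { k(n * r); });
    }

    int main() {
        std::printf("%d\n", fact(5));                        // 120
        fact_cps(5, [](int r) { std::printf("%d\n", r); });  // 120
    }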

All of this is to say, don't discount books and courses. They will almost always be more in depth and correct than what you will find written up on a website.


I think you are very correct, and I don't like it. There should be more "online books" that are in depth and correct!


Have a look at this one! https://github.com/angrave/SystemProgramming/wiki

It was still in development when I went; looks like they've made a PDF now. https://github.com/illinois-cs241/coursebook


The course was changed from cs241 to cs341, so I think the most up-to-date version is here [0] now.

[0] https://cs341.cs.illinois.edu/coursebook/index.html


Agreed!



