Funny timing: GCC 11.1, announced in what is currently post #2, adds support for the CPUs mentioned here (currently post #4):
AArch64 & arm
A number of new CPUs are supported through arguments to the -mcpu and -mtune options in both the arm and aarch64 backends (GCC identifiers in parentheses):
Arm Cortex-A78 (cortex-a78).
Arm Cortex-A78AE (cortex-a78ae).
Arm Cortex-A78C (cortex-a78c).
Arm Cortex-X1 (cortex-x1).
Arm Neoverse V1 (neoverse-v1).
Arm Neoverse N2 (neoverse-n2).
Good to see this work landing at the right time. (Not that it has been much of a problem for CPU cores recently, but it still isn't a given.)
These tunings are only used if you compile things yourself with -mcpu=native (or with -mcpu/-mtune naming one particular model); on arm and aarch64, -march takes an architecture name rather than a CPU model. Most software out there is compiled with generic, non-tuned optimizations. The tuning is rarely a huge deal, though.
- when you have a particularly CPU-intensive application, you'd hopefully compile it to target your system
- the cloud providers can just do a custom Debian/Ubuntu/... build for their zillions of identical systems
- the library loading mechanism on Linux is slowly getting support for having multiple compile variants of a library packaged into different subdirectories of /lib (e.g. "/usr/lib64/tls/haswell/x86_64")
Also I was mostly trying to point out as a positive how well the interaction is working there between ARM and the GCC project. I wish it were like this for other types of silicon.
(CPU vendors all seem to be getting this right, and GPUs are slowly getting there, but much other silicon is horrible… e.g. wifi chips)
That is not entirely true. Binaries in the packaging systems might not be compiled for the most recent atomic instructions, which can really affect performance.
Well, that, yeah. But it doesn't strictly have anything to do with the CPU-model-specific tuning that the news was about, except that naming a specific CPU in -mcpu (-mtune would not do it!) implies that CPU's features. Typically, though, you'd just use -march=armv8-a+the+desired+features for that, like the first post you linked does.
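To make the tuning-versus-features distinction concrete, here is a minimal sketch. It assumes a GCC or Clang toolchain that defines the ACLE macro `__ARM_FEATURE_ATOMICS` when the LSE atomics are enabled for the target; the function names (`atomics_flavour`, `bump`, `self_check`) are made up for illustration. Built with something like `-march=armv8-a+lse` the first branch should be taken, while `-mtune=cortex-a78` on its own should leave you in the baseline branch, since tuning does not change which instructions may be used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Reports at compile time which instruction set the compiler was
 * allowed to assume. __ARM_FEATURE_ATOMICS is the ACLE feature macro
 * for the Armv8.1 LSE atomics; everything else here is hypothetical. */
const char *atomics_flavour(void) {
#if defined(__ARM_FEATURE_ATOMICS)
    return "lse";            /* e.g. built with -march=armv8-a+lse */
#elif defined(__aarch64__) || defined(__arm__)
    return "baseline-arm";   /* generic build; -mtune alone lands here */
#else
    return "non-arm";
#endif
}

/* Zero-initialised shared counter. */
static atomic_ulong counter;

/* The same C source compiles to different instructions depending on
 * the features enabled above; the code itself does not change. */
unsigned long bump(void) {
    return atomic_fetch_add(&counter, 1) + 1;
}

/* Tiny self-check: two consecutive bumps differ by exactly one
 * (single-threaded use assumed). */
void self_check(void) {
    unsigned long a = bump();
    unsigned long b = bump();
    assert(b == a + 1);
}
```

Comparing the assembly of `bump` (e.g. with `-S`) between the two builds is probably the quickest way to see what a distribution's generic build leaves on the table.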
Really, the important piece for making distribution binaries not suck is ifuncs/multiversioning. But library and app authors currently have to use them deliberately. That is fine for manual optimizations that use intrinsics or assembly (and e.g. standard library atomics), but I'm not sure any compiler would currently do that automatically for autovectorization.
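For anyone who hasn't seen the idea, here is a hand-rolled sketch of the dispatch that ifuncs automate. It is not an ifunc itself: it is just a function pointer resolved on first use. It assumes Linux on AArch64 with a libc whose `<sys/auxv.h>` exposes `getauxval` and `HWCAP_ATOMICS`, and falls back to the generic variant everywhere else. All function names are hypothetical, and the two variants are placeholders with identical bodies; in a real project the second one would live in its own translation unit built with different flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP; HWCAP_ATOMICS on most libcs */
#endif

typedef unsigned long (*add_fn)(atomic_ulong *, unsigned long);

/* Variant for the generic baseline build. */
static unsigned long add_generic(atomic_ulong *c, unsigned long n) {
    return atomic_fetch_add(c, n);
}

/* Placeholder for a variant that would be compiled separately with
 * e.g. -march=armv8-a+lse. The body is identical here on purpose. */
static unsigned long add_lse_build(atomic_ulong *c, unsigned long n) {
    return atomic_fetch_add(c, n);
}

/* Asks the kernel whether the CPU has LSE atomics; returns 0 wherever
 * that cannot be determined. */
static int cpu_has_lse(void) {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_ATOMICS)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    return 0;
#endif
}

/* Plays the role an ifunc resolver would play. */
static add_fn resolve_add(void) {
    return cpu_has_lse() ? add_lse_build : add_generic;
}

/* Public entry point: adds n to *c and returns the previous value.
 * The lazy initialisation is not thread-safe; this is only a sketch. */
unsigned long counter_add(atomic_ulong *c, unsigned long n) {
    static add_fn impl;
    if (!impl)
        impl = resolve_add();
    assert(impl != NULL);
    return impl(c, n);
}
```

The point of a real ifunc is that the dynamic linker does the `resolve_add` step once at load time, so callers don't pay for the check on each call. Either way, somebody still has to write the variants and the resolver by hand.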