Funnily coincident timing: current post #2 (GCC 11.1 released) adds support for ...

floatboth · on April 27, 2021

These tunings will only be used if you compile stuff yourself with -march=native (or by specifying one particular CPU model). Most software out there is compiled with generic, non-tuned optimizations. The tuning is rarely a huge deal though.

eqvinox · on April 27, 2021

True, but it's still relevant for 3 things:

- when you have a particularly CPU-intensive application, you'd hopefully compile it to target your system

- the cloud providers can just do a custom Debian/Ubuntu/... build for their zillions of identical systems

- the library loading mechanism on Linux is slowly getting support for having multiple compile variants of a library packaged into different subdirectories of /lib (e.g. "/usr/lib64/tls/haswell/x86_64")

Also I was mostly trying to point out as a positive how well the interaction is working there between ARM and the GCC project. I wish it were like this for other types of silicon.

(CPU vendors all seem to be getting this right, and GPUs are slowly getting there, but much other silicon is horrible… e.g. wifi chips)

sitkack · on April 27, 2021

That is not entirely true. Binaries in the packaging systems might not be compiled for the most recent atomic instructions, which can really affect performance.

https://blog.dbi-services.com/aws-postgresql-on-graviton2-aa...

https://github.com/microsoft/STL/issues/488

We are about 9-14 months away from the right pieces making their way through the software ecosystems where this will be almost a non-issue.

Exciting times for everyone!

floatboth · on April 28, 2021

Well, that, yeah. But it doesn't strictly have anything to do with the CPU-model-specific tuning that the news was about; the only connection is that setting a specific CPU in -march (-mtune would not do it!) implies the features. Typically, though, you'd just use -march=armv8-a+the+desired+features for that, like the first post you linked does.
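To make the feature-flag point concrete, here is a minimal sketch in C. The names record_hit and atomics_codegen are made up for illustration. It assumes GCC or Clang, which define the ACLE macro __ARM_FEATURE_ATOMICS when the target includes the ARMv8.1 LSE atomics (for example with -march=armv8-a+lse). Without that flag the same source still builds, but the atomic add becomes either a load/store-exclusive loop or, as far as I know by default on GCC 10 and later, a call to an -moutline-atomics helper that picks the instruction at run time.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; static storage, so it starts at zero. */
static atomic_ulong hits;

/* Atomically increment the counter and return the new value.
   With +lse in -march this can compile to a single LDADD instruction. */
unsigned long record_hit(void)
{
    return atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed) + 1;
}

/* Report which code generation the compile flags selected.
   Only the preprocessor is consulted; nothing is probed at run time. */
const char *atomics_codegen(void)
{
#if defined(__ARM_FEATURE_ATOMICS)
    return "inline LSE atomics";
#elif defined(__aarch64__)
    return "LL/SC loop or outline-atomics helper";
#else
    return "not AArch64";
#endif
}
```

Building it once with -march=armv8-a and once with -march=armv8-a+lse and comparing the disassembly of record_hit should show the difference; -mtune alone leaves it unchanged.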

Really the important piece for making distribution binaries not suck is ifuncs/multiversioning. But library and app authors currently have to use them deliberately. That is fine for manual optimizations that use intrinsics or assembly (and e.g. standard library atomics), but I'm not sure any compiler would currently do that automatically for autovectorization.
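For reference, this is roughly what the deliberate opt-in looks like with GCC's target_clones attribute. It is a sketch under assumptions: the saxpy function and the MULTIVERSION macro are hypothetical, and the attribute is only enabled for GCC on x86-64 Linux with glibc, where the clones are dispatched through an ifunc resolver. On any other target the macro expands to nothing and you get one generic build.

```c
#include <stddef.h>

/* Assumption: GCC targeting x86-64 Linux with glibc (ifunc support).
   Elsewhere MULTIVERSION is empty and only one version is compiled. */
#if defined(__x86_64__) && defined(__gnu_linux__) && \
    defined(__GNUC__) && !defined(__clang__)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSION
#endif

/* Hypothetical hot loop: y[i] = a * x[i] + y[i].
   The compiler emits one clone per listed target and autovectorizes
   each clone separately; the resolver picks one at load time. */
MULTIVERSION
void saxpy(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The source stays plain C and the vectorization is still automatic, but somebody had to decide that this particular function deserves clones. Nothing here happens without the attribute.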