C compilers also have decades of experience optimally translating C code into machine code, and they are arguably more capable of emitting SIMD (good luck trying to use cutting-edge AVX-512 intrinsics like vpopcntdq with the JVM). The fact is that there is nothing a JIT compiler can do that an AOT compiler can't do, but in the case of AOT, the resources spent compiling the code are amortized to effectively zero, whereas that cost is paid on every program startup with a JIT engine.
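For reference, here's roughly what using that instruction family looks like from C/C++ today (a minimal sketch, assuming a compiler and CPU with AVX512F and AVX512VPOPCNTDQ support, e.g. built with -mavx512f -mavx512vpopcntdq; the function name is made up for the example):

```cpp
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

// Sum the population counts of an array of 64-bit words, eight lanes at a time.
uint64_t popcount_u64(const uint64_t *data, size_t n) {
    __m512i acc = _mm512_setzero_si512();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m512i v = _mm512_loadu_si512(data + i);
        acc = _mm512_add_epi64(acc, _mm512_popcnt_epi64(v)); // emits vpopcntq
    }
    uint64_t total = _mm512_reduce_add_epi64(acc);
    for (; i < n; i++)                                       // scalar tail
        total += (uint64_t)__builtin_popcountll(data[i]);    // GCC/Clang builtin
    return total;
}
```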
That's not necessarily true of the JIT vs AOT split. I'm mostly going off of how the divergence in available optimizations is starting to look in .NET after the introduction of Native AOT, plus light research into LLVM and various optimization-adjacent Rust crates.
In particular, with JIT you are able to initialize certain readonly data once and then, on recompilation to a more optimized version, bake that data as JIT constants right into the emitted machine code. This is not possible with AOT. The same applies to all kinds of in-runtime profiling/analysis and recompilation to incorporate a profile collected from this exact run of the application. JIT also offers the ability to load modules dynamically in the form of bytecode without needing a strict machine-level ABI, only the bytecode one, which allows for efficient generics that cross modules, as well as cross-module function inlining. And last but not least - there is no need to pick the lowest common denominator in supported hardware features, as the code can be compiled to use the latest features the hardware actually provides, like AVX-512.
On the other hand, pure AOT means a frozen world, which allows the compiler to know the exact types and paths the code can take, performing exact devirtualization and much more aggressive preinitialization of code that accepts constant data. It also means bigger leeway in the time the compiler can spend optimizing code. Historically, GCC and LLVM have been more advanced than their JIT counterparts because of different tradeoffs favouring the absolute performance of the emitted code, as well as simply a higher number of man-hours invested in developing them (e.g. .NET punches above its weight class despite being worked on by a smaller team vs OpenJDK or LLVM).
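To make the devirtualization point concrete, here's a hedged C++ sketch (the type names are made up): with the whole program visible, e.g. under -O2 -flto, GCC and Clang can prove that only one implementation is ever instantiated and turn the virtual call into a direct, inlinable one.

```cpp
#include <cstdio>

struct Handler {
    virtual void handle(int x) = 0;
    virtual ~Handler() = default;
};

// 'final' plus a closed world makes the target of every Handler call provable.
struct Sink final : Handler {
    void handle(int x) override { std::printf("%d\n", x); }
};

static void run(Handler &h) {
    for (int i = 0; i < 3; i++)
        h.handle(i);   // devirtualization candidate: the only possible target is Sink::handle
}

int main() {
    Sink s;            // the single implementation in the whole (frozen) program
    run(s);
}
```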
C compilers only get one opportunity to do that, at compile time; if the developer was lucky with the data set used to train the PGO output, maybe the outcome is greatly improved.
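For context, the ahead-of-time PGO workflow with GCC or Clang looks roughly like this (a sketch; the file name and toy program are made up, but -fprofile-generate/-fprofile-use are the standard flags):

```cpp
// Build, train, and rebuild:
//   g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo    # instrumented build
//   ./pgo_demo 1 2 3 -4 0                                  # training run on "representative" input
//   g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo         # optimized build using that profile
// The compiler only ever sees the branch counts from the training run; if that
// input doesn't match production behaviour, layout and inlining decisions won't either.
#include <cstdio>
#include <cstdlib>

static int classify(int x) {
    if (x < 0) return -1;   // PGO records how often each of these branches is taken
    if (x == 0) return 0;
    return 1;
}

int main(int argc, char **argv) {
    long sum = 0;
    for (int i = 1; i < argc; i++)
        sum += classify(std::atoi(argv[i]));
    std::printf("%ld\n", sum);
    return 0;
}
```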
Modern JVMs not only have a JIT that can use actual production data, they are also able to cache PGO data between execution runs and converge on an optimal set of heuristics over the application's lifetime.
And on Android, those PGO files are even shared between devices via the Play Store.