gcc and clang at least have options so you can optimize for specific CPUs. I'm not sure how good they are (most people want a generic optimization that runs well on all CPUs of the family, so there is likely a lot of room for improvement with CPU-specific optimization), but they can do it. This does (or at least can; again, it probably isn't fully implemented) account for things like instruction length, pipeline depth, and cache size.
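For concreteness, here's a toy example of what those flags control (the exact instructions emitted depend on your compiler version and target, so treat this as a sketch, not a guarantee). `-march` decides which instructions the compiler is allowed to use at all, while `-mtune` only adjusts instruction selection and scheduling heuristics for a given CPU:

```c
/* popcount.c -- toy example; actual codegen varies by compiler version. */
#include <stdio.h>

int bits_set(unsigned x) {
    /* With a baseline target (plain x86-64) gcc/clang typically expand this
     * builtin into a shift/mask bit-twiddling sequence; with a target that
     * has the POPCNT extension (e.g. -march=x86-64-v2, or -mpopcnt) it can
     * become a single popcnt instruction. */
    return __builtin_popcount(x);
}

int main(void) {
    printf("%d\n", bits_set(0xF0F0u)); /* prints 8 */
    return 0;
}
```

Compare the assembly from e.g. `gcc -O2 -march=x86-64 -S popcount.c` against `gcc -O2 -march=x86-64-v2 -S popcount.c` (or `-march=native -mtune=native` for whatever machine you're on) to see the difference.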
The JavaScript V8 engine and the JVM are both popular and well-supported enough that I expect the teams working on them take advantage of every trick they can for specific CPUs; they have a lot of resources for this (at least for the major x86 and ARM chips - maybe they don't for MIPS or some uncommon variant of ARM...). Of course there are other JIT engines, and some uncommon ones don't have many resources and won't do this.
> take advantage of every trick they can for specific CPUs
Not to the extent clang and gcc do, no. V8 does, e.g., use AVX and some other instruction set extensions if CPUID indicates they are available. TurboFan does global scheduling when moving out of the sea of nodes, but that is not machine-specific. There was an experimental local instruction scheduler for TurboFan, but it never really helped big cores, while measurements showed it would have helped smaller cores. It didn't actually calculate latencies; it just used a greedy heuristic. I am not sure if it was ever turned on. TurboFan doesn't do software pipelining or unroll/jam, though it does do loop peeling, which isn't CPU-specific.
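For what it's worth, the CPUID-driven part is easy to mirror in ahead-of-time code too. Below is a minimal sketch of that kind of runtime feature dispatch (the function names are made up for illustration, and this is not V8's actual code; `__builtin_cpu_supports` and the `target` attribute are real gcc/clang features on x86):

```c
#include <stdio.h>

/* AVX-enabled version: only ever called when CPUID reports AVX support. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++) out[i] = a[i] + b[i]; /* compiler may vectorize with AVX */
}

/* Baseline version, safe on any x86-64 CPU. */
static void add_scalar(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++) out[i] = a[i] + b[i];
}

static void add(const float *a, const float *b, float *out, int n) {
    /* __builtin_cpu_supports consults CPUID (cached by the runtime), the same
     * information a JIT checks before deciding to emit AVX instructions. */
    if (__builtin_cpu_supports("avx"))
        add_avx(a, b, out, n);
    else
        add_scalar(a, b, out, n);
}

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float out[8];
    add(a, b, out, 8);
    printf("%.1f\n", out[0]); /* 9.0 either way */
    return 0;
}
```

A JIT has it slightly easier than this: it checks CPUID once and then only emits the variant it will actually run, instead of shipping both.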
> gcc and clang at least have options so you can optimize for specific CPUs. I'm not sure how good they are
They are not very good at it, and can't be. You can look inside them and see the models are pretty simple; the best you can do is optimize for the first step (decoder) of the CPU and avoid instructions called out in the optimization manual as being especially slow. But on an OoO CPU there's not much else you can do ahead of time, since branches and memory accesses are unpredictable and much slower than in-CPU resource stalls.
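To make "the models are pretty simple" concrete: the machine models are essentially static tables of assumed latencies and throughputs per instruction class, one per tuning target. The sketch below invents its own opcodes and numbers (they are not gcc's or LLVM's actual tables), but the shape is similar, and it shows why a cache miss or a mispredicted branch is invisible to an ahead-of-time scheduler:

```c
#include <stdio.h>

enum op { OP_ADD, OP_MUL, OP_DIV, OP_LOAD, OP_COUNT };

struct op_cost {
    const char *name;
    int latency_cycles;       /* best-case latency the model assumes       */
    int issue_every_n_cycles; /* reciprocal throughput the model assumes   */
};

/* Invented numbers for a hypothetical big out-of-order core; a small
 * in-order core would get its own table. There is no entry for a cache
 * miss or a mispredicted branch -- those cost far more than anything
 * listed here, but a static model cannot predict when they happen. */
static const struct op_cost big_core[OP_COUNT] = {
    [OP_ADD]  = {"add",   1,  1},
    [OP_MUL]  = {"mul",   3,  1},
    [OP_DIV]  = {"div",  20, 10},
    [OP_LOAD] = {"load",  4,  1},   /* assumes an L1 hit */
};

int main(void) {
    for (int i = 0; i < OP_COUNT; i++)
        printf("%-5s latency=%2d cycles, issue every %d cycle(s)\n",
               big_core[i].name, big_core[i].latency_cycles,
               big_core[i].issue_every_n_cycles);
    return 0;
}
```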