> I mean, first registers are anyway saved on the stack when calling a function
No, they aren't. For registers defined in the calling convention as "callee-saved", they don't have to be saved on the stack before calling a function (and the called function only has to save them if it actually uses that register). And for registers defined as "caller-saved", they only have to be saved if their value needs to be kept. The compiler knows all that, and tends to use caller-saved registers as scratch space (which doesn't have to be preserved), and callee-saved registers for longer-lived values.
> and caches of modern processors are really nearly as fast (if not as fast!) as a register.
No, they aren't. For instance, a quick web search tells me that the L1D cache for a modern AMD CPU has at least 4 cycles of latency. Which means: even if the value you want to read is already in the L1 cache, the processor has to wait 4 cycles before it has that value.
> Registers these days are merely labels, since internally the processor (at least for x86) executes the code in a sort of VM.
No, they aren't. The register file still exists, even though register renaming means the physical register backing a given logical register can change from instruction to instruction. And there's no VM: most common instructions are decoded directly (without going through microcode) into a single µOp or a pair of µOps, which are then executed directly.
> To me it seems that all these optimizations were really something useful back in the day, but nowadays we can as well just ignore them and let the processor figure it out without that much loss of performance.
It's the opposite: these optimizations matter more nowadays, since memory speeds have not kept up with processor speeds, and power consumption has become more of a concern.
> To me security is more important than a 1% more boost in performance.
Newer programming languages agree with you, and do things like checking array bounds on every access; they rely on compiler optimizations so that the loss of performance is only that "1%".