And, amusingly, the instruction count is also very competitive, especially inside loops.
Furthermore, it achieves all of that with a much simpler ISA that matches x86 and ARM in features, while having an order of magnitude fewer instructions to implement and verify.
Compiler output is not a good way to show off the best of an ISA (which is more an indictment of how bad compilers actually are at optimising for code density). Look at the demoscene: hand-written x86 can be an order of magnitude denser than lame compiler output.
Any code size increases are made up for elsewhere, and they STILL end up with smaller code overall.