I think we agree about working set size - that's what actually matters for performance, rather than overall code size. Krste from SiFive was fairly insistent on their recent call - without any proof (citing mysterious customer calls) - that people care about the code size of the Linux kernel, not working set size. The performance gain he attributed to the C instructions, through reduced working set size in the Linux kernel, was 3%. And that is the performance argument coming from the biggest proponent of the C instructions.

As to what you suggested, I have actually started putting something together, based on my own experience implementing RISC-V designs, to possibly send to the RISC-V Foundation. But pretty much nobody is asserting that loops are predominantly 32-bit instructions. Tight loops are often already sitting in a uop cache once you get to a core of reasonable size, so compressed vs. uncompressed is completely irrelevant there. Contrary to what you seem to be hoping for, correct arguments about working set size and performance are very subtle.

The C instructions aren't free in frequency terms, either. Supporting them adds significant complexity to the decoders and the cache hierarchy. Making that cost add up to 3% is not that hard.
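
To put a rough number on that trade-off, here is a back-of-the-envelope sketch. It assumes the usual first-order model where performance scales as work-per-cycle times clock frequency, and it treats the quoted 3% as a gain at fixed frequency; both are my assumptions for illustration, and the function names are made up.

```python
def net_speedup(gain_at_fixed_freq: float, freq_loss: float) -> float:
    """Overall speedup under the assumed model: perf ~ work-per-cycle * frequency.

    gain_at_fixed_freq: fractional gain from the C instructions (0.03 = 3%).
    freq_loss: fractional clock-frequency loss from the added complexity.
    """
    return (1.0 + gain_at_fixed_freq) * (1.0 - freq_loss)


def break_even_freq_loss(gain_at_fixed_freq: float) -> float:
    """Frequency loss at which the gain is exactly cancelled (net speedup = 1)."""
    return 1.0 - 1.0 / (1.0 + gain_at_fixed_freq)


if __name__ == "__main__":
    gain = 0.03  # the 3% figure quoted above
    loss = break_even_freq_loss(gain)
    print(f"break-even frequency loss: {loss:.2%}")
    print(f"net speedup at a 3% frequency loss: {net_speedup(gain, 0.03):.4f}")
```

Under that model the break-even point is slightly below 3%, so a frequency hit of the same size as the gain already leaves you marginally behind.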