The sweet spot for scalar code is about 24 registers, but that leads to weird offset-bits (there's an ISA that does this, but I forget what it's called), so 32 registers is easier to implement and provides a mild improvement in the long tail of atypical functions.
On the flip side, the ability to have more registers is very good for SIMD/GPU applications.
Absolutely, I'm not saying a 64-bit instruction length with 5/6/7/8 bits per register field would be bad per se. In fact I'd be interested to see where it leads.
But if you have a processor that also uses 16-bit instructions, those extra registers become unusable. Thumb can't encode all registers in all instructions, so you end up with high registers that are significantly less useful than the low registers.
x86 is the same. I've never really done 64-bit ASM, so I don't know if they improved that.
So then you may as well just divide up the registers so you've got 16 general-purpose registers and 16 registers for SIMD or whatever.
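To put rough numbers on the encoding argument, here's a quick back-of-the-envelope sketch. It assumes a plain three-operand format (destination plus two sources) where every operand gets a full register field, which is a simplification and not how any particular ISA lays out its bits. The function names are just made up for illustration.

```python
def reg_field_bits(num_regs: int) -> int:
    """Bits needed for one register field that can name any of num_regs registers."""
    return (num_regs - 1).bit_length()


def operand_bits(num_regs: int, operands: int = 3) -> int:
    """Total bits spent on register fields (assumed: 3 full-width operands)."""
    return operands * reg_field_bits(num_regs)


def remaining_bits(instr_width: int, num_regs: int, operands: int = 3) -> int:
    """Bits left over for opcode, immediates, etc. Negative means it doesn't fit."""
    return instr_width - operand_bits(num_regs, operands)


if __name__ == "__main__":
    for width in (16, 32, 64):
        for regs in (8, 16, 24, 32, 64, 128, 256):
            print(f"{width}-bit instr, {regs:3d} regs: "
                  f"{operand_bits(regs):2d} bits of register fields, "
                  f"{remaining_bits(width, regs):3d} bits left")
```

Under these assumptions, 24 registers cost the same 5 bits per field as 32, and a 16-bit instruction with three 5-bit fields has only a single bit left for everything else, which is why a compact encoding ends up with narrower fields that reach only a subset of the registers.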