I wonder if you could put more logic units per core and load balance across them to prevent thermal throttling, or if that would slow down the communication pathways by more than you'd gain.
That’s basically the tradeoff Apple made with their M-series chips vs. AMD/Intel, which until recently were chasing fast and narrow designs (high clock speeds, fewer instructions per cycle). Apple, in contrast, has a crazy “wide” core, meaning it can issue and retire many more instructions per clock than basically any other mainstream CPU.
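
To put rough numbers on the wide-vs-fast tradeoff, here's a back-of-the-envelope sketch. It assumes peak throughput is just instructions per clock (IPC) times clock speed, which ignores stalls, memory latency and everything else that matters in practice. The function name and the figures are made up for illustration, not measurements of any real chip.

```python
def throughput_gips(ipc: float, clock_ghz: float) -> float:
    """Peak instructions per second, in billions: IPC times clock (GHz)."""
    return ipc * clock_ghz


# Hypothetical figures for illustration only, not real chip specs.
wide_slow = throughput_gips(ipc=6.0, clock_ghz=3.0)    # wide core, modest clock
narrow_fast = throughput_gips(ipc=4.0, clock_ghz=4.5)  # narrow core, high clock

print(f"wide and slow:   {wide_slow:.1f} billion instructions/s")
print(f"narrow and fast: {narrow_fast:.1f} billion instructions/s")
```

With these made-up figures both designs land on the same peak throughput, but the wide one gets there at a lower clock. Since dynamic power scales roughly with frequency times voltage squared, and higher clocks generally need higher voltage, that's where the wide design tends to win on heat.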