That's true, but isn't OP also correct? I.e., it's about speed, which implies locality, which in turn implies approaches like MoE, which do exactly that, and that's unlike physical brain topology?
Having said that, it would be fun to see things like data-rearrangement moves based on the temperature of silicon parts after a training cycle.