I hear people say that a lot, but is that really how people at the leading edge of research do this? Those I know who are coming up with genuinely new stuff, rather than new applications of old architectures, are either building loosely on animal models or designing around a traditional algorithm while leaving room for training to exploit complex interactions the traditional algorithm can't capture.
There have been huge advances in the mathematics of neural networks from Greg Yang (formerly of Microsoft Research), notably the maximal-update parametrization ("μP") and μTransfer. That work is what allowed training hyperparameters to transfer predictably from smaller versions of GPT-4, where they could actually be tuned, to the final large model.
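To give a concrete sense of what "hyperparameter transfer" means in practice, here's a rough sketch of the flavor of the recipe: tune on a narrow proxy model, then rescale with width instead of re-sweeping at full size. The exact scaling rules below (Adam learning rate and output multiplier shrinking like 1/width, init std like 1/sqrt(fan_in)) are a simplification of the published tables, and all the names are mine, not his API.

```python
from dataclasses import dataclass

@dataclass
class MuPHyperparams:
    lr_hidden: float        # Adam LR for hidden (matrix-like) weights
    lr_vector: float        # Adam LR for biases / vector-like params (kept width-independent here)
    init_std_hidden: float  # init std for hidden weights
    output_multiplier: float  # scale on the readout layer

def transfer(base: MuPHyperparams, base_width: int, target_width: int) -> MuPHyperparams:
    """Rescale hyperparameters tuned at base_width for a model of target_width."""
    ratio = target_width / base_width
    return MuPHyperparams(
        lr_hidden=base.lr_hidden / ratio,                    # matrix LRs shrink roughly like 1/width
        lr_vector=base.lr_vector,                            # vector-like params keep their LR
        init_std_hidden=base.init_std_hidden / ratio ** 0.5, # std scales like 1/sqrt(fan_in)
        output_multiplier=base.output_multiplier / ratio,    # readout scaled down roughly like 1/width
    )

if __name__ == "__main__":
    # Hyperparameters found by sweeping a small 256-wide proxy model...
    tuned_small = MuPHyperparams(lr_hidden=1e-3, lr_vector=1e-3,
                                 init_std_hidden=256 ** -0.5, output_multiplier=1.0)
    # ...reused directly on an 8192-wide model without another sweep.
    print(transfer(tuned_small, base_width=256, target_width=8192))
```

The point is that under the right parametrization the optimal settings become (approximately) width-independent once rescaled, so the expensive sweep only ever happens on the cheap model.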
He has proofs and theorems characterizing the frontier of maximal feature learning, beyond which training degenerates into the equivalent of a kernel method, and more: a whole body of breakthrough math making deep links with random matrix theory.
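The "kernel method" endpoint he's contrasting against is the lazy/NTK limit; roughly (this framing comes from the neural tangent kernel literature, not from his papers specifically):

```latex
% In the lazy regime the network barely moves from its initialization \theta_0,
% so it is well approximated by its linearization in the parameters:
f_\theta(x) \;\approx\; f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top (\theta - \theta_0)
% Gradient descent then reduces to kernel regression with the fixed kernel
K(x, x') \;=\; \big\langle \nabla_\theta f_{\theta_0}(x),\, \nabla_\theta f_{\theta_0}(x') \big\rangle
```

μP sits at the other end of that spectrum: parametrized so the features themselves keep evolving as width grows, rather than freezing into a fixed kernel at initialization.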