Worse than that, I wonder if the trouble at Intel (e.g. the inability to develop post-14nm chips, plus one insane instruction set extension after another -- I wonder if the point of AMX is to have a big die area that is mostly unused and doesn't need to be cooled) isn't something that people like him are running from, but rather something they are going to bring with them wherever they wind up.
>one insane instruction set extension after another
You're probably going to see a whole lot more of this sort of thing given the limits to process scaling. Keeping things simple and backwardly compatible made sense when you could just throw more transistors at the problem. Now you're seeing more and more specialized circuitry that software people are just going to have to deal with.
I am not against a new instruction as such. At first blush the new JavaScript conversion instruction in ARM (FJCVTZS) might seem like a boondoggle, but it is a simple arithmetic operation that compilers and JIT engines can emit without anyone having to restructure their code.
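For context, here's a rough C sketch of the conversion that instruction performs in a single step: JavaScript's ToInt32 semantics (truncate toward zero, wrap modulo 2^32, reinterpret as signed). The function name and exact edge-case handling are my own approximation, not lifted from the ARM manual:

    #include <stdint.h>
    #include <math.h>

    /* Approximation of what ARMv8.3's FJCVTZS does in one instruction:
       JS ToInt32. NaN and infinity become 0. */
    static int32_t js_to_int32(double x)
    {
        if (!isfinite(x))
            return 0;                       /* NaN, +Inf, -Inf -> 0 */
        double t = trunc(x);                /* round toward zero */
        double m = fmod(t, 4294967296.0);   /* reduce modulo 2^32, sign follows t */
        /* conversion to uint32_t wraps modulo 2^32; the final cast to
           int32_t assumes the usual two's-complement behaviour */
        return (int32_t)(uint32_t)(int64_t)m;
    }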
Compare that to the non-scalable SIMD instructions, which mean you have to rewrite your code to take advantage of them, and as a result people don't bother to use them at all.
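To illustrate the rewrite problem: the same trivial "a += b" loop has to be written once per vector width with x86 intrinsics, because the width is baked into the instruction set. A sketch (function names are mine, and n is assumed to be a multiple of the vector width):

    #include <stddef.h>
    #include <immintrin.h>

    void add_f32_sse(float *a, const float *b, size_t n)    /* 128-bit: 4 floats */
    {
        for (size_t i = 0; i < n; i += 4)
            _mm_storeu_ps(a + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }

    void add_f32_avx(float *a, const float *b, size_t n)    /* 256-bit: 8 floats */
    {
        for (size_t i = 0; i < n; i += 8)
            _mm256_storeu_ps(a + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    }

A width-agnostic design like ARM's SVE lets the same binary run at whatever vector length the hardware provides, which is exactly the "scalable" part the fixed-width x86 extensions lack.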
AMX allocates a huge die area to GEMM functionality that gets used a lot less in real numerics than you'd gather from reading a linear algebra textbook.
There are other approaches to the problems the industry faces than "fill up the die with registers that will never be used". Nvidia and Apple are going that way, and that is why they are succeeding while Intel is failing.
As I understand it, Apple have a direct equivalent to Intel's AMX as an undocumented instruction set on their new Apple Silicon laptop processors. It just took a while for people to figure it out, because the whole thing is hidden behind an acceleration library that is implemented very differently on Intel-based Macs.