> about the max Apple could make in terms of die size reticle limit without going chiplet
I probably have no idea what I'm talking about as I'm a software guy. But — I remember that company that made that crazy ML accelerator where an entire silicon wafer is one chip. How'd they do that? Why can't Apple and others do the same/similar?
The Wafer Scale Engine doesn't break the reticle limit. If you search for a picture of the Cerebras chip, you can still clearly see the square dies within the wafer. At a very high level, they are cross-connecting each die with its neighbors on the wafer. The scribe lines used to be the physical separation between dies; now they build interconnect on top of them, so the whole wafer becomes one huge mesh of AI chips.
It works because you can think of each die as nothing but lots of Neural Engine-style cores and cache. You can easily route around individual process defects, which solves the yield problem. And then you still have to solve packaging, power and cooling.
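To put rough numbers on why routing around defects helps, here is a minimal sketch. It assumes each core is good or bad independently with the same probability, and that a die is usable as long as enough cores work. The function name and all the numbers are made up for illustration; they are not Cerebras's figures, and the model ignores defects in the interconnect itself.

```python
import math

def yield_with_spares(n_cores: int, n_required: int, p_core_good: float) -> float:
    """Probability that at least n_required of n_cores are defect-free.

    Assumes cores fail independently with the same probability (binomial model).
    """
    return sum(
        math.comb(n_cores, k) * p_core_good**k * (1 - p_core_good) ** (n_cores - k)
        for k in range(n_required, n_cores + 1)
    )

# Hypothetical die: 100 identical cores, each 99% likely to be defect-free.
no_spares = yield_with_spares(100, 100, 0.99)   # every core must work
two_spares = yield_with_spares(100, 98, 0.99)   # up to two bad cores tolerated

print(f"all 100 cores required: {no_spares:.1%}")
print(f"98 of 100 required:     {two_spares:.1%}")
```

Under these assumptions, tolerating even a couple of dead cores lifts the usable fraction a lot, which is the whole trick with a die that is just repeated cores and cache.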
>Why can't Apple and others do the same/similar?
With a complex SoC, not every defect can be worked around, which is why the larger the die, the lower the yield. That is one reason you see GPUs with certain GPU cores and memory controllers fused off. It isn't because of market segmentation; it is a necessity that comes from manufacturing. To avoid this they have chiplets: make each part a small die and interconnect them. But as mentioned, chiplets aren't a silver bullet. Your design must have chiplets in mind for the interconnect, as with AMD Zen. You can't just dump two M1 Max dies on the same interposer and expect it to work. They could, but it wouldn't be efficient, and that sort of defeats the purpose. An analogy would be taking the current AMD Ryzen APU, which is an SoC with a GPU and memory controller, sticking two of them together and somehow expecting it to run 100% faster.
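The "larger die, lower yield" point can be sketched with the simple Poisson yield model, where the chance of a die having zero defects falls off exponentially with its area. This is a textbook approximation, not what any particular fab uses, and the defect density below is an assumed value picked only for illustration.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

ASSUMED_D0 = 0.1  # defects per cm^2, illustrative assumption only

for area in (100, 400, 800):
    print(f"{area} mm^2 die: {poisson_yield(area, ASSUMED_D0):.1%} defect-free")
```

Under this model a small chiplet is far more likely to come out clean than one big monolithic die of the same total area, which is the manufacturing argument for splitting the design up.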
I'd bet it's the plumbing. Getting data in and out, supplying something around 10-100 kiloamps, attaching it to anything without it ripping itself off via thermal expansion, getting the heat out - all of that sounds miserably difficult.
Like sure you can do some outrageous and expensive things to make it all work in small quantities and for huge sums of money. But you can't build a mass-market laptop that way.
I'd guess that the upper limit with present-day and near-future packaging technology for commodity hardware is ~1000 mm^2.
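For a sense of where a kiloamp-scale figure comes from, it is just power divided by supply voltage. A minimal sketch, with both inputs being assumed round numbers rather than measured figures for any real part:

```python
def supply_current_amps(power_watts: float, core_voltage: float) -> float:
    """Current drawn at a given power and supply voltage (I = P / V)."""
    return power_watts / core_voltage

# Both values are illustrative assumptions, not specs for a real chip.
ASSUMED_POWER_W = 15_000.0  # wafer-scale part drawing on the order of 15 kW
ASSUMED_CORE_V = 0.8        # typical-looking core voltage for a modern process

amps = supply_current_amps(ASSUMED_POWER_W, ASSUMED_CORE_V)
print(f"{amps / 1000:.1f} kA")
```

With those assumptions the result lands in the tens of kiloamps, all of which has to be delivered at under a volt.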