CPUs are still a great example for how caching simplifies things.
There's a long history in computer architecture of cores and accelerators that don't have a cache but instead rely on explicitly programmed local scratchpads. They are universally more difficult to program than general purpose CPUs because of that.
I’m sure the CPU designers would love it if they didn’t need several different layers of cache. Or no cache at all. Imagine if memory IOPS were as fast as L1 cache, no need for all that dedicated SRAM on the chip or worry about side channel attacks.
Sure, but we were talking about the perspective of software developers. The hardware designers take on complexity so that the software developer's work can be simpler.
There's a long history in computer architecture of cores and accelerators that don't have a cache but instead rely on explicitly programmed local scratchpads. They are universally more difficult to program than general purpose CPUs because of that.