Intel PMem really shines for things you need to be non-volatile (preserved when the power goes out), like fast-changing rows in a database. As far as I understand it, "for when you need millions of TPS on a DB that can't fit in RAM" was/is the "killer app" of PMem.

That suggests it wouldn't quite be the right fit here: the precomputed constants in the model (its weights) aren't changing, nor do they need to persist.

Still, interesting question, and I wonder if there's some other existing bit of tech that can be repurposed for this.

I wonder if/when this application (LLMs in general) will slow down and stabilize long enough for anything but general-purpose components to make sense. Like, we could totally shove model parameters into some sort of ROM and have hardware offload for a transformer, IF it weren't the case that 10 years from now we might be on to some other paradigm.