When a prompt is processed, the model maintains an internal key-value (KV) cache: each processed token adds an entry, and the accumulated cache is what attention consults when inferring the next token. If you process the prompt once and then dump that internal cache, you can later restore it and resume prompt processing (and thus inference) from that point essentially for free, skipping the cost of reprocessing the shared prefix.
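The idea can be illustrated with a toy sketch. Everything here is hypothetical: `ToyKVCache` is a stand-in for a real per-layer attention cache, and its "key/value" entries are fake hashes rather than the projected hidden states a transformer would actually compute. The point is only the workflow: build the cache over the prompt, snapshot it, then restore the snapshot and process only the new tokens.

```python
import copy

class ToyKVCache:
    """Hypothetical stand-in for a transformer's key/value cache.

    A real model stores per-layer K/V projections of each token's
    hidden state; here we fake one (key, value) pair per token.
    """
    def __init__(self):
        self.entries = []  # one (key, value) pair per processed token

    def process(self, tokens):
        for t in tokens:
            # Real models would compute attention projections here.
            self.entries.append((hash(("k", t)), hash(("v", t))))

    def dump(self):
        # Snapshot the cache state (e.g. to write to disk).
        return copy.deepcopy(self.entries)

    def load(self, snapshot):
        # Restore a previously dumped cache state.
        self.entries = copy.deepcopy(snapshot)

prompt = ["system:", "you", "are", "helpful"]
cache = ToyKVCache()
cache.process(prompt)
snapshot = cache.dump()          # persist after prompt processing

# Later: resume from the snapshot instead of reprocessing the prompt.
resumed = ToyKVCache()
resumed.load(snapshot)
resumed.process(["user:", "hi"])  # only the new tokens are processed

# Sanity check: resuming yields the same state as processing from scratch.
scratch = ToyKVCache()
scratch.process(prompt + ["user:", "hi"])
assert resumed.entries == scratch.entries
```

In the resumed path only two tokens are processed instead of six, yet the final cache state is identical; this is exactly why restoring a dumped cache makes continuation "more or less free" for long shared prefixes.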