
But this is the solution the most cutting-edge LLM research has yielded, so how do you explain that? Are they just willfully ignorant at OpenAI and Anthropic? If fine-tuning is the answer, why aren't the best labs doing it?


I'd guess the benefit is that it's quicker/easier to experiment with the prompt? Claude has prompt caching; I'm not sure how efficient that is, but they offer a discount on requests that make use of it. So it might be that it's efficient enough to be worth the tradeoff for them?
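For reference, the caching is basically just marking the big static part of the prompt as cacheable, and cache hits are billed at a reduced rate. Rough sketch going off Anthropic's docs (exact fields and model names may have changed since, so treat it as illustrative):

    # Mark the long static system prompt as cacheable so repeat requests
    # reuse it instead of reprocessing it (and get the discounted rate).
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    resp = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": "<the multi-thousand-token system prompt goes here>",
                "cache_control": {"type": "ephemeral"},  # cache this prefix
            }
        ],
        messages=[{"role": "user", "content": "Hello"}],
    )
    # usage should report cache creation/read token counts on cached requests
    print(resp.usage)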

Also, I don't think much of this prompt is used in the API, and a bunch of it enables specific UI features like Artifacts. So if they reuse the same model for the API (I'm guessing they do, but I don't know), then I'd guess they're limited in terms of fine-tuning.


Prompt caching is functionally identical to snapshotting the model's state after it has processed the prompt. And you need the KV cache for inference in any case, so it doesn't even cost extra memory to keep it around if every single inference task is going to share the same prompt prefix.
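To make that concrete: it's the same trick as reusing past_key_values in an open-weights stack. You run the shared prefix once, keep the per-layer key/value tensors, and each request then continues from that saved state instead of re-encoding the prefix. Toy sketch with GPT-2 via HuggingFace transformers (illustrative only; cache object details vary by library version):

    # Run a shared prefix once, keep the KV cache ("snapshot"), then continue
    # generation from it without recomputing the prefix.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prefix_ids = tok("You are a helpful assistant.", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(prefix_ids, use_cache=True)
    snapshot = out.past_key_values  # per-layer keys/values for the prefix

    user_ids = tok(" Hi, who are you?", return_tensors="pt").input_ids
    with torch.no_grad():
        out2 = model(user_ids, past_key_values=snapshot, use_cache=True)
    # out2.logits continues from the cached prefix; the prefix itself was
    # never re-encoded.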



