Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I was trying to do this using Prompt Caching like a month ago, but then noticed there's five minute maximum lifetime for the cached prompts - doesn't really work for my RAG needs (or probably most), where the queries would be ran during the next month or a year. I can't see any changes to that policy. Little surprised to see them talk about Prompt Caching relating to RAG.


They aren’t using the prompt caching on the query side, only on the embedding side… so you cache the document in the context window when ingesting it, but not during retrieval.


It seems a little odd to make multiple requests instead of using one request to create all the context for all the chunks.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: