Is it possible to define the prompts for KV caching up-front? #401
For a lot of use cases there is already a pre-defined system + base prompt. Can we define the KV cache for these prompts up front, manually? For example, when extracting information from a provided context, the context portion of the prompt changes on every call, but the system + base prompt stays the same. Caching the context makes no sense, since it is guaranteed to change on the next inference.
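The thread never settles on a specific API, but the idea maps directly onto the generic Hugging Face transformers pattern: prefill the fixed system + base prompt once, keep its `past_key_values`, and reuse a copy of that cache per request. A minimal sketch, assuming a recent transformers version; the model id, prompt text, and `answer` helper are illustrative placeholders, not this project's API:

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; any causal LM follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

SYSTEM_PROMPT = "You are an extraction assistant. Answer from the context only.\n"

# One-time prefill: run the fixed system + base prompt and keep its KV cache.
prefix_ids = tokenizer(SYSTEM_PROMPT, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

def answer(context: str, max_new_tokens: int = 64) -> str:
    # Deep-copy so each request extends its own cache, not the shared prefix.
    cache = copy.deepcopy(prefix_cache)
    context_ids = tokenizer(context, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, context_ids], dim=-1)
    with torch.no_grad():
        out = model.generate(
            input_ids,
            past_key_values=cache,  # only the context tokens get prefilled
            max_new_tokens=max_new_tokens,
        )
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)
```

With several pre-defined prompts, the same idea extends to a dict of prefix caches keyed by prompt id, built once at startup.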
Comments

Add-on question: how are you currently deciding what to persist in the cache? I think this should be user-defined. For example, typically I can guarantee that my application calls the LLM in a fixed number of ways (whether it is an agentic workflow, RAG, etc.). For each of those ways, it would be best to KV-cache the different system and base prompts. If I can define these up front, then I can also gauge the memory usage.
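The memory usage is easy to bound: a cached prefix costs 2 (keys and values) × layers × KV heads × head dim × tokens × bytes per element. A back-of-the-envelope helper; the shape numbers below (32 layers, 8 KV heads of dim 128, fp16) are assumptions for illustration, not any particular model's:

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,
                   num_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:  # 2 bytes = fp16/bf16
    # Keys and values each hold num_tokens x num_kv_heads x head_dim
    # entries per layer, hence the leading factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem

# A 1,024-token system + base prompt under these assumptions:
print(kv_cache_bytes(1024) / 2**20, "MiB")  # 128.0 MiB
```

Summing this over the pre-defined prompts gives the total reserved cache budget before any request is served.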
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.