
Is it possible to define the prompts for KV caching up-front? #401

Closed
timothylimyl opened this issue Apr 29, 2024 · 2 comments
@timothylimyl

For many use cases, there is already a pre-defined system + base prompt that is reused on every call.

Can we define the KV cache for these prompts up front, manually? For example, if we are extracting information out of a provided context, the context part of the prompt changes, but the system + base prompt stays the same. Caching the context makes no sense, as it is guaranteed to change on the next inference.
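A minimal sketch of the general idea, using the Hugging Face `transformers` API for illustration (an assumption, not necessarily this project's API): pre-fill the KV cache for the fixed system + base prompt once, then reuse a copy of it for every request that shares that prefix.

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

SYSTEM_PROMPT = "You extract the requested fields from the provided context.\n"

# Pre-fill the KV cache for the fixed prefix once, up front.
prefix_ids = tokenizer(SYSTEM_PROMPT, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values


def answer(context: str, question: str) -> str:
    # Only the changing part (context + question) has to be pre-filled here;
    # the fixed system/base prefix is served from the precomputed cache.
    suffix_ids = tokenizer(context + "\n" + question, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, suffix_ids], dim=-1)
    output_ids = model.generate(
        input_ids,
        past_key_values=copy.deepcopy(prefix_cache),  # don't mutate the shared cache
        max_new_tokens=64,
    )
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
```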

@timothylimyl (Author)

A follow-up question: how are you currently deciding which caches to persist?

I think this should be user-defined. For example, I can typically guarantee that my application calls the LLM in a fixed number of ways (whether it is an agentic workflow, RAG, etc.). For each of those ways, it would be best to KV-cache the corresponding system and base prompt.

If I can define this up front, then I can also gauge the memory usage; see the sketch below.
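For gauging memory, a rough back-of-the-envelope estimate assuming a standard transformer KV cache (two tensors per layer, keys and values); the function and the example config are illustrative, not taken from this project:

```python
def kv_cache_bytes(prompt_tokens: int,
                   num_layers: int,
                   num_kv_heads: int,
                   head_dim: int,
                   bytes_per_elem: int = 2) -> int:  # 2 bytes for fp16/bf16
    # Keys and values -> leading factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * prompt_tokens

# Example: a Llama-3-8B-style config (32 layers, 8 KV heads, head_dim 128)
# with a 500-token system + base prompt:
print(kv_cache_bytes(500, 32, 8, 128) / 2**20, "MiB")  # 62.5 MiB
```

Summing this over every pre-defined prefix gives an upper bound on the memory the up-front caches would occupy.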


This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
