Is it possible to define the prompts for KV caching up-front? #401
For a lot of use cases there is already a pre-defined system + base prompt. Can we define the KV cache for these prompts up front, manually? For example, when extracting information from a provided context, the context portion of the prompt changes on every call, but the system + base prompt stays the same. Caching the context makes no sense, since it is guaranteed to change on the next inference.
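The thread never settles on a specific API, but the idea maps directly onto the generic Hugging Face transformers pattern: prefill the fixed system + base prompt once, keep its `past_key_values`, and reuse a copy of that cache per request. A minimal sketch, assuming a recent transformers version; the model id, prompt text, and `answer` helper are illustrative placeholders, not this project's API:

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; any causal LM follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

SYSTEM_PROMPT = "You are an extraction assistant. Answer from the context only.\n"

# One-time prefill: run the fixed system + base prompt and keep its KV cache.
prefix_ids = tokenizer(SYSTEM_PROMPT, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

def answer(context: str, max_new_tokens: int = 64) -> str:
    # Deep-copy so each request extends its own cache, not the shared prefix.
    cache = copy.deepcopy(prefix_cache)
    context_ids = tokenizer(context, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, context_ids], dim=-1)
    with torch.no_grad():
        out = model.generate(
            input_ids,
            past_key_values=cache,  # only the context tokens get prefilled
            max_new_tokens=max_new_tokens,
        )
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)
```

With several pre-defined prompts, the same idea extends to a dict of prefix caches keyed by prompt id, built once at startup.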
Comments

Add-on question: how are you currently deciding what to persist in the cache? I think this should be user-defined. For example, typically I can guarantee that my application calls the LLM in a fixed number of ways (whether it is an agentic workflow, RAG, etc.). For each of those ways, it would be best to KV-cache the different system and base prompts. If I can define these up front, then I can also gauge the memory usage.
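The memory usage is easy to bound: a cached prefix costs 2 (keys and values) × layers × KV heads × head dim × tokens × bytes per element. A back-of-the-envelope helper; the shape numbers below (32 layers, 8 KV heads of dim 128, fp16) are assumptions for illustration, not any particular model's:

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,
                   num_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:  # 2 bytes = fp16/bf16
    # Keys and values each hold num_tokens x num_kv_heads x head_dim
    # entries per layer, hence the leading factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem

# A 1,024-token system + base prompt under these assumptions:
print(kv_cache_bytes(1024) / 2**20, "MiB")  # 128.0 MiB
```

Summing this over the pre-defined prompts gives the total reserved cache budget before any request is served.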
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.