KV Cache Improved Flexibility #4668

Merged 5 commits on Nov 14, 2023

cmikeh2 (Contributor) commented Nov 11, 2023

This PR adds the foundation for supporting two key KV-cache improvements:

  1. Delineation between local and dense KV caches (and models) at the cache level, in addition to the attention-module level.
  2. Support for multiple types of disjoint KV caches (such as GPT-Neo, which alternates local and dense attention layers).

Follow-up item: determine appropriate statistics for weighting the ratio of local to dense KV blocks when both are present.
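
To make the idea of multiple disjoint KV caches concrete, here is a minimal sketch, assuming each cache (for example one local and one dense) owns an independent pool of blocks addressed by a cache index. The names `BlockAllocator`, `MultiKVCache`, `reserve`, `cache_id`, and `free_blocks` are hypothetical and chosen for illustration; they are not taken from this PR.

```python
from typing import Dict, List


class BlockAllocator:
    """Free-list allocator for the blocks of a single KV cache (hypothetical)."""

    def __init__(self, num_blocks: int) -> None:
        self._free: List[int] = list(range(num_blocks))

    @property
    def free_blocks(self) -> int:
        return len(self._free)

    def allocate(self, n: int) -> List[int]:
        if n > len(self._free):
            raise ValueError(f"requested {n} blocks, only {len(self._free)} free")
        taken, self._free = self._free[:n], self._free[n:]
        return taken

    def free(self, block_ids: List[int]) -> None:
        self._free.extend(block_ids)


class MultiKVCache:
    """One allocator per disjoint cache, e.g. {"local": ..., "dense": ...}."""

    def __init__(self, blocks_per_cache: Dict[str, int]) -> None:
        # Assumption: caches are indexed in the insertion order of the dict.
        self.kinds: List[str] = list(blocks_per_cache)
        self._allocators = [BlockAllocator(n) for n in blocks_per_cache.values()]

    @property
    def n_kv_caches(self) -> int:
        return len(self._allocators)

    def reserve(self, n_blocks: int, cache_id: int) -> List[int]:
        return self._allocators[cache_id].allocate(n_blocks)

    def free(self, block_ids: List[int], cache_id: int) -> None:
        self._allocators[cache_id].free(block_ids)

    def free_blocks(self, cache_id: int) -> int:
        return self._allocators[cache_id].free_blocks


# A small local cache and a larger dense cache, sized arbitrarily.
cache = MultiKVCache({"local": 4, "dense": 16})
local_blocks = cache.reserve(2, cache_id=0)
dense_blocks = cache.reserve(5, cache_id=1)
```

Keeping the pools separate means a request that exhausts one cache cannot consume blocks belonging to the other. How to size the pools relative to each other is the open question the follow-up item refers to.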

awan-10 added this pull request to the merge queue on Nov 14, 2023
Merged via the queue into master with commit 901d807 Nov 14, 2023
16 checks passed
@@ -124,7 +116,9 @@ def flush_sequence(self, uid: int) -> None:
             return

         seq = self._seqs[uid]
-        self._kv_cache.free(seq.all_block_ids)
+        for i in range(self.n_kv_caches):

self.n_kv_caches doesn't exist :(
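
For context, here is a minimal sketch of one way the loop in the hunk above could work if the manager exposed the cache count as a property. This is an assumption for illustration only: the fix actually applied in the repository is not shown on this page, and `StateManager`, `Sequence`, `create_sequence`, and `free_blocks` are hypothetical stand-ins.

```python
from typing import Dict, List


class Sequence:
    """Tracks the block IDs a sequence holds in each KV cache (hypothetical)."""

    def __init__(self, n_kv_caches: int) -> None:
        self._blocks: List[List[int]] = [[] for _ in range(n_kv_caches)]

    def extend(self, cache_id: int, block_ids: List[int]) -> None:
        self._blocks[cache_id].extend(block_ids)

    def all_block_ids(self, cache_id: int) -> List[int]:
        return list(self._blocks[cache_id])


class StateManager:
    """Toy manager with one free list of block IDs per KV cache."""

    def __init__(self, blocks_per_cache: List[int]) -> None:
        self._free: List[List[int]] = [list(range(n)) for n in blocks_per_cache]
        self._seqs: Dict[int, Sequence] = {}

    @property
    def n_kv_caches(self) -> int:
        # Defined explicitly so loops over the caches have something to read.
        return len(self._free)

    def free_blocks(self, cache_id: int) -> int:
        return len(self._free[cache_id])

    def create_sequence(self, uid: int, blocks_needed: List[int]) -> None:
        seq = Sequence(self.n_kv_caches)
        for i, n in enumerate(blocks_needed):
            if n > len(self._free[i]):
                raise ValueError(f"cache {i}: requested {n} blocks, too few free")
            taken, self._free[i] = self._free[i][:n], self._free[i][n:]
            seq.extend(i, taken)
        self._seqs[uid] = seq

    def flush_sequence(self, uid: int) -> None:
        if uid not in self._seqs:
            return  # unknown sequence: nothing to release

        seq = self._seqs.pop(uid)
        # Return the sequence's blocks to each cache separately.
        for i in range(self.n_kv_caches):
            self._free[i].extend(seq.all_block_ids(i))


mgr = StateManager([4, 8])
mgr.create_sequence(uid=7, blocks_needed=[2, 3])
mgr.flush_sequence(7)
```

Flushing an unknown `uid` is a no-op in this sketch, mirroring the early `return` visible in the hunk.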


jeffra added a commit that referenced this pull request Nov 16, 2023
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
Co-authored-by: Olatunji Ruwase <[email protected]>

4 participants