Resolve issues between kv_cache and flash attention. #1178

chaochen99 · 2024-03-08T09:50:48Z

When the kv cache and flash attention are used together, the causal parameter of flash_varlen_qkv_fn must be set to False. Otherwise the attention outputs are inaccurate.
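To illustrate why the flag matters, here is a minimal pure-Python sketch, not the GPT-NeoX code path. The `attention` helper is hypothetical and stands in for the flash kernel. It assumes that `causal=True` applies a top-left-aligned mask (query `i` sees keys `0..i`); whether a given flash attention build aligns the mask this way depends on its version. Under that assumption, a single new query decoded against a cache of earlier keys would be allowed to see only the first cached key, while with `causal=False` it sees the whole cache and matches the last row of an ordinary causal prefill.

```python
import math


def attention(q, k, v, causal):
    """Naive single-head attention over lists of vectors (hypothetical helper).

    q: list of Lq query vectors; k, v: lists of Lk key/value vectors.
    causal=True applies a top-left-aligned mask: query i sees keys 0..i.
    That alignment is an ASSUMPTION made for this illustration.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dim = len(v[0])
    out = []
    for i, qi in enumerate(q):
        visible = range(min(i + 1, len(k))) if causal else range(len(k))
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in visible]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append(
            [sum(w * v[j][d] for w, j in zip(weights, visible)) / total for d in range(dim)]
        )
    return out


# Toy data: three tokens with 2-dimensional heads.
queries = [[1, 0], [0, 1], [1, 1]]
keys = [[1, 0], [0, 1], [1, 1]]
values = [[1, 0], [0, 1], [2, 2]]

# Prefill without a cache: ordinary causal attention over all three tokens.
reference = attention(queries, keys, values, causal=True)[-1]

# Decode with a cache: only the newest query is passed, keys/values come from the cache.
cached_unmasked = attention(queries[-1:], keys, values, causal=False)[0]
cached_masked = attention(queries[-1:], keys, values, causal=True)[0]

print("prefill, last token:  ", reference)
print("cache, causal=False:  ", cached_unmasked)  # same as the prefill row
print("cache, causal=True:   ", cached_masked)    # only sees the first cached key
```

In this sketch the `causal=False` decode reproduces the prefill result, while the `causal=True` decode collapses to the first value vector, which is the kind of inaccuracy described above.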

CLAassistant · 2024-03-08T09:51:05Z

All committers have signed the CLA.

Quentin-Anthony

LGTM. @haileyschoelkopf -- Please sanity-check this as well.

haileyschoelkopf · 2024-03-08T19:48:48Z

LGTM as well. Reference: https://github.com/Dao-AILab/flash-attention/blob/9818f85fee29ac6b60c9214bce841f8109a18b1b/flash_attn/modules/mha.py#L504

Commit 883d04d: When using kv cache and flash attention in conjunction, it's crucial to set the causal parameter of flash_varlen_qkv_fn to False. Failing to do so will lead to inaccurate results.

chaochen99 requested a review from Quentin-Anthony as a code owner March 8, 2024 09:50

Quentin-Anthony requested a review from haileyschoelkopf March 8, 2024 19:35

Quentin-Anthony approved these changes Mar 8, 2024

haileyschoelkopf approved these changes Mar 8, 2024

Quentin-Anthony merged commit c1fa994 into EleutherAI:main Mar 8, 2024
2 of 5 checks passed
