- [ ] add the ability to gate in the kNN attention layers from Memorizing Transformers (see the gating sketch after this list)
- [ ] figure out the best way to cache the causal mask across all attention layers; also reach out to the PyTorch team about the limitations of `is_causal` when the query and key/value sequence lengths differ (see the mask sketch below)
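
For the gating item, a minimal sketch of what it could look like, assuming a learned per-head sigmoid gate mixing the local attention output with the kNN memory attention output, in the style of the Memorizing Transformers paper; the class name, parameter name, and tensor shapes are illustrative, not the repo's actual API.

```python
import torch
from torch import nn

class KNNAttentionGate(nn.Module):
    # learned per-head sigmoid gate that mixes the local attention output
    # with the kNN memory attention output (hypothetical names / shapes)
    def __init__(self, heads: int):
        super().__init__()
        # one learnable gate bias per head, initialized to 0 (gate = 0.5)
        self.gate_bias = nn.Parameter(torch.zeros(heads))

    def forward(self, local_out: torch.Tensor, knn_out: torch.Tensor) -> torch.Tensor:
        # local_out, knn_out: (batch, heads, seq, dim_head)
        gate = self.gate_bias.sigmoid().view(1, -1, 1, 1)
        return gate * knn_out + (1. - gate) * local_out
```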
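For the mask item, a minimal sketch of one way the caching could work, not the repo's implementation: a memoized mask builder shared by all layers, using the boolean convention of `F.scaled_dot_product_attention` (True = may attend) with bottom-right alignment. The function name and `lru_cache` policy are illustrative. This also shows the limitation alluded to above: as of recent PyTorch versions, the `is_causal` flag aligns the causal mask to the top-left, which is usually not what you want when `kv_len > q_len` due to a kv cache.

```python
import torch
import torch.nn.functional as F
from functools import lru_cache

@lru_cache(maxsize = 8)
def get_causal_mask(q_len: int, kv_len: int, device: torch.device = torch.device('cpu')) -> torch.Tensor:
    # boolean mask, True = may attend, aligned to the bottom-right:
    # query i attends to keys 0 .. (kv_len - q_len + i)
    i = torch.arange(q_len, device = device).unsqueeze(-1)
    j = torch.arange(kv_len, device = device)
    return j <= (i + (kv_len - q_len))

# usage: pass the cached mask explicitly instead of is_causal = True
q = torch.randn(1, 8, 4, 64)
k = v = torch.randn(1, 8, 16, 64)
out = F.scaled_dot_product_attention(
    q, k, v,
    attn_mask = get_causal_mask(q.shape[-2], k.shape[-2], q.device)
)
```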