This repo contains the code and LaTeX files for the Transformer Tricks papers.

Flash normalization:
- arXiv paper: https://arxiv.org/abs/2407.09577
- Notebook:
- HuggingFace repo:
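A minimal sketch of the flash normalization idea, under the assumption that the paper's trick is folding RMSNorm's elementwise weights into the linear layer that follows it, so inference only needs the division by the RMS. The names `rmsnorm` and `flashnorm_merge` are illustrative, not from the repo:

```python
import numpy as np

def rmsnorm(x, g, eps=1e-6):
    # standard RMSNorm: divide by the RMS of x, then scale elementwise by g
    return x / np.sqrt(np.mean(x**2) + eps) * g

def flashnorm_merge(W, g):
    # fold the RMSNorm weights g into the following linear layer W:
    # (x/rms(x) * g) @ W  ==  (x/rms(x)) @ (diag(g) @ W)
    return np.diag(g) @ W

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
g = rng.standard_normal(8)
W = rng.standard_normal((8, 4))

eps = 1e-6
y_ref = rmsnorm(x, g, eps) @ W                          # normalize, scale, project
W_merged = flashnorm_merge(W, g)                        # done once, offline
y_fast = (x / np.sqrt(np.mean(x**2) + eps)) @ W_merged  # no per-token scaling by g
assert np.allclose(y_ref, y_fast)
```

The merge happens once at load time, so the per-token cost of the normalization weights disappears; see the paper for the exact formulation.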

Approximate attention [work in progress]:

Removing weights for skipless transformers:
- arXiv paper: https://arxiv.org/abs/2404.12362
- Notebook:

Precomputing the first layer:
- arXiv paper: https://arxiv.org/abs/2402.13388
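A toy sketch of the precomputation idea: since the first layer's linear projections see only the token embedding (assuming positional information, e.g. RoPE, is applied after the projection), their outputs can be computed once per vocabulary entry and stored as a lookup table. All sizes and names below are illustrative:

```python
import numpy as np

vocab, d, d_head = 100, 16, 16
rng = np.random.default_rng(1)
E = rng.standard_normal((vocab, d))    # embedding table
Wq = rng.standard_normal((d, d_head))  # first-layer query projection (illustrative)

# Precompute the first-layer queries for every vocabulary entry once, offline.
Q_table = E @ Wq

# At inference, a table lookup replaces the first-layer matmul for this projection.
token = 42
assert np.allclose(E[token] @ Wq, Q_table[token])
```

The same lookup applies to the other first-layer projections; the trade-off is extra table storage versus skipping those matmuls per token, as detailed in the paper.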