
Can we get sparse attention working with A100s / CUDA 11? #207

Closed

sdtblck opened this issue Apr 5, 2021 · 2 comments
sdtblck commented Apr 5, 2021

This one will probably be a tonne of work, and I have no idea where to start, but it seems DeepSpeed's sparse attention only works with CUDA 10.x, and only on specific GPU architectures.

It would be great to have a sparse attention implementation that works with our setup, or to fix DeepSpeed's.

According to the blocksparse dev (pretty sure DeepSpeed's sparse attention is based on this), we could try using triton.ops.blocksparse (ptillet/torch-blocksparse#38)? A rough sketch of what that might look like is below.
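For reference, here is a minimal sketch of block-sparse attention built from `triton.ops.blocksparse`. This assumes the 1.x-era Triton API (`matmul` / `softmax` constructors taking a block layout, with `sdd` / `dsd` modes); the exact signatures have shifted between releases, so treat it as an illustration rather than a drop-in implementation:

```python
# Sketch only: assumes triton.ops.blocksparse exposes matmul and softmax
# with roughly the Triton 1.x signatures; argument names may differ by version.
import torch
from triton.ops.blocksparse import matmul as blocksparse_matmul
from triton.ops.blocksparse import softmax as blocksparse_softmax

block = 16                        # block size of the sparsity pattern
n_heads, seq_len, head_dim = 8, 1024, 64
n_blocks = seq_len // block

# Causal block layout: 1 = compute this block of QK^T, 0 = skip it entirely.
layout = torch.tril(torch.ones(n_heads, n_blocks, n_blocks, dtype=torch.long))

# 'sdd': sparse output from two dense inputs (Q @ K^T, restricted to the layout);
# 'dsd': dense output with a sparse left operand (P @ V).
qk = blocksparse_matmul(layout, block, "sdd", trans_a=False, trans_b=True)
pv = blocksparse_matmul(layout, block, "dsd", trans_a=False, trans_b=False)
sm = blocksparse_softmax(layout, block)

q = torch.randn(1, n_heads, seq_len, head_dim, device="cuda", dtype=torch.half)
k, v = torch.randn_like(q), torch.randn_like(q)

scores = qk(q, k)                          # only the blocks present in `layout`
probs = sm(scores, scale=1.0 / head_dim ** 0.5)  # block-sparse softmax
out = pv(probs, v)                         # back to a dense [1, H, S, D] tensor
```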

StellaAthena added the feature request label on Apr 5, 2021

sdtblck commented Apr 5, 2021

Looks like DeepSpeed is updating their Triton support, so maybe integrating this will fix it: microsoft/DeepSpeed#902
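If that integration lands, usage on our side would presumably go through DeepSpeed's sparse attention module. A hedged sketch, assuming the `deepspeed.ops.sparse_attention` API as of early 2021 (`SparseSelfAttention` plus `FixedSparsityConfig`); parameter names and defaults may differ across versions:

```python
# Sketch only: assumes the early-2021 deepspeed.ops.sparse_attention API;
# not a confirmed drop-in for whatever the Triton update ships.
import torch
from deepspeed.ops.sparse_attention import SparseSelfAttention, FixedSparsityConfig

config = FixedSparsityConfig(
    num_heads=8,                 # must match the attention module's head count
    block=16,                    # block size of the sparsity pattern
    num_local_blocks=4,          # local (sliding-window) blocks per query block
    num_global_blocks=1,         # blocks attended to globally
    attention="unidirectional",  # causal masking for autoregressive LMs
)
sparse_attn = SparseSelfAttention(sparsity_config=config)

# q, k, v: [batch, heads, seq_len, head_dim]; seq_len must be a multiple of `block`,
# and the kernels expect fp16 tensors on CUDA.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.half)
k, v = torch.randn_like(q), torch.randn_like(q)
out = sparse_attn(q, k, v)       # same shape as q
```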


sdtblck commented Apr 9, 2021

Should work now after installing this DeeperSpeed commit: EleutherAI/DeeperSpeed@04a52ad
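For anyone following along, one way to install that pinned commit is pip's git support, e.g. `pip install git+https://github.com/EleutherAI/DeeperSpeed.git@04a52ad` (assuming a standard pip-from-git setup; building the sparse attention ops will still need a matching CUDA toolchain).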

sdtblck closed this as completed Apr 9, 2021