
Can we get sparse attention working with A100s / CUDA 11? #207

Closed

sdtblck opened this issue Apr 5, 2021 · 2 comments
sdtblck commented Apr 5, 2021

This one will probably be a tonne of work, and I have no idea where to start, but it seems DeepSpeed's sparse attention only works with CUDA 10.x, and only on specific GPU architectures.

It would be great to have a sparse attention implementation that works with our setup, or to fix DeepSpeed's.

According to the blocksparse dev (pretty sure DeepSpeed's sparse attention is based on this), we could try using triton.ops.blocksparse (ptillet/torch-blocksparse#38)? A rough sketch of what that might look like is below.
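For reference, here is a minimal sketch of block-sparse attention built from `triton.ops.blocksparse`. This assumes the 1.x-era Triton API (`matmul` / `softmax` constructors taking a block layout, with `sdd` / `dsd` modes); the exact signatures have shifted between releases, so treat it as an illustration rather than a drop-in implementation:

```python
# Sketch only: assumes triton.ops.blocksparse exposes matmul and softmax
# with roughly the Triton 1.x signatures; argument names may differ by version.
import torch
from triton.ops.blocksparse import matmul as blocksparse_matmul
from triton.ops.blocksparse import softmax as blocksparse_softmax

block = 16                        # block size of the sparsity pattern
n_heads, seq_len, head_dim = 8, 1024, 64
n_blocks = seq_len // block

# Causal block layout: 1 = compute this block of QK^T, 0 = skip it entirely.
layout = torch.tril(torch.ones(n_heads, n_blocks, n_blocks, dtype=torch.long))

# 'sdd': sparse output from two dense inputs (Q @ K^T, restricted to the layout);
# 'dsd': dense output with a sparse left operand (P @ V).
qk = blocksparse_matmul(layout, block, "sdd", trans_a=False, trans_b=True)
pv = blocksparse_matmul(layout, block, "dsd", trans_a=False, trans_b=False)
sm = blocksparse_softmax(layout, block)

q = torch.randn(1, n_heads, seq_len, head_dim, device="cuda", dtype=torch.half)
k, v = torch.randn_like(q), torch.randn_like(q)

scores = qk(q, k)                          # only the blocks present in `layout`
probs = sm(scores, scale=1.0 / head_dim ** 0.5)  # block-sparse softmax
out = pv(probs, v)                         # back to a dense [1, H, S, D] tensor
```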

StellaAthena added the feature request label on Apr 5, 2021

sdtblck commented Apr 5, 2021

Looks like DeepSpeed is updating their Triton support, so maybe integrating this will fix it: microsoft/DeepSpeed#902
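If that integration lands, usage on our side would presumably go through DeepSpeed's sparse attention module. A hedged sketch, assuming the `deepspeed.ops.sparse_attention` API as of early 2021 (`SparseSelfAttention` plus `FixedSparsityConfig`); parameter names and defaults may differ across versions:

```python
# Sketch only: assumes the early-2021 deepspeed.ops.sparse_attention API;
# not a confirmed drop-in for whatever the Triton update ships.
import torch
from deepspeed.ops.sparse_attention import SparseSelfAttention, FixedSparsityConfig

config = FixedSparsityConfig(
    num_heads=8,                 # must match the attention module's head count
    block=16,                    # block size of the sparsity pattern
    num_local_blocks=4,          # local (sliding-window) blocks per query block
    num_global_blocks=1,         # blocks attended to globally
    attention="unidirectional",  # causal masking for autoregressive LMs
)
sparse_attn = SparseSelfAttention(sparsity_config=config)

# q, k, v: [batch, heads, seq_len, head_dim]; seq_len must be a multiple of `block`,
# and the kernels expect fp16 tensors on CUDA.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.half)
k, v = torch.randn_like(q), torch.randn_like(q)
out = sparse_attn(q, k, v)       # same shape as q
```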


sdtblck commented Apr 9, 2021

Should work now after installing this DeeperSpeed commit: EleutherAI/DeeperSpeed@04a52ad
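For anyone following along, one way to install that pinned commit is pip's git support, e.g. `pip install git+https://github.com/EleutherAI/DeeperSpeed.git@04a52ad` (assuming a standard pip-from-git setup; building the sparse attention ops will still need a matching CUDA toolchain).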

sdtblck closed this as completed Apr 9, 2021