Is your feature request related to a problem? Please describe.
I am working with long sequences and frequently run into GPU memory limits when processing them. This is particularly challenging with models that use full attention, whose memory cost scales quadratically with sequence length.
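To make the quadratic scaling concrete, here is a back-of-the-envelope sketch (illustrative only; batch size and head count fixed at 1, fp32 scores) of how large a single full attention score matrix gets as the sequence grows:

```python
def attn_matrix_bytes(seq_len: int, dtype_bytes: int = 4) -> int:
    # One (seq_len x seq_len) attention score matrix in the given
    # precision: memory grows as seq_len**2.
    return seq_len * seq_len * dtype_bytes

for n in (1_024, 4_096, 16_384):
    gib = attn_matrix_bytes(n) / 2**30
    print(f"seq_len={n:>6}: {gib:.2f} GiB per head")
# At 16,384 tokens a single head's score matrix is already 1 GiB.
```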
Describe the solution you'd like
I would like to request the development of a pre-trained model that incorporates sparse or BlockSparse attention mechanisms. These mechanisms are designed to handle long sequences more efficiently, without consuming excessive GPU memory. The model should be able to process long input sequences while maintaining performance comparable to existing models that use full attention.
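As a toy illustration of why a block-sparse pattern helps (this is not the actual BigBird/BlockSparse kernel, just a sketch of a block-diagonal mask), each query block attends only to its own key block, so the number of attended entries grows linearly with sequence length instead of quadratically:

```python
import numpy as np

def block_local_mask(seq_len: int, block: int) -> np.ndarray:
    # True where attention is allowed: each query attends only to
    # keys in the same block. Attended entries: seq_len * block,
    # versus seq_len**2 for full attention.
    blk = np.arange(seq_len) // block
    return blk[:, None] == blk[None, :]

mask = block_local_mask(256, 64)
print(int(mask.sum()), "of", mask.size, "entries attended")
```

Real block-sparse schemes add global and random blocks on top of this local pattern to recover long-range connectivity, but the memory argument is the same.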
Describe alternatives you've considered
Truncating or splitting long sequences: This approach can be used to fit sequences within memory constraints, but it may result in loss of contextual information and reduced model performance.
Sliding window approach: This method involves processing long sequences in smaller, overlapping segments. However, it still does not fully address the problem of capturing long-range dependencies in the data.
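For reference, the sliding-window workaround I have tried looks roughly like this (a minimal sketch; `window` and `stride` values are arbitrary). It keeps local context via the overlap, but any dependency longer than one window is still lost:

```python
def sliding_windows(tokens: list, window: int = 512, stride: int = 384) -> list:
    # Split a long sequence into overlapping segments: consecutive
    # windows share (window - stride) tokens, so local context is
    # preserved, but dependencies longer than `window` are cut.
    segments = []
    start = 0
    while start < len(tokens):
        segments.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return segments
```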
Additional context
I am mainly wondering if such a model will be pushed to Hugging Face's model repository, which would make it more accessible and easier to use for the broader community.