
Pre-trained Model with Sparse/BlockSparse Attention for Long Sequences and Reduced GPU Memory Consumption #924

Closed
puyuanOT opened this issue May 5, 2023 · 1 comment
Labels: feature request

Comments

@puyuanOT commented May 5, 2023

Is your feature request related to a problem? Please describe.
I am working with long sequences and frequently hit GPU memory limits when processing them. The problem is especially acute with models that use full attention, whose memory cost scales quadratically with sequence length.
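For a sense of scale, here is a back-of-the-envelope estimate (my own illustrative numbers, not measurements from any particular model):

```python
# Memory for the full attention-score matrix alone, per layer.
# Illustrative only; actual usage depends on the implementation.
seq_len = 32_768          # tokens
num_heads = 16
bytes_per_elem = 2        # fp16

scores_bytes = seq_len * seq_len * num_heads * bytes_per_elem
print(f"{scores_bytes / 2**30:.1f} GiB per layer")  # 32.0 GiB
```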

Describe the solution you'd like
I would like to request the development of a pre-trained model that incorporates sparse or BlockSparse attention mechanisms. These mechanisms are designed to handle long sequences more efficiently, without consuming excessive GPU memory. The model should be able to process long input sequences while maintaining performance comparable to existing models that use full attention.
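To make the request concrete, the access pattern I have in mind can be sketched in plain PyTorch (this only illustrates the block mask; the function name and parameters are mine, and a real implementation would use fused kernels instead of materializing the full score matrix):

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64):
    """Toy block-sparse attention: each query block attends to its own
    diagonal block plus a "global" first block. q, k, v have shape
    (batch, heads, seq, dim). Because the mask and score matrix are
    fully materialized, this shows the pattern, not the memory savings."""
    seq_len = q.size(-2)
    num_blocks = seq_len // block_size

    # Block-level connectivity: diagonal blocks + global first block.
    block_mask = torch.eye(num_blocks, dtype=torch.bool, device=q.device)
    block_mask[:, 0] = True

    # Expand the block connectivity to a token-level (seq, seq) mask.
    mask = block_mask.repeat_interleave(block_size, dim=0)
    mask = mask.repeat_interleave(block_size, dim=1)

    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```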

Describe alternatives you've considered

  1. Truncating or splitting long sequences: This approach can be used to fit sequences within memory constraints, but it may result in loss of contextual information and reduced model performance.
  2. Sliding window approach: processing the sequence in smaller, overlapping segments (a minimal sketch follows this list). This helps with memory, but it still cannot capture long-range dependencies that span window boundaries.
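Here is roughly the sliding-window splitting I have been using (a simplified sketch of my own pipeline; the real version also has to stitch the per-window outputs back together):

```python
def sliding_windows(token_ids, window=2048, stride=1024):
    """Split a long token sequence into overlapping windows.
    Consecutive windows overlap by (window - stride) tokens, but
    attention still cannot cross window boundaries."""
    last_start = max(len(token_ids) - window, 0)
    return [token_ids[s:s + window] for s in range(0, last_start + 1, stride)]
```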

Additional context
I am mainly wondering whether such a model would be pushed to Hugging Face's model repository, which would make it more accessible and easier to use for the broader community.

@StellaAthena (Member) commented

We have generally found such things to perform poorly, and do not currently have plans to train such a model.
