(T5) Relative positional encodings? #66
@CRG2K What part of the code would this involve changing?
@glebshevchukk For starters, the positional bias is injected in every attention layer instead of just at the beginning. @lucidrains has a working implementation in x-transformers, but I'm not sure about the complications of just porting it over.
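For reference, the T5 scheme maps each query-key relative distance to one of a fixed number of learned buckets: exact buckets for short distances, then log-spaced buckets out to a maximum distance, with each head learning one scalar bias per bucket. A minimal causal NumPy sketch of that bucketing (function name and defaults are assumptions, loosely following the T5 reference code, not the x-transformers API):

```python
import numpy as np

def relative_position_bucket(rel_pos, num_buckets=32, max_distance=128):
    """Map signed relative positions (key_pos - query_pos) to bucket indices,
    T5-style and causal: half the buckets cover exact short distances, the
    rest are log-spaced up to max_distance, beyond which everything shares
    the last bucket."""
    n = np.maximum(-rel_pos, 0)              # causal: only attend to the past
    max_exact = num_buckets // 2
    is_small = n < max_exact
    # log-spaced buckets for larger distances (guard against log(0))
    scaled = np.log(np.maximum(n, 1) / max_exact) / np.log(max_distance / max_exact)
    val_if_large = max_exact + (scaled * (num_buckets - max_exact)).astype(np.int64)
    val_if_large = np.minimum(val_if_large, num_buckets - 1)
    return np.where(is_small, n, val_if_large)
```

Because distances beyond `max_distance` all collapse into the final bucket, the bias table stays small no matter how long the context gets.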
Interesting, that'd probably require changing https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/ops/sparse_attention, which is what's currently used for the core attention layers.
This seems reasonable to me, as long as it's an option that the user can turn on or off. It looks pretty straightforward, though, so feel free to implement it and open a PR.
In the event that modifying sparse attention proves too tricky, Position Infused Attention is a modification with similar benefits that only requires adding a positional bias to the keys & queries prior to attention. It is similarly implemented in x-transformers.
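For concreteness, a single-head NumPy sketch of the Position Infused Attention idea (names and shapes here are illustrative, not the Shortformer or x-transformers API): the positional embedding is added to the inputs of the query/key projections only, so the values, and therefore the cached states, remain position-free.

```python
import numpy as np

def pia_attention(x, pos_emb, w_q, w_k, w_v):
    """Single-head attention with Position Infused Attention: positions are
    added to the query/key inputs only; values carry no positional signal."""
    q = (x + pos_emb) @ w_q
    k = (x + pos_emb) @ w_k
    v = x @ w_v                                   # position-free values
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

rng = np.random.default_rng(0)
T, d = 6, 16
x, pos = rng.normal(size=(T, d)), rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = pia_attention(x, pos, w_q, w_k, w_v)        # shape (T, d)
```

Keeping positions out of the values is what makes this cache-friendly: previously computed value vectors never need to be recomputed when the sequence is shifted.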
@MicPie has implemented what appears to be a working relative positional encoding in the t5rpe branch. The main thing that still needs validation is that the user can easily choose which encoding to use in the config files.
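If that lands, selecting the encoding from a model config would presumably look something like the fragment below; the key and value names here are hypothetical placeholders for illustration, not the actual options exposed by the t5rpe branch:

```yaml
# Hypothetical config fragment; check the t5rpe branch for the real option names.
pos-emb: "t5_rpe"   # e.g. one of: "learned", "sinusoidal", "t5_rpe"
```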
[This is a reminder of a conversation I had with @sdtblck]
Simpler than the TrXL relative encodings, the T5 relative bias should enable less expensive inference (~1000x?) due to caching, and an improved effective context size of num_layers * ctx_length.
![image](https://user-images.githubusercontent.com/71207483/104807496-bcb16d00-57df-11eb-830e-406d533fd977.png)
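To see why the relative bias is cheap at inference time: it depends only on the distance between positions, never on content, so the whole (heads, ctx, ctx) bias tensor can be computed once and reused across layers and decoding steps. A toy NumPy sketch (the flat distance clipping here is a simplification of the real T5 bucketing, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
num_heads, ctx, num_buckets = 4, 8, 32

# Hypothetical learned table: one scalar bias per (bucket, head).
bias_table = rng.normal(size=(num_buckets, num_heads))

# Toy bucketing: clip the causal (non-negative) distance to the table size.
q_pos = np.arange(ctx)[:, None]
k_pos = np.arange(ctx)[None, :]
buckets = np.clip(q_pos - k_pos, 0, num_buckets - 1)   # (ctx, ctx)

bias = bias_table[buckets].transpose(2, 0, 1)          # (heads, ctx, ctx)

# The same precomputed bias is added to the attention logits in every layer,
# so it only needs to be built once per context length.
logits = rng.normal(size=(num_heads, ctx, ctx))
logits = logits + bias
```

Since the bias tensor is content-independent, extending generation by one token only requires one new row of it, which is where the caching savings at inference come from.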