
(T5) Relative positional encodings? #66

Closed · CRG2K opened this issue Jan 16, 2021 · 6 comments · Fixed by #141
Labels: feature request (New feature or request)
@CRG2K commented Jan 16, 2021

[This is a reminder of a conversation I had with @sdtblck.]

The T5 relative position bias is simpler than the Transformer-XL (TrXL) relative encodings, and it should enable much cheaper inference (~1000x?) thanks to caching, plus an improved effective context size of num_layers * ctx_length.
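For reference, here is a minimal sketch of how a T5-style relative position bias is typically computed: signed relative distances are mapped into a small set of buckets (exact for nearby tokens, log-spaced for distant ones), and each bucket holds one learned scalar per attention head. The class and parameter names below are illustrative, not taken from this repo or from x-transformers.

```python
# Minimal sketch of a T5-style relative position bias (PyTorch).
# Names (T5RelativeBias, num_buckets, max_distance) are illustrative.
import math
import torch
import torch.nn as nn

class T5RelativeBias(nn.Module):
    def __init__(self, num_heads, num_buckets=32, max_distance=128, causal=True):
        super().__init__()
        self.num_buckets = num_buckets
        self.max_distance = max_distance
        self.causal = causal
        # one learned scalar per (bucket, head)
        self.bias = nn.Embedding(num_buckets, num_heads)

    def _bucket(self, rel_pos):
        # Map signed relative positions to buckets: exact buckets for nearby
        # positions, log-spaced buckets for far ones.
        num_buckets = self.num_buckets
        if self.causal:
            rel_pos = -torch.clamp(rel_pos, max=0)  # only attend to the past
            offset = 0
        else:
            num_buckets //= 2
            offset = (rel_pos > 0).long() * num_buckets
            rel_pos = rel_pos.abs()
        max_exact = num_buckets // 2
        is_small = rel_pos < max_exact
        large = max_exact + (
            torch.log(rel_pos.float() / max_exact)
            / math.log(self.max_distance / max_exact)
            * (num_buckets - max_exact)
        ).long()
        large = torch.clamp(large, max=num_buckets - 1)
        return offset + torch.where(is_small, rel_pos, large)

    def forward(self, q_len, k_len, device=None):
        # The bias depends only on relative distance, so it can be cached and
        # reused during incremental decoding.
        q_pos = torch.arange(q_len, device=device)[:, None]
        k_pos = torch.arange(k_len, device=device)[None, :]
        buckets = self._bucket(k_pos - q_pos)          # (q_len, k_len)
        return self.bias(buckets).permute(2, 0, 1)     # (heads, q_len, k_len)
```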

@glebshevchukk

@CRG2K What part of the code would this involve changing?

@CRG2K (Author) commented Jan 17, 2021

@glebshevchukk For starters, the positional bias is injected in every attention layer instead of just once at the input. @lucidrains has a working implementation in x-transformers, but I'm not sure what complications porting it over would involve.
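To illustrate where that per-layer injection happens, here is a minimal dense-attention sketch (it ignores the sparse-attention kernels discussed below). `rel_bias` is assumed to be a module like the bucketed-bias sketch above; all names are illustrative.

```python
# Sketch of per-layer injection of a relative position bias into the
# attention scores; `rel_bias` is a hypothetical module as sketched above.
import torch
import torch.nn.functional as F

def attention_with_rel_bias(q, k, v, rel_bias, causal_mask=None):
    # q, k, v: (batch, heads, seq, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale

    # The relative bias is added to the raw attention scores in *every* layer,
    # rather than adding a positional embedding to the token embeddings once.
    scores = scores + rel_bias(q.shape[-2], k.shape[-2], device=q.device)

    if causal_mask is not None:
        scores = scores.masked_fill(causal_mask, float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)
```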

@glebshevchukk

Interesting, that'd probably require changing https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/ops/sparse_attention, which is what is being used now for the core attention layers.

@StellaAthena StellaAthena added the feature request New feature or request label Jan 23, 2021
@StellaAthena (Member)

This seems reasonable to me, as long as it is an option that the user can turn on or off. It looks pretty straightforward though, so feel free to implement it and open a PR.

@StellaAthena StellaAthena added this to To do in 1T or BUST via automation Jan 23, 2021
@cfoster0

In the event that modifying sparse attention proves too tricky, Position Infused Attention is a modification with similar benefits that only requires adding a positional bias to the keys and queries prior to attention. It is similarly implemented in x-transformers.
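A minimal sketch of the Position Infused Attention idea, assuming fixed sinusoidal embeddings and standard dense attention. For simplicity the positions are added directly at the per-head dimension; the actual implementations add them before the query/key projections, and these function names are illustrative, not the x-transformers API.

```python
# Sketch of Position Infused Attention: positions are added to queries and
# keys only, just before attention, and never to the values.
import torch
import torch.nn.functional as F

def sinusoidal_positions(seq_len, dim, device=None):
    # Standard fixed sinusoidal embeddings, shape (seq_len, dim); assumes even dim.
    pos = torch.arange(seq_len, device=device, dtype=torch.float)[:, None]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2, device=device).float() / dim))
    angles = pos * inv_freq
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

def position_infused_attention(q, k, v, causal_mask=None):
    # q, k, v: (batch, heads, seq, head_dim)
    pos = sinusoidal_positions(k.shape[-2], k.shape[-1], device=k.device)
    q = q + pos[-q.shape[-2]:]   # queries get their own positions
    k = k + pos                  # keys get theirs; values stay position-free
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    if causal_mask is not None:
        scores = scores.masked_fill(causal_mask, float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)
```

Because the values carry no positional information, cached keys/values from previous decoding steps remain reusable, which is where the inference savings come from.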

@StellaAthena StellaAthena moved this from To do to In progress in 1T or BUST Feb 22, 2021
@StellaAthena (Member)

@MicPie has implemented what appears to be a working relative positional encoding in the t5rpe branch. The main thing that still needs validation is that the user can easily choose which encoding to use in the config files.

@StellaAthena StellaAthena linked a pull request Feb 28, 2021 that will close this issue
@sdtblck sdtblck closed this as completed Mar 4, 2021
1T or BUST automation moved this from In progress to Done Mar 4, 2021