
(T5) Relative positional encodings? #66

Closed · CRG2K opened this issue Jan 16, 2021 · 6 comments · Fixed by #141
Labels: feature request (New feature or request)
@CRG2K commented Jan 16, 2021

[This is a reminder of a conversation I had with @sdtblck.]

The T5 relative position bias is simpler than the Transformer-XL (TrXL) relative encodings, and it should enable much cheaper inference (~1000x?) thanks to caching, plus an improved effective context size of num_layers * ctx_length.
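For reference, here is a minimal sketch of how a T5-style relative position bias is typically computed: signed relative distances are mapped into a small set of buckets (exact for nearby tokens, log-spaced for distant ones), and each bucket holds one learned scalar per attention head. The class and parameter names below are illustrative, not taken from this repo or from x-transformers.

```python
# Minimal sketch of a T5-style relative position bias (PyTorch).
# Names (T5RelativeBias, num_buckets, max_distance) are illustrative.
import math
import torch
import torch.nn as nn

class T5RelativeBias(nn.Module):
    def __init__(self, num_heads, num_buckets=32, max_distance=128, causal=True):
        super().__init__()
        self.num_buckets = num_buckets
        self.max_distance = max_distance
        self.causal = causal
        # one learned scalar per (bucket, head)
        self.bias = nn.Embedding(num_buckets, num_heads)

    def _bucket(self, rel_pos):
        # Map signed relative positions to buckets: exact buckets for nearby
        # positions, log-spaced buckets for far ones.
        num_buckets = self.num_buckets
        if self.causal:
            rel_pos = -torch.clamp(rel_pos, max=0)  # only attend to the past
            offset = 0
        else:
            num_buckets //= 2
            offset = (rel_pos > 0).long() * num_buckets
            rel_pos = rel_pos.abs()
        max_exact = num_buckets // 2
        is_small = rel_pos < max_exact
        large = max_exact + (
            torch.log(rel_pos.float() / max_exact)
            / math.log(self.max_distance / max_exact)
            * (num_buckets - max_exact)
        ).long()
        large = torch.clamp(large, max=num_buckets - 1)
        return offset + torch.where(is_small, rel_pos, large)

    def forward(self, q_len, k_len, device=None):
        # The bias depends only on relative distance, so it can be cached and
        # reused during incremental decoding.
        q_pos = torch.arange(q_len, device=device)[:, None]
        k_pos = torch.arange(k_len, device=device)[None, :]
        buckets = self._bucket(k_pos - q_pos)          # (q_len, k_len)
        return self.bias(buckets).permute(2, 0, 1)     # (heads, q_len, k_len)
```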

@glebshevchukk

@CRG2K What part of the code would this involve changing?

@CRG2K (Author) commented Jan 17, 2021

@glebshevchukk For starters, the positional bias is injected in every attention layer instead of just once at the input. @lucidrains has a working implementation in x-transformers, but I'm not sure what complications porting it over would involve.
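To illustrate where that per-layer injection happens, here is a minimal dense-attention sketch (it ignores the sparse-attention kernels discussed below). `rel_bias` is assumed to be a module like the bucketed-bias sketch above; all names are illustrative.

```python
# Sketch of per-layer injection of a relative position bias into the
# attention scores; `rel_bias` is a hypothetical module as sketched above.
import torch
import torch.nn.functional as F

def attention_with_rel_bias(q, k, v, rel_bias, causal_mask=None):
    # q, k, v: (batch, heads, seq, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale

    # The relative bias is added to the raw attention scores in *every* layer,
    # rather than adding a positional embedding to the token embeddings once.
    scores = scores + rel_bias(q.shape[-2], k.shape[-2], device=q.device)

    if causal_mask is not None:
        scores = scores.masked_fill(causal_mask, float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)
```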

@glebshevchukk

Interesting, that'd probably require changing https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/ops/sparse_attention, which is what is being used now for the core attention layers.

@StellaAthena StellaAthena added the feature request New feature or request label Jan 23, 2021
@StellaAthena (Member)

This seems reasonable to me, as long as it is an option that the user can turn on or off. It looks pretty straightforward though, so feel free to implement it and open a PR.

@StellaAthena StellaAthena added this to To do in 1T or BUST via automation Jan 23, 2021
@cfoster0

In the event that modifying sparse attention proves too tricky, Position Infused Attention is a modification with similar benefits that only requires adding a positional bias to the keys and queries prior to attention. It is similarly implemented in x-transformers.
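A minimal sketch of the Position Infused Attention idea, assuming fixed sinusoidal embeddings and standard dense attention. For simplicity the positions are added directly at the per-head dimension; the actual implementations add them before the query/key projections, and these function names are illustrative, not the x-transformers API.

```python
# Sketch of Position Infused Attention: positions are added to queries and
# keys only, just before attention, and never to the values.
import torch
import torch.nn.functional as F

def sinusoidal_positions(seq_len, dim, device=None):
    # Standard fixed sinusoidal embeddings, shape (seq_len, dim); assumes even dim.
    pos = torch.arange(seq_len, device=device, dtype=torch.float)[:, None]
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2, device=device).float() / dim))
    angles = pos * inv_freq
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

def position_infused_attention(q, k, v, causal_mask=None):
    # q, k, v: (batch, heads, seq, head_dim)
    pos = sinusoidal_positions(k.shape[-2], k.shape[-1], device=k.device)
    q = q + pos[-q.shape[-2]:]   # queries get their own positions
    k = k + pos                  # keys get theirs; values stay position-free
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    if causal_mask is not None:
        scores = scores.masked_fill(causal_mask, float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)
```

Because the values carry no positional information, cached keys/values from previous decoding steps remain reusable, which is where the inference savings come from.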

@StellaAthena StellaAthena moved this from To do to In progress in 1T or BUST Feb 22, 2021
@StellaAthena (Member)

@MicPie has implemented what appears to be a working relative positional encoding in the t5rpe branch. The main thing that still needs validation is that the user can easily choose which encoding to use in the config files.

@StellaAthena StellaAthena linked a pull request Feb 28, 2021 that will close this issue
@sdtblck sdtblck closed this as completed Mar 4, 2021
1T or BUST automation moved this from In progress to Done Mar 4, 2021