Feature/t5 #8

dpressel · 2022-04-18T18:58:21Z

Add T5 using the abstract toolkit (factory) pattern. T5 has a custom LayerNorm implementation and no biases anywhere. It also doesn't
scale the multi-headed attention, and it uses a relative position bias instead of learned positional embeddings.
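For reference, here is a minimal plain-Python sketch of the kind of layer norm T5 uses. It is not the PR's PyTorch code. It divides by the root mean square of the features without subtracting the mean, and it applies a learned gain but no bias. The name `t5_layer_norm` is hypothetical, and the `eps=1e-6` default is an assumption borrowed from the Hugging Face T5 config.

```python
import math
from typing import List, Optional


def t5_layer_norm(x: List[float], weight: Optional[List[float]] = None,
                  eps: float = 1e-6) -> List[float]:
    """RMS-style layer norm (T5 flavor): no mean subtraction and no bias term."""
    # Mean of squares over the hidden dimension; note there is no centering step
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)
    # Scale-only: a per-feature learned gain, but nothing is added afterwards
    return [w * v * inv_rms for w, v in zip(weight, x)]
```

Because nothing is centered, `t5_layer_norm([3.0, 4.0])` keeps both values positive, whereas a standard LayerNorm would map them to roughly `[-1, 1]`.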

It's a pre-LayerNorm model and otherwise fairly vanilla.
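The following sketch contrasts pre-LN with post-LN residual wiring. It assumes plain Python lists as vectors and hypothetical `norm`/`sublayer` callables. In the pre-LN form only the sublayer input is normalized, so the residual stream itself is never normalized inside a block. This is why pre-LN stacks such as T5 apply a final layer norm after the last layer.

```python
from typing import Callable, List

Vec = List[float]
Fn = Callable[[Vec], Vec]


def pre_ln_residual(x: Vec, sublayer: Fn, norm: Fn) -> Vec:
    """Pre-LN (T5): normalize the sublayer input; the residual path is left untouched."""
    h = sublayer(norm(x))
    return [a + b for a, b in zip(x, h)]


def post_ln_residual(x: Vec, sublayer: Fn, norm: Fn) -> Vec:
    """Post-LN (original Transformer): normalize after the residual addition."""
    h = sublayer(x)
    return norm([a + b for a, b in zip(x, h)])
```

In a T5-style block, `norm` would be the RMS-style norm and `sublayer` the attention or FFN.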

To make this work, I needed to change the FFN, MHA, and LN implementations. The implementation so far can
load and run a T5 checkpoint.
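Here is a single-head, plain-Python sketch of attention with a T5-style relative position bias and no 1/sqrt(d_k) scaling. The bucketing follows the scheme in the public T5 code: half the buckets cover exact small offsets, and the rest are logarithmically spaced up to `max_distance`. Several choices are assumptions for illustration: the function names, the flat `bias_table` (one head, where the real model learns one entry per bucket per head), and the defaults `num_buckets=32` and `max_distance=128`. Encoder self-attention uses `bidirectional=True`; decoder self-attention uses `bidirectional=False`.

```python
import math
from typing import List, Sequence


def relative_position_bucket(rel: int, bidirectional: bool = True,
                             num_buckets: int = 32, max_distance: int = 128) -> int:
    """Map a relative offset (key_pos - query_pos) to a bucket index, T5-style."""
    bucket = 0
    if bidirectional:
        # Half the buckets for keys to the right (rel > 0), half for the left
        num_buckets //= 2
        if rel > 0:
            bucket += num_buckets
        rel = abs(rel)
    else:
        # Causal (decoder) case: only offsets to the left are distinguished
        rel = -min(rel, 0)
    max_exact = num_buckets // 2
    if rel < max_exact:
        # Small offsets each get their own bucket
        return bucket + rel
    # Larger offsets share logarithmically spaced buckets, capped at the last one
    large = max_exact + int(
        math.log(rel / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)


def attention_with_relative_bias(q: Sequence[Sequence[float]],
                                 k: Sequence[Sequence[float]],
                                 v: Sequence[Sequence[float]],
                                 bias_table: Sequence[float],
                                 bidirectional: bool = True,
                                 num_buckets: int = 32,
                                 max_distance: int = 128) -> List[List[float]]:
    """Single-head dot-product attention whose logits are q.k plus a learned relative bias."""
    out = []
    for i, qi in enumerate(q):
        logits = []
        for j, kj in enumerate(k):
            b = bias_table[relative_position_bucket(j - i, bidirectional,
                                                    num_buckets, max_distance)]
            # Note: no division by sqrt(d_k) here
            logits.append(sum(a * c for a, c in zip(qi, kj)) + b)
        # Numerically stable softmax over the keys
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))])
    return out
```

In the reference T5 code, the missing scale is compensated for by the initialization of the query projection rather than at runtime.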

I also refactored the encoder-decoder so that the encoder output is pre-computed. This is much more efficient than recomputing
the encoder outputs at every step of greedy decoding. This affects the BART completer example, which was refactored
accordingly.
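Below is a minimal sketch of greedy decoding with the encoder run once up front, plus an early stop on EOS. `encode`, `decode_step`, `start_id`, and `eos_id` are hypothetical stand-ins, not the repo's API. The point is only that `encode` sits outside the loop, so each step runs just the decoder against the cached encoder output.

```python
from typing import Callable, List, Sequence


def greedy_decode(src_ids: Sequence[int],
                  encode: Callable[[Sequence[int]], object],
                  decode_step: Callable[[List[int], object], List[float]],
                  start_id: int,
                  eos_id: int,
                  max_len: int = 50) -> List[int]:
    """Greedy decoding that computes the encoder output a single time."""
    memory = encode(src_ids)  # encoder runs exactly once, before the loop
    out = [start_id]
    for _ in range(max_len):
        # Decoder only; this sketch re-runs it over the whole prefix (no KV cache)
        logits = decode_step(out, memory)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        out.append(next_id)
        if next_id == eos_id:
            break  # early termination; the EOS id is kept in the result
    return out[1:]  # drop the start token
```

With a real model, `decode_step` would wrap the decoder forward pass and return the logits for the last position.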

Still need to change weight defaults.

dpressel added 25 commits April 18, 2022 14:52

initial commit of PreLN models

968b695

refactor out files to keep it clean

f8cac95

pushing things around

8a79749

support PnP for all components

e1eaf6a

fixed some issues, mainly that T5 attn not scaled

286cb8f

update comment in BART generation

05b9ca3

add factory creator

2bc4a13

add T5 completer example

b41f80d

add link to T5

ba682ad

clean up comments

a5d2af3

fix FFN

7756ff5

oops name of dict

b2ce6a4

fix comment

bc789b4

add T5 colab link

1c05927

remove unneeded function

780be55

add noising collator

68d068d

initial attempt at pretraining

0b5b433

still need to change weight defaults

add pad masking, fix bart pad index

bc9ca9d

oops, t5 actually uses the </s> token

9205454

pad everything

faa59bd

add raw decode and eos early termination

c7d21d9

fix a few bugs

177b68a

add T5 scaling in pre projection

bfe38fc

get the weight initialization in there

68efa5b

simplify further

0702356

dpressel changed the title ~~WIP: Feature/t5~~ Feature/t5 Apr 20, 2022

dpressel merged commit 2f453e4 into main Apr 20, 2022

dpressel deleted the feature/t5 branch April 20, 2022 16:54

dpressel restored the feature/t5 branch April 20, 2022 16:57

dpressel deleted the feature/t5 branch April 22, 2022 18:30
