
Feature/t5 #8

Merged
merged 25 commits into main from feature/t5
Apr 20, 2022
Conversation

dpressel (Owner)

Add T5 using the abstract toolkit (factory) pattern. T5 has a custom LN implementation and no biases anywhere. It also doesn't scale the multi-headed attention, and it uses a relative bias instead of learned positional embeddings.
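
For reference, those two quirks look roughly like this. A minimal PyTorch sketch; the class and function names are illustrative, not necessarily what this repo uses:

```python
import torch
import torch.nn as nn


class T5LayerNorm(nn.Module):
    """Scale-only, RMS-style layer norm: no bias and no mean subtraction."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescale by the root-mean-square of the activations only;
        # standard LayerNorm would also subtract the mean and add a bias.
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


def t5_attention_logits(q: torch.Tensor, k: torch.Tensor, rel_bias: torch.Tensor) -> torch.Tensor:
    """q, k: [B, H, T, d_k]; rel_bias: [1, H, T, T] learned relative bias.

    Unlike vanilla MHA there is no 1/sqrt(d_k) scaling; the relative
    position bias is added to the raw dot products before softmax.
    """
    return torch.matmul(q, k.transpose(-2, -1)) + rel_bias
```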

It's a pre-layer-norm model and otherwise fairly vanilla.
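
A pre-LN block normalizes the sublayer input rather than the residual sum. A tiny, purely illustrative sketch of the two wirings:

```python
def pre_ln_block(x, norm, sublayer):
    # Pre-LN (T5): normalize the sublayer input, then add the residual.
    return x + sublayer(norm(x))


def post_ln_block(x, norm, sublayer):
    # Post-LN (original Transformer): add the residual, then normalize.
    return norm(x + sublayer(x))
```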

To make this work, I needed to change the FFN, MHA, and LN implementations. The implementation so far can load and run a T5 checkpoint.
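
The toolkit idea is that a base factory builds the vanilla pieces and a T5 subclass overrides only the ones that differ. A rough sketch reusing the `T5LayerNorm` above; the class and method names are illustrative, not the repo's actual API:

```python
import torch.nn as nn


class LayerFactory:
    """Builds the vanilla sublayer components."""

    def layer_norm(self, d_model: int) -> nn.Module:
        return nn.LayerNorm(d_model)

    def ffn(self, d_model: int, d_ff: int) -> nn.Module:
        return nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )


class T5LayerFactory(LayerFactory):
    """Overrides just the pieces where T5 deviates."""

    def layer_norm(self, d_model: int) -> nn.Module:
        return T5LayerNorm(d_model)  # scale-only norm sketched earlier

    def ffn(self, d_model: int, d_ff: int) -> nn.Module:
        # T5 has no biases anywhere
        return nn.Sequential(
            nn.Linear(d_model, d_ff, bias=False),
            nn.ReLU(),
            nn.Linear(d_ff, d_model, bias=False),
        )
```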

I also refactored the encoder-decoder so that the encoder output is precomputed. This is much more efficient than recomputing the encoder embeddings at every step of greedy decode. This change affects the BART completer example, which was refactored accordingly.
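
In sketch form, the refactor amounts to hoisting the encoder call out of the decode loop. The `model.encoder`/`model.decoder` attributes and the function signature here are assumptions for illustration:

```python
import torch


@torch.no_grad()
def greedy_decode(model, src_ids, max_len: int, bos_id: int, eos_id: int):
    memory = model.encoder(src_ids)  # encoder runs exactly once
    ys = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long, device=src_ids.device)
    for _ in range(max_len - 1):
        logits = model.decoder(ys, memory)  # reuses the cached encoder output
        next_tok = logits[:, -1].argmax(-1, keepdim=True)
        ys = torch.cat([ys, next_tok], dim=1)
        if (next_tok == eos_id).all():
            break
    return ys
```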

dpressel changed the title from WIP: Feature/t5 to Feature/t5 on Apr 20, 2022
dpressel merged commit 2f453e4 into main on Apr 20, 2022
dpressel deleted the feature/t5 branch on Apr 20, 2022 at 16:54
dpressel restored the feature/t5 branch on Apr 20, 2022 at 16:57
dpressel deleted the feature/t5 branch on Apr 22, 2022 at 18:30