Multitask learning with a BERT backbone. Enables easy training of a BERT model with state-of-the-art methods such as PCGrad, Gradient Vaccine, PALs, task scheduling, class-imbalance handling, and many other optimizations.
Updated Oct 8, 2023 - Python
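The listing names PCGrad among the supported methods. As a minimal, framework-agnostic sketch (not the repository's actual implementation), PCGrad resolves conflicts between per-task gradients by projecting each gradient off the opposing component of every other task's gradient before summing. The function name `pcgrad` and the NumPy formulation below are illustrative assumptions.

```python
import numpy as np

def pcgrad(grads, seed=0):
    """Illustrative PCGrad sketch: for each task gradient g_i, remove the
    component that conflicts (negative dot product) with every other task
    gradient g_j, then sum the projected gradients into one update direction.
    `grads` is a list of flat per-task gradient vectors (np.ndarray)."""
    rng = np.random.default_rng(seed)
    projected = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # iterate the other tasks in random order
        for j in others:
            dot = g @ grads[j]
            if dot < 0:  # conflict: project g onto the normal plane of g_j
                g -= dot / (grads[j] @ grads[j]) * grads[j]
        projected.append(g)
    # combine the de-conflicted task gradients into a single update
    return np.sum(projected, axis=0)

# Two conflicting task gradients: their dot product is negative,
# so each is projected before the sum.
update = pcgrad([np.array([1.0, 0.0]), np.array([-1.0, 1.0])])
# update == [0.5, 1.5]
```

In this toy example, `[1, 0] · [-1, 1] = -1 < 0`, so each gradient sheds its component along the other before summing; without the projection the naive sum `[0, 1]` would discard all progress on the first task's direction.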