A curated list of resources on fine-tuning language models, inspired by awesome-implicit-representations.
This list does not aim to be exhaustive. Feel free to open a pull request to suggest papers that should be added.
Disclosure. I'm an author of the following papers:
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers
- Semi-supervised Sequence Learning Dai & Le (2015)
- How Transferable are Neural Networks in NLP Applications? Mou et al. (2016)
- Improving Neural Machine Translation Models with Monolingual Data Sennrich et al. (2016)
- Question Answering through Transfer Learning from Large Fine-grained Supervision Data Min et al. (2017)
- Universal Language Model Fine-tuning for Text Classification Howard & Ruder (2018)
- An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models Chronopoulou et al. (2019)
- ...
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Devlin et al. (2019)
- Better Fine-Tuning by Reducing Representational Collapse Aghajanyan et al. (2020)
- FreeLB: Enhanced Adversarial Training for Natural Language Understanding Zhu et al. (2020)
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization Jiang et al. (2020)
- Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning Gunel et al. (2021)
- ...
- Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks Phang et al. (2018)
- Transfer Fine-Tuning: A BERT Case Study Arase & Tsujii (2019)
- Learning and Evaluating General Linguistic Intelligence Yogatama et al. (2019)
- Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work? Pruksachatkun et al. (2020)
- English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too Phang et al. (2020)
- What to Pre-Train on? Efficient Intermediate Task Selection Poth et al. (2021)
- Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation Glavaš & Vulić (2021)
- Muppet: Massive Multi-task Representations with Pre-Finetuning Aghajanyan et al. (2021)
- ...
- Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling Han & Eisenstein (2019)
- Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks Gururangan et al. (2020)
- Mining Knowledge for Natural Language Inference from Wikipedia Categories Chen et al. (2020)
- Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank Chau et al. (2020)
- Train No Evil: Selective Masking for Task-Guided Pre-Training Gu et al. (2020)
- ...
- Injecting Numerical Reasoning Skills into Language Models Geva et al. (2020)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers Lauscher et al. (2020)
- Analyzing Commonsense Emergence in Few-shot Knowledge Models Da et al. (2021)
- ...
- Parameter-Efficient Transfer Learning for NLP Houlsby et al. (2019)
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning Stickland & Murray (2019)
- Simple, Scalable Adaptation for Neural Machine Translation Bapna & Firat (2019)
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models Zhao et al. (2020)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning Sanh et al. (2020)
- AdapterFusion: Non-Destructive Task Composition for Transfer Learning Pfeiffer et al. (2021)
- MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer Pfeiffer et al. (2020)
- AdapterDrop: On the Efficiency of Adapters in Transformers Rücklé et al. (2021)
- Parameter-Efficient Transfer Learning with Diff Pruning Guo et al. (2021)
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers Mahabadi et al. (2021)
- LoRA: Low-Rank Adaptation of Large Language Models Hu et al. (2021)
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models Zaken et al. (2022)
- Training Neural Networks with Fixed Sparse Masks Sung et al. (2021)
- Towards a Unified View of Parameter-Efficient Transfer Learning He et al. (2021)
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer Ansell et al. (2022)
- Revisiting Parameter-Efficient Tuning: Are We Really There Yet? Chen et al. (2022)
- Prompt-free and Efficient Few-shot Learning with Language Models Mahabadi et al. (2022)
- Adaptable Adapters Moosavi et al. (2022)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning Liu et al. (2022)
- ...
Some continuous prompt-based methods can also be seen as parameter-efficient fine-tuning methods. For a list of papers see below.
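To make that connection concrete, here is a minimal, illustrative sketch (not taken from any paper above) of soft prompt tuning in the spirit of Lester et al. (2021): the pretrained model stays frozen and only a small matrix of prompt embeddings is trained, which is why such methods also count as parameter-efficient fine-tuning. The backbone ("gpt2"), prompt length, and learning rate are arbitrary assumptions for illustration.

```python
# Illustrative sketch only: soft prompt tuning as parameter-efficient fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

for p in model.parameters():  # freeze all pretrained weights
    p.requires_grad = False

num_prompt_tokens = 20
soft_prompt = torch.nn.Parameter(
    torch.randn(num_prompt_tokens, model.config.n_embd) * 0.02
)  # the only trainable parameters (20 x 768 values for GPT-2)

def loss_with_soft_prompt(input_ids, labels):
    # Embed the real tokens and prepend the trainable prompt embeddings.
    token_embeds = model.transformer.wte(input_ids)               # (B, T, H)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, token_embeds], dim=1)      # (B, P+T, H)
    # Mask the prompt positions out of the language-modeling loss.
    ignore = torch.full((input_ids.size(0), num_prompt_tokens), -100, dtype=torch.long)
    return model(inputs_embeds=inputs_embeds,
                 labels=torch.cat([ignore, labels], dim=1)).loss

optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)  # only the prompt is updated

batch = tokenizer(["fine-tuning language models is"], return_tensors="pt")
loss = loss_with_soft_prompt(batch["input_ids"], batch["input_ids"])
loss.backward()
optimizer.step()
```

Only the small prompt matrix receives gradients, so storing an adapted task costs a tiny fraction of a full model copy, the same motivation as the adapter- and sparsity-based methods listed above.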
- Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference Schick & Schütze (2021a)
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners Schick & Schütze (2021b)
- Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification Schick et al. (2020)
- Few-Shot Text Generation with Natural Language Instructions Schick & Schütze (2021c)
- Making Pre-trained Language Models Better Few-shot Learners Gao et al. (2021)
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts Shin et al. (2020)
- How Many Data Points is a Prompt Worth? Le Scao & Rush (2021)
- Improving and Simplifying Pattern Exploiting Training Tam et al. (2021)
- Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections Zhong et al. (2021)
- Calibrate Before Use: Improving Few-Shot Performance of Language Models Zhao et al. (2021)
- PTR: Prompt Tuning with Rules for Text Classification Han et al. (2021)
- Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models Logan IV et al. (2021)
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification Hu et al. (2021)
- Prompt-Learning for Fine-Grained Entity Typing Ding et al. (2021)
- Do Prompt-Based Models Really Understand the Meaning of their Prompts? Webson & Pavlick (2022)
- Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning Utama et al. (2021)
- Prototypical Verbalizer for Prompt-based Few-shot Tuning Cui et al. (2022)
- ...
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions Mishra et al. (2021)
- Discrete and Soft Prompting for Multilingual Models Zhao & Schütze (2021)
- Finetuned Language Models Are Zero-Shot Learners Wei et al. (2021)
- Multitask Prompted Training Enables Zero-Shot Task Generalization Sanh et al. (2021)
- Prompt Consistency for Zero-Shot Task Generalization Zhou et al. (2022)
- Few-shot Adaptation Works with UnpredicTable Data Chan et al. (2022)
- Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks Wang et al. (2022)
- ...
- Prefix-Tuning: Optimizing Continuous Prompts for Generation Li & Liang (2021)
- WARP: Word-level Adversarial ReProgramming Hambardzumyan et al. (2021)
- Learning How to Ask: Querying LMs with Mixtures of Soft Prompts Qin & Eisner (2021)
- Factual Probing Is [MASK]: Learning vs. Learning to Recall Zhong et al. (2021)
- The Power of Scale for Parameter-Efficient Prompt Tuning Lester et al. (2021)
- Multimodal Few-Shot Learning with Frozen Language Models Tsimpoukelli et al. (2021)
- Noisy Channel Language Model Prompting for Few-Shot Text Classification Min et al. (2021)
- Continuous Entailment Patterns for Lexical Inference in Context Schmitt & Schütze (2021)
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners Zhang et al. (2022)
- SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer Vu et al. (2022)
- P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks Liu et al. (2022)
- ...
- True Few-Shot Learning with Language Models Perez et al. (2021)
- FLEX: Unifying Evaluation for Few-Shot NLP Bragg et al. (2021)
- FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding Zheng et al. (2022)
- True Few-Shot Learning with Prompts—A Real-World Perspective Schick & Schütze (2022)
- ...
- Visualizing and Understanding the Effectiveness of BERT Hao et al. (2019)
- oLMpics-On What Language Model Pre-training Captures Talmor et al. (2020)
- Pretrained Transformers Improve Out-of-Distribution Robustness Hendrycks et al. (2020)
- What Happens To BERT Embeddings During Fine-tuning? Merchant et al. (2020)
- Investigating Learning Dynamics of BERT Fine-Tuning Hao et al. (2020)
- Investigating Transferability in Pretrained Language Models Tamkin et al. (2020)
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning Aghajanyan et al. (2021)
- Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers Phang et al. (2021)
- Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution Kumar et al. (2022)
- A Closer Look at How Fine-tuning Changes BERT Zhou & Srikumar (2022)
- When Do You Need Billions of Words of Pretraining Data? Zhang et al. (2021)
- On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation He et al. (2021)
- Pretrained Transformers as Universal Computation Engines Lu et al. (2021)
- Predicting Inductive Biases of Pre-Trained Models Lovering et al. (2021)
- ...
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping Dodge et al. (2020)
- Revisiting Few-sample BERT Fine-tuning Zhang et al. (2021)
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines Mosbach et al. (2021)
- ...
- What Happens To BERT Embeddings During Fine-tuning? Merchant et al. (2020)
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers Mosbach et al. (2020)
- On the Importance of Data Size in Probing Fine-tuned Models Mehrafarin et al. (2022)
- ...
- BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance McCoy et al. (2020)
- Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics Bhargava et al. (2021)
- Linear Connectivity Reveals Generalization Strategies Juneja et al. (2022)
- ...
- An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models Tu et al. (2020)
- Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually) Warstadt et al. (2020)
- Predicting Inductive Biases of Pre-Trained Models Lovering et al. (2021)
- ...
- A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks Saunshi et al. (2021)
- Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning Wei et al. (2021)
- ...
- Recent Advances in Language Model Fine-tuning Ruder (2021)
- On the Opportunities and Risks of Foundation Models (Adaptation chapter) Bommasani et al. (2021)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing Liu et al. (2021)
- Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models Ding et al. (2022)
- ...
- What is being transferred in transfer learning? Neyshabur et al. (2020)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge Talmor et al. (2020)
- Exploring and Predicting Transferability across NLP Tasks Vu et al. (2020)
- ...