A curated list of resources on fine-tuning language models, inspired by awesome-implicit-representations.
This list does not aim to be exhaustive. Feel free to open a pull request to suggest papers that should be added.
Disclosure. I'm an author of the following papers:
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers
- Semi-supervised Sequence Learning Dai & Le (2015)
- How Transferable are Neural Networks in NLP Applications? Mou et al. (2016)
- Improving Neural Machine Translation Models with Monolingual Data Sennrich et al. (2016)
- Question Answering through Transfer Learning from Large Fine-grained Supervision Data Min et al. (2017)
- Universal Language Model Fine-tuning for Text Classification Howard & Ruder (2018)
- An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models Chronopoulou et al. (2019)
- ...
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Devlin et al. (2019)
- Better Fine-Tuning by Reducing Representational Collapse Aghajanyan et al. (2020)
- FreeLB: Enhanced Adversarial Training for Natural Language Understanding Zhu et al. (2020)
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization Jiang et al. (2020)
- Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning Gunel et al. (2021)
- ...
- Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks Phang et al. (2018)
- Transfer Fine-Tuning: A BERT Case Study Arase & Tsujii (2019)
- Learning and Evaluating General Linguistic Intelligence Yogatama et al. (2019)
- Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work? Pruksachatkun et al. (2020)
- English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too Phang et al. (2020)
- What to Pre-Train on? Efficient Intermediate Task Selection Poth et al. (2021)
- Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation Glavaš & Vulić (2021)
- Muppet: Massive Multi-task Representations with Pre-Finetuning Aghajanyan et al. (2021)
- ...
- Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling Han & Eisenstein (2019)
- Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks Gururangan et al. (2020)
- Mining Knowledge for Natural Language Inference from Wikipedia Categories Chen et al. (2020)
- Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank Chau et al. (2020)
- Train No Evil: Selective Masking for Task-Guided Pre-Training Gu et al. (2020)
- ...
- Injecting Numerical Reasoning Skills into Language Models Geva et al. (2020)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers Lauscher et al. (2020)
- Analyzing Commonsense Emergence in Few-shot Knowledge Models Da et al. (2021)
- ...
- Parameter-Efficient Transfer Learning for NLP Houlsby et al. (2019)
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning Stickland & Murray (2019)
- Simple, Scalable Adaptation for Neural Machine Translation Bapna & Firat (2019)
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models Zhao et al. (2020)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning Sanh et al. (2020)
- AdapterFusion: Non-Destructive Task Composition for Transfer Learning Pfeiffer et al. (2021)
- MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer Pfeiffer et al. (2020)
- AdapterDrop: On the Efficiency of Adapters in Transformers Rücklé et al. (2021)
- Parameter-Efficient Transfer Learning with Diff Pruning Guo et al. (2021)
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers Mahabadi et al. (2021)
- LoRA: Low-Rank Adaptation of Large Language Models Hu et al. (2021)
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models Zaken et al. (2022)
- Training Neural Networks with Fixed Sparse Masks Sung et al. (2021)
- Towards a Unified View of Parameter-Efficient Transfer Learning He et al. (2021)
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer Ansell et al. (2022)
- Revisiting Parameter-Efficient Tuning: Are We Really There Yet? Chen et al. (2022)
- Prompt-free and Efficient Few-shot Learning with Language Models Mahabadi et al. (2022)
- Adaptable Adapters Moosavi et al. (2022)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning Liu et al. (2022)
- ...
Some continuous prompt-based methods can also be seen as parameter-efficient fine-tuning methods. For a list of papers see below.
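To make that connection concrete, here is a minimal, illustrative sketch (not taken from any paper above) of soft prompt tuning in the spirit of Lester et al. (2021): the pretrained model stays frozen and only a small matrix of prompt embeddings is trained, which is why such methods also count as parameter-efficient fine-tuning. The backbone ("gpt2"), prompt length, and learning rate are arbitrary assumptions for illustration.

```python
# Illustrative sketch only: soft prompt tuning as parameter-efficient fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

for p in model.parameters():  # freeze all pretrained weights
    p.requires_grad = False

num_prompt_tokens = 20
soft_prompt = torch.nn.Parameter(
    torch.randn(num_prompt_tokens, model.config.n_embd) * 0.02
)  # the only trainable parameters (20 x 768 values for GPT-2)

def loss_with_soft_prompt(input_ids, labels):
    # Embed the real tokens and prepend the trainable prompt embeddings.
    token_embeds = model.transformer.wte(input_ids)               # (B, T, H)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, token_embeds], dim=1)      # (B, P+T, H)
    # Mask the prompt positions out of the language-modeling loss.
    ignore = torch.full((input_ids.size(0), num_prompt_tokens), -100, dtype=torch.long)
    return model(inputs_embeds=inputs_embeds,
                 labels=torch.cat([ignore, labels], dim=1)).loss

optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)  # only the prompt is updated

batch = tokenizer(["fine-tuning language models is"], return_tensors="pt")
loss = loss_with_soft_prompt(batch["input_ids"], batch["input_ids"])
loss.backward()
optimizer.step()
```

Only the small prompt matrix receives gradients, so storing an adapted task costs a tiny fraction of a full model copy, the same motivation as the adapter- and sparsity-based methods listed above.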
- Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference Schick & Schütze (2021a)
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners Schick & Schütze (2021b)
- Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification Schick et al. (2020)
- Few-Shot Text Generation with Natural Language Instructions Schick & Schütze (2021c)
- Making Pre-trained Language Models Better Few-shot Learners Gao et al. (2021)
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts Shin et al. (2020)
- How Many Data Points is a Prompt Worth? Le Scao & Rush (2021)
- Improving and Simplifying Pattern Exploiting Training Tam et al. (2021)
- Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections Zhong et al. (2021)
- Calibrate Before Use: Improving Few-Shot Performance of Language Models Zhao et al. (2021)
- PTR: Prompt Tuning with Rules for Text Classification Han et al. (2021)
- Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models Logan IV et al. (2021)
- Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification Hu et al. (2021)
- Prompt-Learning for Fine-Grained Entity Typing Ding et al. (2021)
- Do Prompt-Based Models Really Understand the Meaning of their Prompts? Webson & Pavlick (2022)
- Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning Utama et al. (2021)
- Prototypical Verbalizer for Prompt-based Few-shot Tuning Cui et al. (2022)
- ...
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions Mishra et al. (2021)
- Discrete and Soft Prompting for Multilingual Models Zhao & Schütze (2021)
- Finetuned Language Models Are Zero-Shot Learners Wei et al. (2021)
- Multitask Prompted Training Enables Zero-Shot Task Generalization Sanh et al. (2021)
- Prompt Consistency for Zero-Shot Task Generalization Zhou et al. (2022)
- Few-shot Adaptation Works with UnpredicTable Data Chan et al. (2022)
- Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks Wang et al. (2022)
- ...
- Prefix-Tuning: Optimizing Continuous Prompts for Generation Li & Liang (2021)
- WARP: Word-level Adversarial ReProgramming Hambardzumyan et al. (2021)
- Learning How to Ask: Querying LMs with Mixtures of Soft Prompts Qin & Eisner (2021)
- Factual Probing Is [MASK]: Learning vs. Learning to Recall Zhong et al. (2021)
- The Power of Scale for Parameter-Efficient Prompt Tuning Lester et al. (2021)
- Multimodal Few-Shot Learning with Frozen Language Models Tsimpoukelli et al. (2021)
- Noisy Channel Language Model Prompting for Few-Shot Text Classification Min et al. (2021)
- Continuous Entailment Patterns for Lexical Inference in Context Schmitt & Schütze (2021)
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners Zhang et al. (2022)
- SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer Vu et al. (2022)
- P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks Liu et al. (2022)
- ...
- True Few-Shot Learning with Language Models Perez et al. (2021)
- FLEX: Unifying Evaluation for Few-Shot NLP Bragg et al. (2021)
- FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding Zheng et al. (2022)
- True Few-Shot Learning with Prompts—A Real-World Perspective Schick & Schütze (2022)
- ...
- Visualizing and Understanding the Effectiveness of BERT Hao et al. (2019)
- oLMpics-On What Language Model Pre-training Captures Talmor et al. (2020)
- Pretrained Transformers Improve Out-of-Distribution Robustness Hendrycks et al. (2020)
- What Happens To BERT Embeddings During Fine-tuning? Merchant et al. (2020)
- Investigating Learning Dynamics of BERT Fine-Tuning Hao et al. (2020)
- Investigating Transferability in Pretrained Language Models Tamkin et al. (2020)
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning Aghajanyan et al. (2021)
- Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers Phang et al. (2021)
- Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution Kumar et al. (2022)
- A Closer Look at How Fine-tuning Changes BERT Zhou & Srikumar (2022)
- When Do You Need Billions of Words of Pretraining Data? Zhang et al. (2021)
- On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation He et al. (2021)
- Pretrained Transformers as Universal Computation Engines Lu et al. (2021)
- Predicting Inductive Biases of Pre-Trained Models Lovering et al. (2021)
- ...
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping Dodge et al. (2020)
- Revisiting Few-sample BERT Fine-tuning Zhang et al. (2021)
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines Mosbach et al. (2021)
- ...
- What Happens To BERT Embeddings During Fine-tuning? Merchant et al. (2020)
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers Mosbach et al. (2020)
- On the Importance of Data Size in Probing Fine-tuned Models Mehrafarin et al. (2022)
- ...
- BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance McCoy et al. (2020)
- Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics Bhargava et al. (2021)
- Linear Connectivity Reveals Generalization Strategies Juneja et al. (2022)
- ...
- An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models Tu et al. (2020)
- Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually) Warstadt et al. (2020)
- Predicting Inductive Biases of Pre-Trained Models Lovering et al. (2021)
- ...
- A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks Saunshi et al. (2021)
- Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning Wei et al. (2021)
- ...
- Recent Advances in Language Model Fine-tuning Ruder (2021)
- On the Opportunities and Risks of Foundation Models (Adaptation chapter) Bommasani et al. (2021)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing Liu et al. (2021)
- Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models Ding et al. (2022)
- ...
- What is being transferred in transfer learning? Neyshabur et al. (2020)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge Talmor et al. (2020)
- Exploring and Predicting Transferability across NLP Tasks Vu et al. (2020)
- ...