Large Language Models for Data Annotation: A Survey

This is a curated list of papers on using LLMs for data annotation,
maintained by Zhen Tan ([email protected]) and Alimohammad Beigi ([email protected]).
To add new entries, please open a PR that follows the existing format.
This list complements the survey below.

[Large Language Models for Data Annotation: A Survey]

If you find this repo helpful, we would appreciate it if you could cite our survey.

@misc{tan2024large,
      title={Large Language Models for Data Annotation: A Survey}, 
      author={Zhen Tan and Alimohammad Beigi and Song Wang and Ruocheng Guo and Amrita Bhattacharjee and Bohan Jiang and Mansooreh Karami and Jundong Li and Lu Cheng and Huan Liu},
      year={2024},
      eprint={2402.13446},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

LLM-Based Data Annotation

Manually Engineered Prompt

[EACL 2024] GPTs Are Multilingual Annotators for Sequence Generation Tasks. [pdf] [code]
[arXiv 2023] AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators. [pdf]
[arXiv 2023] RAFT: Reward Ranked FineTuning for Generative Foundation Model Alignment. [pdf]
[arXiv 2023] Small Models are Valuable Plug-ins for Large Language Models. [pdf] [code]
[arXiv 2022] Language Models in the Loop: Incorporating Prompting into Weak Supervision. [pdf]
[EMNLP 2022] ZeroGen: Efficient Zero-shot Learning via Dataset Generation. [pdf] [code]
[NAACL-HLT 2022] Learning To Retrieve Prompts for In-Context Learning. [pdf] [code]
[EMNLP 2021] Constrained Language Models Yield Few-Shot Semantic Parsers. [pdf] [code]

Alignment via Pairwise Feedback

[ACL 2023] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers. [pdf] [code]
[arXiv 2023] Direct Preference Optimization: Your Language Model is Secretly a Reward Model. [pdf]
[NeurIPS 2022] Fine-tuning language models to find agreement among humans with diverse preferences. [pdf]
[arXiv 2022] Improving alignment of dialogue agents via targeted human judgements. [pdf]
[arXiv 2022] Teaching language models to support answers with verified quotes. [pdf] [data]
[NeurIPS 2020] Learning to summarize with human feedback. [pdf] [code]
[arXiv 2019] Fine-Tuning Language Models from Human Preferences. [pdf] [code]

Assessing LLM-Generated Annotations

Evaluating LLM-Generated Annotations

[EACL 2024] GPTs Are Multilingual Annotators for Sequence Generation Tasks. [pdf] [code]
[arXiv 2023] AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators. [pdf]
[arXiv 2023] Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks. [pdf]
[NAACL 2022] LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework. [pdf] [code]
[EMNLP 2022] Large Language Models are Few-Shot Clinical Information Extractors. [pdf] [data]
[arXiv 2022] Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. [pdf] [code]
[arXiv 2020] The Turking Test: Can Language Models Understand Instructions? [pdf]

Data Selection via Active Learning

[EMNLP 2023] FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models. [pdf] [code]
[EMNLP 2023] Active Learning Principles for In-Context Learning with Large Language Models. [pdf]
[IUI 2023] ScatterShot: Interactive In-context Example Curation for Text Transformation. [pdf] [code]
[ICML 2023] Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning. [pdf] [code]
[arXiv 2023] Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost. [pdf]
[arXiv 2022] Active learning helps pretrained models learn the intended task. [pdf] [code]
[EACL 2021] Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates. [pdf]

Learning with LLM-Generated Annotations

Target Domain Inference: Direct Utilization of Annotations

[ECIR 2024] Large Language Models are Zero-Shot Rankers for Recommender Systems. [pdf] [code]
[arXiv 2023] Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. [pdf]
[ACL 2022] An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels. [pdf] [code]
[TMLR 2022] Emergent Abilities of Large Language Models. [pdf]
[NeurIPS 2022] Large Language Models are Zero-Shot Reasoners. [pdf]
[arXiv 2022] Visual Classification via Description from Large Language Models. [pdf]
[PMLR 2021] Learning Transferable Visual Models From Natural Language Supervision. [pdf] [code]
[EMNLP 2019] Language Models as Knowledge Bases? [pdf] [code]

Knowledge Distillation: Bridging LLM and task-specific models

[EACL 2024] GPTs Are Multilingual Annotators for Sequence Generation Tasks. [pdf] [code]
[EMNLP 2023] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. [pdf] [code]
[ACL 2023] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. [pdf] [code]
[ACL 2023] GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo. [pdf] [code]
[ACL 2023] GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model. [pdf] [code]
[EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models. [pdf] [code]
[arXiv 2023] Specializing Smaller Language Models towards Multi-Step Reasoning. [pdf]
[arXiv 2023] Knowledge Distillation of Large Language Models. [pdf] [code]
[arXiv 2023] Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events. [pdf]
[arXiv 2023] Web Content Filtering through knowledge distillation of Large Language Models. [pdf]
[ICLR 2022] Knowledge Distillation of Large Language Models. [pdf] [code]
[arXiv 2022] Teaching Small Language Models to Reason. [pdf]

Harnessing LLM Annotations for Fine-Tuning and Prompting

In-Context Learning (ICL)

[EMNLP 2023] Active Learning Principles for In-Context Learning with Large Language Models. [pdf]
[ACL 2023] Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models. [pdf]
[ICLR 2022] Finetuned Language Models Are Zero-Shot Learners. [pdf] [code]
[ICLR 2022] Selective Annotation Makes Language Models Better Few-Shot Learners. [pdf] [code]
[NAACL 2022] Improving In-Context Few-Shot Learning via Self-Supervised Training. [pdf]
[arXiv 2022] Instruction Induction: From Few Examples to Natural Language Task Descriptions. [pdf] [code]
[NeurIPS 2020] Language Models are Few-Shot Learners. [pdf]

Chain-of-Thought Prompting (CoT)

[ICLR 2023] Automatic chain of thought prompting in large language models. [pdf] [code]
[ACL 2023] SCOTT: Self-Consistent Chain-of-Thought Distillation. [pdf]
[arXiv 2023] Specializing Smaller Language Models towards Multi-Step Reasoning. [pdf]
[NeurIPS 2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. [pdf]
[NeurIPS 2022] Large Language Models are Zero-Shot Reasoners. [pdf]
[arXiv 2022] Rationale-augmented ensembles in language models. [pdf]
[ACL 2020] A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers. [pdf] [code]
[NAACL 2019] CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. [pdf] [code]

Instruction Tuning (IT)

[ACL 2023] Crosslingual Generalization through Multitask Finetuning. [pdf] [code]
[ACL 2023] SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions. [pdf] [code]
[ACL 2023] Can Large Language Models Be an Alternative to Human Evaluations? [pdf]
[arXiv 2023] LLaMA: Open and Efficient Foundation Language Models. [pdf] [code]
[arXiv 2022] Teaching language models to support answers with verified quotes. [pdf] [data]
[arXiv 2022] Scaling instruction-finetuned language models. [pdf] [code]
[EMNLP 2022] Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. [pdf] [code]
[NeurIPS 2020] Language Models are Few-Shot Learners. [pdf]
Stanford alpaca: An instruction-following llama model. [HTML] [code]

Alignment Tuning (AT)

[PMLR 2023] Pretraining Language Models with Human Preferences. [pdf] [code]
[ICLR 2023] Offline RL for Natural Language Generation with Implicit Language Q Learning. [pdf] [code]
[arXiv 2023] Chain of hindsight aligns language models with feedback. [pdf] [code]
[arXiv 2023] GPT-4 Technical Report. [pdf]
[arXiv 2023] Llama 2: Open Foundation and Fine-Tuned Chat Models. [pdf] [code]
[arXiv 2023] RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback. [pdf]
[NeurIPS 2022] Training language models to follow instructions with human feedback. [pdf]
[arXiv 2022] Teaching language models to support answers with verified quotes. [pdf] [data]
[arXiv 2019] Fine-Tuning Language Models from Human Preferences. [pdf] [code]
[arXiv 2019] CTRL: A Conditional Transformer Language Model for Controllable Generation. [pdf] [code]
[NeurIPS 2017] Deep Reinforcement Learning from Human Preferences. [pdf]

Surveys

[ACM 2023] Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. [pdf]
[arXiv 2023] A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [pdf] [repo]
[arXiv 2022] A Survey of Large Language Models. [pdf] [repo]
[arXiv 2022] A Survey on In-context Learning. [pdf]
[arXiv 2022] A Comprehensive Survey on Instruction Following. [pdf] [repo]

Toolkits

LangChain: [HTML] [code]
Stack AI: [HTML]
UBIAI: [HTML]
Prodigy: [HTML]
Alfred: [pdf] [code]
