
KnowledgeLifecycle

This is the paper list of the survey "The Life Cycle of Knowledge in Big Language Models: A Survey". We will update the survey and this paper list regularly, and we warmly welcome suggestions of any kind.

We presented a tutorial at CCKS 2023; here are the latest slides of our tutorial.

Abstract

Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated and used by language models. Despite the large body of related studies, a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes is still lacking, which may prevent us from further understanding the connections between current progress or recognizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods and investigating how knowledge circulates as it is built, maintained and used. To this end, we systematically review existing studies for each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.

1. Knowledge Acquisition

1.1. Learning from Text Data

Language models are unsupervised multitask learners. (OpenAI blog 2019) [paper]

Language Models are Few-Shot Learners. (NeurIPS 2020) [paper]

Training language models to follow instructions with human feedback. (2022) [paper]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. (2022) [paper]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (NAACL 2019) [paper]

RoBERTa: A Robustly Optimized BERT Pretraining Approach. (2019) [paper]

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. (JMLR 2020) [paper]

MASS: Masked Sequence to Sequence Pre-training for Language Generation. (ICML 2019) [paper]

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. (ACL 2020) [paper]

Pretrained Language Model Embryology: The Birth of ALBERT. (EMNLP 2020) [paper]

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. (ICLR 2020) [paper]

How much pretraining data do language models need to learn syntax?. (EMNLP 2021) [paper]

Probing Across Time: What Does RoBERTa Know and When?. (Findings of ACL 2021) [paper]
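
Most of the models above acquire their knowledge through self-supervised objectives over raw text. As a hedged illustration, here is a minimal masked-language-modeling training step, assuming a Hugging Face bert-base-uncased checkpoint; the two-sentence corpus and hyperparameters are toy placeholders, not a real pretraining setup.

```python
# Minimal sketch of one MLM training step (toy corpus, illustrative only).
import torch
from transformers import (AutoTokenizer, BertForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

corpus = [
    "Paris is the capital of France.",
    "Water boils at one hundred degrees Celsius.",
]
encodings = [tokenizer(text) for text in corpus]
batch = collator(encodings)  # randomly masks 15% of tokens and builds labels

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**batch).loss   # cross-entropy on the masked positions only
loss.backward()
optimizer.step()
print("MLM loss:", loss.item())
```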

1.2. Learning from Structured Data

ERNIE: Enhanced Representation through Knowledge Integration. (2019) [paper]

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning. (EMNLP 2020) [paper]

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model. (ICLR 2020) [paper]

LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. (EMNLP 2020) [paper]

Entities as Experts: Sparse Memory Access with Entity Supervision. (EMNLP 2020) [paper]

Zero-Shot Entity Linking by Reading Entity Descriptions. (ACL 2019) [paper]

Learning Dense Representations for Entity Retrieval. (CoNLL 2019) [paper]

Knowledge Enhanced Contextual Word Representations. (EMNLP 2019) [paper]


ERNIE: Enhanced Language Representation with Informative Entities. (ACL 2019) [paper]

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. (TACL 2021) [paper]

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters. (Findings of ACL 2021) [paper]

ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning. (ACL 2021) [paper]

Self-Supervised Knowledge Triplet Learning for Zero-Shot Question Answering. (EMNLP 2020) [paper]

K-BERT: Enabling Language Representation with Knowledge Graph. (AAAI 2020) [paper]

Matching the Blanks: Distributional Similarity for Relation Learning. (ACL 2019) [paper]

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. (ACL 2019) [paper]

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation. (TACL 2020) [paper]

Unsupervised Commonsense Question Answering with Self-Talk. (EMNLP 2020) [paper]

Align, mask and select: A simple method for incorporating commonsense knowledge into language representation models. (2019) [paper]

Knowledge-driven data construction for zero-shot evaluation in commonsense question answering. (AAAI 2021)

SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge. (EMNLP 2020) [paper]

SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis. (ACL 2020) [paper]

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity. (COLING 2020) [paper]

SenseBERT: Driving Some Sense into BERT. (ACL 2020) [paper]

Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion. (2022) [paper]

LIMIT-BERT: Linguistic Informed Multi-Task BERT. (2019) [paper]

Do Syntax Trees Help Pre-trained Transformers Extract Information?. (EACL 2021) [paper]

Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees. (EACL 2021) [paper]
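
The works above inject structured knowledge in many different ways (entity embeddings, adapters, knowledge-guided masking, retrieval of triples). As one loose illustration, the sketch below verbalizes knowledge-graph triples into sentences and masks the object entity as a whole span before a masked-LM update; the tiny triple store, the verbalization template, and the bert-base-uncased backbone are assumptions for illustration, not the method of any single paper above.

```python
# Sketch: verbalize KG triples and mask the object entity as a whole span.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

triples = [("Marie Curie", "was born in", "Warsaw"),
           ("The Eiffel Tower", "is located in", "Paris")]

def verbalize_with_entity_mask(subj, rel, obj):
    # Mask every wordpiece of the object entity, not random tokens.
    obj_ids = tokenizer(obj, add_special_tokens=False).input_ids
    enc = tokenizer(f"{subj} {rel} {obj}.", return_tensors="pt")
    labels = torch.full_like(enc.input_ids, -100)
    ids = enc.input_ids[0].tolist()
    for start in range(len(ids) - len(obj_ids) + 1):
        if ids[start:start + len(obj_ids)] == obj_ids:
            span = slice(start, start + len(obj_ids))
            labels[0, span] = enc.input_ids[0, span]        # supervise the entity span
            enc.input_ids[0, span] = tokenizer.mask_token_id  # then hide it
            break
    return enc, labels

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for subj, rel, obj in triples:
    enc, labels = verbalize_with_entity_mask(subj, rel, obj)
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"{subj} | {rel} | {obj}: loss {loss.item():.3f}")
```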

2. Knowledge Representation

2.1. Gradient-based

Knowledge Neurons in Pretrained Transformers. (ACL 2022) [paper]

Transformer Feed-Forward Layers Are Key-Value Memories. (EMNLP 2021) [paper]
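
The papers above attribute stored facts to specific feed-forward (FFN) neurons, treating FFN layers as key-value memories and using gradient-based attribution such as integrated gradients. The sketch below only shows the raw ingredient of such analyses, assuming bert-base-uncased: it captures FFN activations at the [MASK] position with a forward hook and lists the most active neurons; it is not the attribution method of the papers themselves.

```python
# Sketch: capture FFN ("key-value memory") activations for a factual cloze prompt.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

prompt = "The capital of France is [MASK]."
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

activations = {}
layer_idx = 9  # an arbitrary upper-middle layer, chosen only for illustration

def hook(module, inp, out):
    # out: (batch, seq_len, intermediate_size) activations of the FFN "keys"
    activations["ffn"] = out.detach()

handle = model.bert.encoder.layer[layer_idx].intermediate.register_forward_hook(hook)
with torch.no_grad():
    logits = model(**inputs).logits
handle.remove()

pred = tokenizer.convert_ids_to_tokens(int(logits[0, mask_pos].argmax()))
top = torch.topk(activations["ffn"][0, mask_pos], k=5)
print("prediction at [MASK]:", pred)
print("most active FFN neurons at [MASK]:", top.indices.tolist())
```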

2.2. Causal-inspired

Locating and Editing Factual Knowledge in GPT. (2022) [paper]

2.3. Attention-based

What Does BERT Look at? An Analysis of BERT's Attention. (ACL Workshop BlackboxNLP 2019) [paper]

Do Attention Heads in BERT Track Syntactic Dependencies?. (2019) [paper]

Open Sesame: Getting inside BERT's Linguistic Knowledge. (ACL Workshop BlackboxNLP 2019) [paper]
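
A common starting point for the attention-based analyses above is simply extracting per-head attention maps and inspecting which token each position attends to most. The sketch below does exactly that for one (arbitrarily chosen) layer and head of bert-base-uncased; the papers aggregate such patterns over whole corpora and compare them against syntactic annotations.

```python
# Sketch: inspect one attention head's strongest attention target per token.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True).eval()

sentence = "The keys to the cabinet are on the table."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])

with torch.no_grad():
    attentions = model(**inputs).attentions  # tuple of (1, heads, T, T) per layer

layer, head = 8, 10                          # arbitrary illustrative choice
attn = attentions[layer][0, head]            # (T, T) attention matrix
for i, tok in enumerate(tokens):
    j = attn[i].argmax().item()
    print(f"{tok:>10s} -> {tokens[j]}")
```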

2.4. Layer-wise

Open Sesame: Getting inside BERT's Linguistic Knowledge. (ACL Workshop BlackboxNLP 2019) [paper]

Linguistic Knowledge and Transferability of Contextual Representations. (NAACL 2019) [paper]

BERTnesia: Investigating the capture and forgetting of knowledge in BERT. (BlackboxNLP Workshop 2020) [paper]

Finding patterns in Knowledge Attribution for Transformers. (2022) [paper]

3. Knowledge Probing

3.1 Prompt-based Probing

Language Models as Knowledge Bases?. (EMNLP 2019) [paper]

Evaluating Commonsense in Pre-Trained Language Models. (AAAI 2020) [paper]

What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models. (TACL 2020) [paper]

oLMpics-On What Language Model Pre-training Captures. (TACL 2020) [paper]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. (2022) [paper]

How Can We Know What Language Models Know?. (TACL 2020) [paper]

Commonsense Knowledge Mining from Pretrained Models. (EMNLP 2019) [paper]

BERTese: Learning to Speak to BERT. (EACL 2021) [paper]

AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. (EMNLP 2020) [paper]

Factual Probing Is [MASK]: Learning vs. Learning to Recall. (NAACL 2021) [paper]

Prefix-Tuning: Optimizing Continuous Prompts for Generation. (ACL 2021) [paper]

GPT Understands, Too. (2021) [paper]

Measuring and Improving Consistency in Pretrained Language Models. (TACL 2021) [paper]

Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View. (ACL 2022) [paper]

Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly. (ACL 2020) [paper]

Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts. (2022) [paper]


E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT. (Findings of ACL 2020) [paper]


Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases. (ACL 2021) [paper]

How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis. (Findings of ACL 2022) [paper]

Measuring Causal Effects of Data Statistics on Language Models' Factual Predictions. (2022) [paper]

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models. (ACL 2021) [paper]

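
Prompt-based probing treats the PLM itself as the knowledge store: a fact is turned into a cloze template and the model's prediction for the masked slot is compared against the gold answer. The sketch below is a minimal LAMA-style probe over bert-base-uncased; the template and triple are illustrative and not taken from the LAMA benchmark files.

```python
# Sketch: LAMA-style cloze probing with a fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

triple = {"subject": "Dante",
          "template": "[X] was born in [MASK].",
          "object": "florence"}
prompt = triple["template"].replace("[X]", triple["subject"])

predictions = fill_mask(prompt, top_k=10)
top_tokens = [p["token_str"].strip().lower() for p in predictions]

print(prompt)
print("top-10 predictions:", top_tokens)
print("P@10 hit:", triple["object"] in top_tokens)
```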

3.2 Feature-based Probing

Probing Classifiers: Promises, Shortcomings, and Advances. (Computational Linguistics 2022) [paper]

What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. (EMNLP 2015) [paper]

Distributional vectors encode referential attributes. (EMNLP 2015) [paper]

Open Sesame: Getting inside BERT's Linguistic Knowledge. (ACL Workshop BlackboxNLP 2019) [paper]

What do you learn from context? Probing for sentence structure in contextualized word representations. (ICLR 2019) [paper]

What Does BERT Look at? An Analysis of BERT's Attention. (ACL Workshop BlackboxNLP 2019) [paper]

Linguistic Knowledge and Transferability of Contextual Representations. (NAACL 2019) [paper]

A Structural Probe for Finding Syntax in Word Representations. (NAACL 2019) [paper]

Do NLP Models Know Numbers? Probing Numeracy in Embeddings. (EMNLP 2019) [paper]

Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings. (ACL 2019) [paper]

Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT. (ACL 2020) [paper]

DirectProbe: Studying Representations without Classifiers. (NAACL 2021) [paper]


A Primer in BERTology: What We Know About How BERT Works. (TACL 2020) [paper]

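
Feature-based probing instead freezes the encoder, extracts its representations, and trains a lightweight diagnostic classifier to test how linearly decodable a property is. The sketch below probes a toy tense distinction from mean-pooled bert-base-uncased states with a logistic-regression probe; real probing studies use standard annotated corpora and carefully controlled baselines.

```python
# Sketch: a linear diagnostic classifier over frozen contextual embeddings.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

# Toy task: past (1) vs. present (0) tense of the main verb.
sentences = [
    ("She walked to the station.", 1), ("She walks to the station.", 0),
    ("They played chess all night.", 1), ("They play chess all night.", 0),
    ("He cooked dinner yesterday.", 1), ("He cooks dinner every day.", 0),
    ("We watched a film.", 1), ("We watch a film.", 0),
]

def embed(text):
    # Mean-pool the frozen last-layer hidden states as a sentence feature.
    with torch.no_grad():
        out = encoder(**tokenizer(text, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

X = [embed(s) for s, _ in sentences]
y = [label for _, label in sentences]

# Train the probe on the first six examples, evaluate on the held-out two.
clf = LogisticRegression(max_iter=1000).fit(X[:6], y[:6])
print("held-out probe accuracy:", clf.score(X[6:], y[6:]))
```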

4. Knowledge Editing

4.1 Constrained Fine-tuning

Modifying memories in transformer models. (2020) [paper]
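
Constrained fine-tuning updates the model on the new fact while explicitly limiting how far the parameters may drift from the original weights, so that unrelated knowledge is preserved. The sketch below is a rough approximation in that spirit: it uses an L2 penalty on the parameter change rather than the hard norm constraint of the paper above, and the edited fact, answer, and hyperparameters are hypothetical.

```python
# Sketch: fine-tune on a single (hypothetical) fact with a parameter-drift penalty.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
original = {n: p.detach().clone() for n, p in model.named_parameters()}

edit_prompt = "The capital of Atlantis is [MASK]."  # hypothetical edit target
edit_answer = "paris"                               # hypothetical new answer

inputs = tokenizer(edit_prompt, return_tensors="pt")
labels = inputs.input_ids.clone()
labels[labels != tokenizer.mask_token_id] = -100    # only supervise [MASK]
labels[labels == tokenizer.mask_token_id] = tokenizer.convert_tokens_to_ids(edit_answer)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
drift_weight = 1e2                                  # illustrative penalty weight

for step in range(10):
    out = model(**inputs, labels=labels)
    drift = sum(((p - original[n]) ** 2).sum() for n, p in model.named_parameters())
    loss = out.loss + drift_weight * drift          # edit loss + drift penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: edit loss {out.loss.item():.3f}, drift {drift.item():.5f}")
```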

4.2 Memory-based

Memory-Based Model Editing at Scale. (ICML 2022)

Calibrating Factual Knowledge in Pretrained Language Models. (2022) [paper]

Memory-assisted prompt editing to improve GPT-3 after deployment. (2022) [paper]
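
Memory-based editors leave the base model frozen and keep the edits in an external memory; at inference time, queries that match a stored edit are answered from memory, and everything else falls through to the base model. The sketch below uses a crude string-similarity matcher and a fill-mask pipeline as stand-ins for the learned scope classifiers and counterfactual models used in the papers above.

```python
# Sketch: route queries to an external edit memory, else to the frozen base model.
from difflib import SequenceMatcher
from transformers import pipeline

base_model = pipeline("fill-mask", model="bert-base-uncased")
edit_memory = {}  # prompt -> corrected answer

def add_edit(prompt, corrected_answer):
    edit_memory[prompt] = corrected_answer

def answer(prompt, threshold=0.9):
    # Route to memory if the query is close enough to a stored edit.
    for stored, corrected in edit_memory.items():
        if SequenceMatcher(None, prompt.lower(), stored.lower()).ratio() >= threshold:
            return corrected, "memory"
    return base_model(prompt, top_k=1)[0]["token_str"], "base model"

add_edit("The current president of the United States is [MASK].", "Joe Biden")

for query in [
    "The current president of the United States is [MASK].",
    "The capital of Italy is [MASK].",
]:
    prediction, source = answer(query)
    print(f"{query} -> {prediction} ({source})")
```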

4.3 Meta-Learning

Editable Neural Networks. (ICLR 2020) [paper]

HyperNetworks. (ICLR 2017) [paper]

Editing Factual Knowledge in Language Models. (EMNLP 2021) [paper]

Do language models have beliefs? methods for detecting, updating, and visualizing model beliefs. (2021) [paper]

Fast model editing at scale. (2021) [paper]

4.4 Locate and Edit

Knowledge Neurons in Pretrained Transformers. (ACL 2022) [paper]

Locating and Editing Factual Knowledge in GPT. (2022) [paper]

5. Knowledge Application

5.1 Language Models as Knowledge Bases

Language models as or for knowledge bases. (2021) [paper]

Language Models as Knowledge Bases?. (EMNLP 2019) [paper]

Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries. (EACL 2021) [paper]

How Can We Know What Language Models Know?. (TACL 2020) [paper]

Language models are open knowledge graphs. (2020) [paper]

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases. (ACL 2021) [paper]


A review on language models as knowledge bases. (2022) [paper]

5.2 Language Models for Downstream Tasks

5.2.1. Fine-tuning

Emergent linguistic structure in artificial neural networks trained by self-supervision. (PNAS 2020)

Knowledge enhanced pretrained language models: A comprehensive survey. (2021) [paper]

A survey of knowledge enhanced pre-trained models. (2021) [paper]

A survey of knowledge-intensive NLP with pre-trained language models. (2022) [paper]

ERNIE: Enhanced Representation through Knowledge Integration. (2019) [paper]

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning. (EMNLP 2020) [paper]

ERNIE: Enhanced Language Representation with Informative Entities. (ACL 2019) [paper]

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. (TACL 2021) [paper]

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters. (Findings of ACL 2021) [paper]

K-BERT: Enabling Language Representation with Knowledge Graph. (AAAI 2020) [paper]

SenseBERT: Driving Some Sense into BERT. (ACL 2020) [paper]

Do Syntax Trees Help Pre-trained Transformers Extract Information?. (EACL 2021) [paper]

Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees. (EACL 2021) [paper]
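
Fine-tuning applies the knowledge acquired during pre-training to a downstream task by adding a task head and updating the full model on labeled examples. A minimal sequence-classification step is sketched below, assuming bert-base-uncased; the two-example batch and single update stand in for a real training loop.

```python
# Sketch: one downstream fine-tuning step for binary sentiment classification.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["A wonderful, heartfelt film.", "A tedious, joyless slog."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss  # classification head + encoder updated
loss.backward()
optimizer.step()
print("fine-tuning loss:", loss.item())
```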

5.2.2. Prompt-tuning

Language models are unsupervised multitask learners. (OpenAI blog 2019)

Language Models are Few-Shot Learners. (NeurIPS 2020) [paper]

Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. (2021) [paper]

Finetuned language models are zero-shot learners. (2021) [paper]

Multitask prompted training enables zero-shot task generalization. (2021) [paper]

Training language models to follow instructions with human feedback. (2022) [paper]

Scaling instruction-finetuned language models. (2022) [paper]

How Can We Know What Language Models Know?. (TACL 2020) [paper]

BERTese: Learning to Speak to BERT. (EACL 2021) [paper]

AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. (EMNLP 2020) [paper]

Making Pre-trained Language Models Better Few-shot Learners. (ACL 2021) [paper]

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification. (ACL 2022) [paper]

Prefix-Tuning: Optimizing Continuous Prompts for Generation. (ACL 2021) [paper]

GPT Understands, Too. (2021) [paper]

WARP: Word-level Adversarial ReProgramming. (ACL 2021) [paper]

The Power of Scale for Parameter-Efficient Prompt Tuning. (EMNLP 2021) [paper]


Learning How to Ask: Querying LMs with Mixtures of Soft Prompts. (NAACL 2021) [paper]

PTR: Prompt Tuning with Rules for Text Classification. (2021) [paper]

ThinkSum: Probabilistic reasoning over sets using large language models. (2022) [paper]

Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. (EACL 2021) [paper]

PADA: A Prompt-Based Autoregressive Approach for Adaptation to Unseen Domains. (2021) [paper]

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP. (TACL 2021) [paper]
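
Prompt-tuning keeps the backbone frozen and adapts only a small set of continuous prompt vectors prepended to the input, in the spirit of Prefix-Tuning and The Power of Scale for Parameter-Efficient Prompt Tuning. The sketch below trains eight soft-prompt vectors for GPT-2 on a single illustrative example; the prompt length, learning rate, and task format are assumptions, not settings from the papers.

```python
# Sketch: soft prompt tuning with a frozen GPT-2 backbone.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # the backbone stays frozen

n_prompt = 8
embed_dim = model.config.n_embd
soft_prompt = torch.nn.Parameter(torch.randn(n_prompt, embed_dim) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

text = "Review: the film was wonderful. Sentiment: positive"
ids = tokenizer(text, return_tensors="pt").input_ids

for step in range(20):
    tok_embeds = model.transformer.wte(ids)                        # (1, T, D)
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), tok_embeds], dim=1)
    # Do not compute loss on the soft-prompt positions.
    labels = torch.cat(
        [torch.full((1, n_prompt), -100, dtype=torch.long), ids], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()   # gradients flow only into the soft prompt
    optimizer.step()

print("trained prompt shape:", tuple(soft_prompt.shape))
```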

5.2.3. In-context Learning

A Survey for In-context Learning. (2023) [paper]

Language Models are Few-Shot Learners. (NeurIPS 2020) [paper]

Calibrate Before Use: Improving Few-shot Performance of Language Models. (ICML 2021) [paper]

Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER. (ACL 2022) [paper]

Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model. (2022) [paper]

Robustness of Demonstration-based Learning Under Limited Data Scenario. (2022) [paper]

Explanations from Large Language Models Make Small Reasoners Better. (2022) [paper]

Promptagator: Few-shot Dense Retrieval From 8 Examples. (2022) [paper]

Generate rather than retrieve: Large language models are strong context generators. (2022) [paper]

Chain of thought prompting elicits reasoning in large language models. (2022) [paper]

Can language models learn from explanations in context?. (2022) [paper]

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. (2022) [paper]

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. (ACL 2022) [paper]

What Makes Good In-Context Examples for GPT-3?. (DeeLIO Workshop 2022) [paper]

Making Pre-trained Language Models Better Few-shot Learners. (ACL 2021) [paper]

Learning To Retrieve Prompts for In-Context Learning. (NAACL 2022) [paper]

Selective Annotation Makes Language Models Better Few-Shot Learners. (2022) [paper]

Prompt programming for large language models: Beyond the few-shot paradigm. (CHI Extended Abstracts 2021)

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases. (ACL 2021) [paper]

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?. (2022) [paper]

Data Distributional Properties Drive Emergent In-Context Learning in Transformers. (2022) [paper]

Transformers learn in-context by gradient descent. (2022) [paper]
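
In-context learning adapts the model with no parameter updates at all: a few labeled demonstrations are concatenated in front of the test input and the frozen LM completes the pattern. The sketch below builds such a prompt and queries GPT-2 as a small, locally runnable stand-in; the demonstrations, template, and label words are illustrative, and the papers above study exactly how such choices affect accuracy.

```python
# Sketch: build a few-shot prompt and let a frozen causal LM complete it.
from transformers import pipeline

demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I wanted my two hours back.", "negative"),
    ("A warm, funny, beautifully acted film.", "positive"),
]
test_input = "The plot was dull and the acting was worse."

prompt = "".join(
    f"Review: {text}\nSentiment: {label}\n\n" for text, label in demonstrations
)
prompt += f"Review: {test_input}\nSentiment:"

# Any causal LM works here; gpt2 is just a small stand-in for a large model.
generator = pipeline("text-generation", model="gpt2")
output = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
print(output[len(prompt):].strip())
```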

Reference

If this repository helps you, please cite the following BibTeX entry:

@article{cao2023life,
  title={The Life Cycle of Knowledge in Big Language Models: A Survey},
  author={Cao, Boxi and Lin, Hongyu and Han, Xianpei and Sun, Le},
  journal={arXiv preprint arXiv:2303.07616},
  year={2023}
}
