Code for the NeurIPS 2022 paper "Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning".
The architecture of our model can be seen as follows:
RETROPROMPT is a simple and general retrieval-augmented framework for prompt learning. It builds on a dense retriever with an open-book knowledge-store to decouple knowledge from memorization. RETROPROMPT consists of three components: retrieval of neural demonstrations to enhance the input, kNN-guided training, and kNN-based probability for cloze-style prediction.
📋 Note: There are two main folders in our project. The folder GLUE_task includes three single-sentence tasks (SST-2, MR, CR), three sentence-pair classification tasks (MNLI, QNLI, QQP) and one information extraction task (Few-NERD), and the folder RE_task includes two information extraction tasks (SemEval, TACRED).
Step1 Download the basic code
git clone --depth 1 https://github.com/zjunlp/PromptKG.git
Step2 Create a virtual environment using Anaconda and enter it.
conda create -n retroprompt python=3.8
conda activate retroprompt
Step3 Enter the task directory
cd PromptKG/research/RetroPrompt
pip install -r requirements.txt
The environment requirements for each task are placed in GLUE_task and RE_task, respectively.
There are some differences between the environment requirements of the GLUE task and the RE task:
- The version of transformers in the GLUE task is 4.11.3, while the version of transformers in the RE task is 4.7.
- The GLUE task is based on the transformers framework from Hugging Face, while the RE task is based on the pytorch_lightning framework.
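Because the two folders pin different transformers versions, it can help to confirm which one is active in the current environment before running a task. The sketch below is not part of the repository: the helper names installed_version and matches, and the rule that "4.7" also accepts "4.7.x", are assumptions made for illustration.

```python
from importlib import metadata

# Versions stated above for each folder.
EXPECTED_TRANSFORMERS = {"GLUE_task": "4.11.3", "RE_task": "4.7"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(installed, expected):
    """True when `installed` equals `expected` or extends it (4.7.0 matches 4.7)."""
    if installed is None:
        return False
    return installed == expected or installed.startswith(expected + ".")


current = installed_version("transformers")
for task, expected in EXPECTED_TRANSFORMERS.items():
    status = "ok" if matches(current, expected) else "mismatch"
    print(f"{task}: expected {expected}, installed {current} -> {status}")
```

Keeping one conda environment per folder is one way to avoid the mismatch; the check above only reports it.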
Download the original GLUE datasets (SST-2, MR, CR, MNLI, QNLI, QQP, RTE, MPQA) from here. For few-shot learning, we take k=16 or k=4 and 5 different seeds (13, 21, 42, 87, 100). You can run the following command to generate the few-shot data for the GLUE tasks. The generated data will be placed in data/training_data/k_shot:
# take SST-2 and 16-shot as example
cd GLUE_task
python tools/generate_k_shot_data.py --k 16 --task SST-2
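To make the k-shot setting concrete, here is a minimal sketch of seeded per-class sampling. It is not the repository's tools/generate_k_shot_data.py; the function name sample_k_shot, the (text, label) tuple format, and the toy data are assumptions for illustration only.

```python
import random
from collections import defaultdict


def sample_k_shot(examples, k, seed):
    """Pick k examples per label, reproducibly for a given seed."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"label {label!r} has only {len(pool)} examples")
        subset.extend(rng.sample(pool, k))
    return subset


# Toy binary dataset standing in for a real training file.
toy = [(f"sentence {i}", i % 2) for i in range(100)]

# One split per seed, using the seeds listed above.
splits = {seed: sample_k_shot(toy, k=16, seed=seed) for seed in (13, 21, 42, 87, 100)}
for seed, split in splits.items():
    print(seed, len(split))
```

Using a private random.Random(seed) instance rather than the global generator keeps each split reproducible regardless of what else the program has sampled.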
The original Few-NERD dataset can be downloaded from here.
Use the command below to get the answer words used in training.
cd RE_task
python get_label_word.py --model_name_or_path roberta-large-uncased --dataset_name semeval
After that, {answer_words}.pt will be saved in the dataset folder. You need to set model_name_or_path and dataset_name in get_label_word.py.
In the few-shot scenario, we take k=16 or k=4 and 5 different seeds, including 1, 2, 3. The few-shot data will be generated in dataset/task_name/k-shot. You also need to copy the validation data, test data and relation data to the few-shot data path.
cd RE_task
python generate_k_shot.py --data_dir ./dataset --k 16 --dataset semeval
cd dataset
cd semeval
cp rel2id.json val.txt test.txt ./k-shot/16-1
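The cp command above covers a single k-shot folder (16-1). If several seeds are used, the same copy can be repeated for each folder. The following is a hedged sketch, assuming the folders follow the {k}-{seed} naming seen in 16-1; the function copy_shared_files is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path

# Files named in the cp command above.
SHARED_FILES = ("rel2id.json", "val.txt", "test.txt")


def copy_shared_files(dataset_dir, k, seeds):
    """Copy the shared files into each existing k-shot/{k}-{seed} folder.

    Folders that have not been generated yet are skipped, not created.
    Returns the list of destination paths that were written.
    """
    dataset_dir = Path(dataset_dir)
    copied = []
    for seed in seeds:
        target = dataset_dir / "k-shot" / f"{k}-{seed}"
        if not target.is_dir():
            continue
        for name in SHARED_FILES:
            source = dataset_dir / name
            if source.is_file():
                shutil.copy(source, target / name)
                copied.append(target / name)
    return copied


# Example call, run from RE_task (paths are assumptions):
# copy_shared_files("dataset/semeval", k=16, seeds=[1, 2, 3])
```

Skipping missing folders is a deliberate choice here, so that a forgotten generation step shows up as a shorter return list instead of an empty folder with only validation and test data.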
The running scripts are placed in GLUE_task/scripts. Run the following commands to run the GLUE tasks:
cd GLUE_task
bash scripts/run/run_glue.sh
Then run this command to interpolate the kNN-based prediction:
bash scripts/knn/run_glue_knn_infer.sh
The running scripts are placed in RE_task/scripts.
Run the following commands to run the RE tasks with kNN-guided training:
cd RE_task
bash scripts/semeval.sh
We further explain some important arguments in the two tasks:
- use_demo: Whether to use the neural demonstration.
- demo_topk: Number of retrieved nearest neighbors aggregated to generate the neural demonstration. Default is 8 in 16-shot and 2 in 4-shot.
- train_with_knn: Whether to apply kNN retrieval to guide training.
- only_train_knn: Whether to leverage the kNN probability for cloze-style prediction.
- knn_topk: Number of retrieved nearest neighbors in kNN-guided training and kNN-incorporated prediction.
- knn_lambda: The weight of the kNN probability used to produce the final probability.
- beta: The scalar that determines the proportion of each loss term in kNN-guided training.
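To illustrate how knn_topk and knn_lambda might interact at prediction time, here is a minimal pure-Python sketch. It assumes a kNN distribution built from a softmax over negative squared distances to the top-k neighbors, and a linear interpolation with the model's distribution. The function names, the temperature parameter, and the distance choice are illustrative assumptions, not the repository's implementation.

```python
import math


def knn_probability(query, keys, labels, num_classes, knn_topk, temperature=1.0):
    """Class distribution from the knn_topk nearest stored representations."""
    # Squared Euclidean distance from the query to every stored key.
    dists = [sum((q - x) ** 2 for q, x in zip(query, key)) for key in keys]
    nearest = sorted(range(len(keys)), key=lambda i: dists[i])[:knn_topk]

    # Softmax over negative distances (shifted by the max for stability).
    scores = [-dists[i] / temperature for i in nearest]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)

    # Sum the neighbor weights per class label.
    probs = [0.0] * num_classes
    for i, w in zip(nearest, weights):
        probs[labels[i]] += w / total
    return probs


def interpolate(p_model, p_knn, knn_lambda):
    """Weight the kNN distribution by knn_lambda, the model by 1 - knn_lambda."""
    return [knn_lambda * pk + (1.0 - knn_lambda) * pm
            for pm, pk in zip(p_model, p_knn)]


# Toy knowledge-store: four 2-d keys with binary labels.
keys = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [0, 0, 1, 1]

p_knn = knn_probability([0.2, 0.1], keys, labels, num_classes=2, knn_topk=2)
p_final = interpolate([0.4, 0.6], p_knn, knn_lambda=0.5)
print(p_knn, p_final)
```

In this toy case both nearest neighbors carry label 0, so the kNN distribution puts essentially all mass on class 0 and pulls the final prediction toward it even though the model alone favored class 1.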
If you use or extend our work, please cite the paper as follows:
@article{DBLP:journals/corr/abs-2205-14704,
author = {Xiang Chen and
Lei Li and
Ningyu Zhang and
Xiaozhuan Liang and
Shumin Deng and
Chuanqi Tan and
Fei Huang and
Luo Si and
Huajun Chen},
title = {Decoupling Knowledge from Memorization: Retrieval-augmented Prompt
Learning},
journal = {CoRR},
volume = {abs/2205.14704},
year = {2022},
url = {https://doi.org/10.48550/arXiv.2205.14704},
doi = {10.48550/arXiv.2205.14704},
eprinttype = {arXiv},
eprint = {2205.14704},
timestamp = {Tue, 14 Jun 2022 15:20:45 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2205-14704.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}