Retrieval Augmented Domain Adaptation

Code for the EMNLP 2023 paper Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning (arXiv).

We retrieve similar examples from the unlabeled target corpus to serve as contexts for a source query, and perform adaptive in-context learning by concatenating the source query with the retrieved target contexts as the input prompt. Building on this, we propose a domain-adaptive in-context learning (DAICL) framework for different LM architectures, including encoder-only and decoder-only models.
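
The sketch below illustrates this prompt construction; build_daicl_prompt and the example strings are illustrative stand-ins, not functions from this repository.

# Illustrative sketch: retrieved unlabeled target-domain sentences are
# concatenated with a labeled source-domain query to form the input prompt.
def build_daicl_prompt(source_query, target_contexts):
    """Join retrieved target contexts and append the source query."""
    context_block = "\n".join(target_contexts)
    return f"{context_block}\n{source_query}"

# Toy example (source: news-style NER query; target: Twitter text):
contexts = ["just landed in SF !!", "@user new phone who dis"]
prompt = build_daicl_prompt("EU rejects German call to boycott British lamb .", contexts)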

Dataset

NER (named entity recognition):

  • CoNLL-2003 (source) and WNUT-16 (target); see the NER_Datasets paths used in the commands below

SA (sentiment analysis):

  • Amazon Benchmark 2-class data
  • Amazon Review 3-class data
  • You can also download our processed SA data here

Retriever

We retrieve similar target-domain examples with two retrievers:

  • SimCSE
  • BERTScore

Run the cross-domain retrieval script for NER:

python NER_Datasets/retrieval_cross_domain.py
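
For reference, here is a rough sketch of SimCSE-style retrieval: embed source queries and unlabeled target sentences, then take nearest neighbors by cosine similarity. The model name follows the public SimCSE release; this is an illustration, not the repository script.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")
enc = AutoModel.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")

def embed(sentences):
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch)
    cls = out.last_hidden_state[:, 0]  # [CLS] pooling, as in SimCSE
    return torch.nn.functional.normalize(cls, dim=-1)

source = embed(["EU rejects German call to boycott British lamb ."])
target = embed(["just landed in SF !!", "@user new phone who dis"])
scores, indices = (source @ target.T).topk(k=1, dim=-1)  # cosine similarity on unit vectors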

Run

Train LLaMA:

CUDA_VISIBLE_DEVICES=0 python finetune_new.py \
    --base_model 'yahma/llama-7b-hf' \
    --data_path 'NER_Datasets/llama_train_data/gold_demo/wnut16_gold_demo.json' \
    --output_dir 'model/ner_conll03-wnut16_gold_demo_lr3e-4_r16_alpha16_toi_aet_0' \
    --batch_size 256 \
    --micro_batch_size 4 \
    --num_epochs 5 \
    --learning_rate 3e-4 \
    --cutoff_len 512 \
    --val_set_size 1000 \
    --warmup_steps 20 \
    --logging_steps 4 \
    --eval_steps 50 \
    --save_steps 50 \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,k_proj,v_proj,o_proj]' \
    --train_on_inputs \
    --add_eos_token
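
The LoRA flags above correspond, roughly, to a peft configuration like the following (a sketch of what finetune_new.py, which follows alpaca-lora, sets up internally):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                # --lora_r
    lora_alpha=16,       # --lora_alpha
    lora_dropout=0.05,   # --lora_dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

Note that --batch_size 256 with --micro_batch_size 4 implies 256 // 4 = 64 gradient-accumulation steps per optimizer update.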

LLaMA Inference:

CUDA_VISIBLE_DEVICES=1 python eval_generate.py \
    --load_8bit \
    --base_model "yahma/llama-7b-hf" \
    --lora_weights 'model/ner_conll03-wnut16_gold_demo_lr3e-4_r16_alpha16_toi_aet_0' \
    --eval_path "NER_Datasets/llama_inf_data/gold_demo/wnut16_gold_demo.json" \
    --eval_result_path "NER_Datasets/llama_inf_data/eval_result/ner_conll03-wnut16_gold_demo_lr3e-4_r16_alpha16_toi_aet_0/wnut16_gold_demo.txt" \
    --eval_batch_size 3
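
Loading for inference follows the usual alpaca-lora pattern: load the 8-bit base model, then apply the LoRA adapter on top. A minimal sketch (paths mirror the command above):

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = "yahma/llama-7b-hf"
lora = "model/ner_conll03-wnut16_gold_demo_lr3e-4_r16_alpha16_toi_aet_0"

tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(
    base, load_in_8bit=True, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, lora, torch_dtype=torch.float16)
model.eval()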

Run RoBERTa NER

CUDA_VISIBLE_DEVICES=0 python roberta_ner/train.py \
    --config config/conll03-wnut16_cl_kl.yaml
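
Hyperparameters for the RoBERTa NER run are read from the YAML file; roberta_ner/train.py defines which keys it expects. A trivial sketch of the config-driven entry point (illustrative only):

import yaml

with open("config/conll03-wnut16_cl_kl.yaml") as f:
    cfg = yaml.safe_load(f)  # e.g. model name, source/target data paths, loss weights
print(cfg)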

Citing

Please cite the following paper if you find the resources in this repository useful.

@inproceedings{long-etal-2023-adapt,
    title = "Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning",
    author = "Long, Quanyu  and
      Wang, Wenya  and
      Pan, Sinno",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.402",
    pages = "6525--6542",
}

Acknowledgement

This project is implemented based on the alpaca-lora and CLNER source code.
