This repository contains the implementation of the ACL 2023 Findings paper "Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers".
We recommend running the code in a Docker container under Linux:
docker run -it --rm --runtime=nvidia --ipc=host --privileged damaidai/icl:v1 bash
Then install the following packages with pip:
pip install --user datasets==2.4.0
pip install --user tensorboard scikit-learn
pip install --user jsonlines
pip install --user -e fairseq/
We use Fairseq-LM (dense models) in our experiments. You can download models of different sizes and the dictionary through the above link. For the tokenizer, you can download ENCODER and BPE_VOCAB.
bash run.sh ${model_name} ${model_arch} ${task} ${k} ${seed} ${perm_id} ${output_path} ${base_dir} ${lr}
- `model_name`: The model to analyze, e.g., `en_dense_lm_1_3b`.
- `model_arch`: The architecture corresponding to the model: `gptmodel_large` (1.3B) or `gptmodel_xl` (2.7B).
- `task`: The evaluation dataset. One of ["cb", "sst2", "sst5", "subj", "mr", "agnews"].
- `k`: The number of demonstration examples (k-shot).
- `seed`: The random seed. Please refer to our paper for the seed used in each setting.
- `perm_id`: The permutation method for demonstration examples. It is always set to 0 in our experiments.
- `output_path`: The output path.
- `base_dir`: The base directory for experiments.
- `lr`: The learning rate used in the finetuning setting. Please refer to our paper for the value.
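For example, a sketch of a 1.3B-model run on SST-2 might look like the following; the k, seed, and learning-rate values here are placeholders, so substitute the settings from the paper:

```bash
bash run.sh en_dense_lm_1_3b gptmodel_large sst2 16 1 0 ./output /path/to/base_dir 1e-6
```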
In `run.sh`, the following paths are derived from `base_dir`:

- `bpe_path=$base_dir/gpt_icl/vocab.bpe`: The BPE vocabulary file.
- `encoder_path=$base_dir/gpt_icl/encoder.json`: The BPE encoder file.
- `dict_path=$base_dir/gpt_icl/$model_name/dict.txt`: The dictionary file.
- `ana_rlt_dir=$base_dir/ana_rlt/$model_name/$task`: The directory where analysis results are saved.
- `model_path=$base_dir/gpt_icl/$model_name/model.pt`: The model checkpoint.
- `save_dir=$base_dir/ft_gpt/$task/$model_name/$lr`: The path where the finetuned model is saved in the finetuning setting.
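These variables imply a fixed layout under `$base_dir`. A minimal sketch of preparing it (assuming the 1.3B model and that the downloaded files sit in the current directory):

```bash
# Create the expected directory structure and move the downloaded files into place.
mkdir -p $base_dir/gpt_icl/en_dense_lm_1_3b
mv vocab.bpe encoder.json $base_dir/gpt_icl/                  # BPE vocabulary and encoder
mv dict.txt model.pt $base_dir/gpt_icl/en_dense_lm_1_3b/      # dictionary and model checkpoint
```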
You should put the downloaded BPE vocabulary file, encoder file, dictionary file, and model checkpoint at the corresponding paths, and then run `run.sh`. When `run.sh` finishes, `$ana_rlt_dir/$ana_setting/record_info.jsonl` will contain the recorded information for analysis, where `$ana_setting` is one of ["ftzs", "zs", "icl"], corresponding to the finetuning, zero-shot, and ICL settings, respectively.
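To sanity-check the output, you can peek at the first record of a produced file, e.g. for the ICL setting (the exact fields recorded depend on the setting):

```bash
# Pretty-print the first recorded entry of the ICL analysis output.
head -n 1 $ana_rlt_dir/icl/record_info.jsonl | python -m json.tool
```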
Once the recorded information for all models and tasks has been produced, run `analyze.sh` to compute the analysis results reported in the paper.
bash analyze.sh
The results will be saved at `$base_dir/ana_rlt/rlt_json/$task-$model.json` or `$base_dir/ana_rlt/rlt_json/$task-$model_training_attn.json`.
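The result files are plain JSON, so they can be inspected directly; the task and model names below are only an example (assuming `$model` matches the model name passed to `run.sh`):

```bash
# Pretty-print the aggregated analysis results for one task/model pair.
python -m json.tool $base_dir/ana_rlt/rlt_json/sst2-en_dense_lm_1_3b.json
```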
Coming soon.