wiesenthal/raft-submission

What is RAFT

RAFT is a few-shot classification benchmark that tests language models:

  • across multiple domains (lit review, tweets, customer interaction, etc.)
  • on economically valuable classification tasks (someone inherently cares about the task)
  • in a setting that mirrors deployment (50 examples per task, info retrieval allowed, hidden test set)

My Approach

I implemented a modified version of SetFit that uses a RandomForestClassifier as its model head (which slightly improved cross-validation performance), along with T-Few models that I trained. I compared the predictions from the modified SetFit model and the T-Few models:

  • if they agreed: I used the prediction as is
  • if they disagreed: I used my GPT classifier implementation with GPT-4 as the tiebreaker, doing in-context learning with a message history built from the training examples
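
The in-context tiebreaker can be sketched as follows. This is a minimal illustration of the idea, not the repository's actual implementation; the helper name, field layout, and the two training sentences below are hypothetical placeholders.

```python
# Sketch (not the repo's actual code): turn labeled training examples into a
# chat message history, ending with the unlabeled test example. The model's
# reply to the final user turn serves as the prediction.
def build_messages(train_examples, test_text, instructions):
    messages = [{"role": "system", "content": instructions}]
    for text, label in train_examples:
        # Each training example becomes a user turn plus an assistant "answer".
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    # The test example goes last, left unanswered.
    messages.append({"role": "user", "content": test_text})
    return messages

# Hypothetical placeholder examples in the style of ADE Corpus V2:
train = [("Patient developed a rash after taking drug X.", "ADE-related"),
         ("The study enrolled 40 participants.", "not ADE-related")]
msgs = build_messages(train, "Nausea was reported following treatment.",
                      "Classify each sentence as ADE-related or not ADE-related.")
```

The resulting list can be passed directly as the `messages` argument of a chat-completion API call.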

This approach essentially takes a vote among the three classifiers, which I found to significantly improve performance in cross-validation. It also saves money on OpenAI credits, since GPT-4 is only called when the cheaper classifiers disagree.
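
The agree/disagree logic above amounts to a two-of-three vote in which the expensive model is queried only on ties. A minimal sketch, assuming each classifier is a callable that returns a label (the function names here are placeholders, not the repository's actual API):

```python
def ensemble_predict(setfit, tfew, gpt4_tiebreak, example):
    """Vote between two cheap classifiers; call the expensive one only on ties."""
    a = setfit(example)
    b = tfew(example)
    if a == b:
        return a                    # agreement: no API call needed
    return gpt4_tiebreak(example)   # disagreement: GPT-4 breaks the tie

# Toy stand-ins for the three classifiers:
pred = ensemble_predict(lambda x: "ADE-related",
                        lambda x: "ADE-related",
                        lambda x: "not ADE-related",
                        "some sentence")
```

Because the tiebreaker is passed in as a callable, it is never invoked on the agreement path, which is where the cost savings come from.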


~~~ Raft Baselines README ~~~

Setup

This is the repository for the GPT-3 baselines described in the RAFT benchmark paper.

Set up a virtual environment and install the necessary requirements from the requirements file.

conda create -n raft-baselines python=3.8 && conda activate raft-baselines
pip install -e '.[dev]'

Install raft-baselines.

python setup.py develop

You may need to prepend sudo to the above command for permissions.

Starter Kit

A starter kit notebook walks through the basics of making predictions using models from the HuggingFace Model Hub. There's also a Colab version.

RAFT Predict

Use the raft_predict script to run classifiers on the RAFT datasets. By default, the script will run on the first 5 test examples for each dataset. To use a random classifier on the first 10 examples from the ADE Corpus V2 dataset:

python -m raft_baselines.scripts.raft_predict with n_test=10 'configs=["ade_corpus_v2"]' classifier_name=RandomClassifier

The other classifiers available are:

  • GPT3Classifier: the one used for the GPT-3 baseline in the paper
  • TransformersCausalLMClassifier: takes as input a model_type string, and runs an arbitrary CausalLM from the HuggingFace Model Hub

For example, to generate predictions from DistilGPT-2 on the first 10 examples of the ADE Corpus you can run:

python -m raft_baselines.scripts.raft_predict with n_test=10 'configs=["ade_corpus_v2"]' classifier_name=TransformersCausalLMClassifier 'classifier_kwargs={"model_type":"distilgpt2"}'

To run experiments with GPT-3, you will need an OpenAI API key. Create a file called .env and put your API key in it, copying the format of .env-example:

echo OPENAI_API_KEY=$OPENAI_API_KEY > .env

Sacred

We use Sacred to track our experiments and outputs. This adds no overhead at runtime; simply run either of our two experiment scripts with Python as usual. You can change where tracking files are saved by modifying the observer at the top of each experiment file, or change the details of an experiment via the configuration parameters specified in the configs block.

# For labeling the test set
python -m raft_baselines.scripts.raft_predict
# For tuning various dimensions on the train set with LOO validation
python -m raft_baselines.scripts.raft_train_experiment

Alternatively, you can modify the input variables to an experiment from the command line, as in the example above. Regardless, some modification will be necessary if you want to run different experiments. See this tutorial for more information.

Similarly, you can save metrics with raft_experiment.log_scalar(), or by using the sacred observer directly. See this tutorial for more information.

To save out predictions and upload to the HuggingFace Hub (and the leaderboard), see the RAFT submission template.

License

This repository is licensed under the MIT License.

About

Ranked #1 on the RAFT leaderboard as of 12/12/2023
