Introduction

WIP: This codebase is under active development. This code is a simplified and refactored version of the original code for the paper Discovering Latent Knowledge, which is distributed as a zip file linked from https://github.com/collin-burns/discovering_latent_knowledge

Dependencies

See requirements.txt file

Our code uses PyTorch and Hugging Face Transformers. You will also need to install promptsource, a toolkit for NLP prompts. We tested our code on Python 3.9.

Quick Start

First install the package with pip install -e . in the root directory, or pip install -e .[dev] if you'd like to contribute to the project (see Development section below). This should install all the necessary dependencies.
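After installing, a quick sanity check can confirm that the main dependencies are importable. The following is a minimal sketch, not part of the package: the helper name `missing_modules` is hypothetical, and the import names `torch`, `transformers`, and `promptsource` are assumed from the dependency list above.

```python
import importlib.util
import sys

# Import names assumed from the Dependencies section of this README.
REQUIRED_MODULES = ("torch", "transformers", "promptsource")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The README reports testing on Python 3.9; print the running version for comparison.
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be located; it does not verify versions, so requirements.txt remains the reference for those.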

For a quick test, you can inspect and run generate.sh and train_evaluate.sh (warning: both scripts currently live inside the elk package itself; this will change):

cd elk
sh generate.sh
sh train_evaluate.sh

Furthermore:

  1. To generate the hidden states for a model, cd elk and then run (shown here for deberta-v2-xxlarge-mnli on the imdb dataset):
python generation_main.py --model deberta-v2-xxlarge-mnli --datasets imdb --prefix normal --device cuda --num_data 1000

To test deberta-v2-xxlarge-mnli with the misleading prefix, and only the imdb and amazon-polarity datasets, while printing extra information, run:

The available prefix names can be found in ./utils_generation/construct_prompts.py. The generation command saves the hidden states to generation_results and the zero-shot accuracy to generation_results/generation_results.csv.

  2. To train a CCS model and a logistic regression model, run:
python train.py --model deberta-v2-xxlarge-mnli --prefix normal --dataset imdb --num_data 1000

and evaluate:

python evaluate.py --model deberta-v2-xxlarge-mnli --dataset imdb --num_data 1000

Once finished, results will be saved in evaluation_results/{model}_{prefix}_{seed}.csv
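To inspect the results from Python, the file name pattern above can be assembled programmatically. This is a sketch under stated assumptions: the helpers `evaluation_csv_path` and `read_rows` are hypothetical and not part of the package, the seed value `0` is only an illustration, and the path is assumed to be relative to the directory in which evaluate.py was run. No column names are assumed; rows are printed as read.

```python
import csv
from pathlib import Path


def evaluation_csv_path(model, prefix, seed, root="evaluation_results"):
    """Build a path following the {model}_{prefix}_{seed}.csv pattern."""
    return Path(root) / f"{model}_{prefix}_{seed}.csv"


def read_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # seed=0 is an assumed value; use the seed from your own run.
    path = evaluation_csv_path("deberta-v2-xxlarge-mnli", "normal", seed=0)
    if path.exists():
        for row in read_rows(path):
            print(row)
    else:
        print(f"No results found at {path}")
```

The same `read_rows` helper could be pointed at generation_results/generation_results.csv to look at the zero-shot accuracy.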

Development

Use pip install pre-commit && pre-commit install in the root folder before your first commit.

If you work on a new feature, fix, or other code task, make sure to create an issue and assign it to yourself (and maybe share it in the elk channel of Eleuther's Discord with a small note). This way, others know you are working on the issue and we won't do the same thing twice 👍 They can also contact you easily.