This repo contains the reproducibility information for the numbers listed in the SN-13B-8k-Instruct blogpost. Scrolls and ZeroScrolls refer to the SCROLLS and ZeroSCROLLS long-context benchmarks.
- Clone the LM Evaluation Harness: `git clone https://github.com/EleutherAI/lm-evaluation-harness.git`
- Check out the commit of LM Evaluation Harness that we used to collect the results: `git checkout fe803c2920a85f6afb74ea05d1d2f98ec27f1a63`
- Follow the setup instructions specified in the repository's README.
- Add the ZeroScrolls task code to the LM Evaluation Harness. This involves importing the zero scrolls tasks in the `tasks/__init__.py` file in LM Evaluation Harness and adding the following line to the `TASK_REGISTRY`, as in the sketch below:
  `**zero_scrolls.construct_tasks(),`
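  For reference, the additions to `tasks/__init__.py` look roughly like the sketch below. This is a minimal sketch: it assumes the ZeroScrolls task code lives in the harness as `tasks/zero_scrolls.py` and exposes a `construct_tasks()` helper analogous to the existing `scrolls.construct_tasks()`.

  ```python
  # tasks/__init__.py -- sketch of the additions only; the rest of the file is unchanged
  from . import zero_scrolls  # new import, alongside the existing task imports

  TASK_REGISTRY = {
      # ... existing task entries ...
      **zero_scrolls.construct_tasks(),  # registers the zero_scrolls_* tasks used below
  }
  ```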
- Install requirements: `pip install -r requirements.txt`
- Run the following command in the LM Evaluation Harness:
  `python main.py --batch_size 1 --tasks zero_scrolls_gov_report,zero_scrolls_summ_screen_fd,zero_scrolls_qm_sum,zero_scrolls_squality,zero_scrolls_qasper,zero_scrolls_narrative_qa,zero_scrolls_quality,zero_scrolls_musique,zero_scrolls_space_digest,zero_scrolls_book_sum_sort --model gpt2 --model_args pretrained=sambanovasystems/SN-13B-8k-Instruct,dtype=float16 --num_fewshot 0 --no_cache`
- In the LM Evaluation Harness, open `tasks/scrolls.py` and replace the `'\n'` with your model's end-of-text token in the `until` list for all `greedy_until` requests, as in the sketch below.
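  As an illustration, the change to each `greedy_until` request looks roughly like the following. This is only a sketch: the exact request signature may differ slightly at this commit, and `<|endoftext|>` is just a placeholder for whatever end-of-text token your model uses.

  ```python
  # tasks/scrolls.py -- sketch of the edit to one greedy_until request
  # (rf and ctx are already defined in the surrounding task code;
  #  repeat the same change for every greedy_until call in the file)

  EOT_TOKEN = "<|endoftext|>"  # placeholder: substitute your model's end-of-text token

  # before: generation stops at the first newline
  #     rf.greedy_until(ctx, {"until": ["\n"]})
  # after: generation stops at the model's end-of-text token
  rf.greedy_until(ctx, {"until": [EOT_TOKEN]})
  ```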
- Run the following command in the LM Evaluation Harness:
  `python main.py --batch_size 1 --tasks scrolls_govreport,scrolls_qmsum,scrolls_quality,scrolls_summscreenfd --model gpt2 --model_args pretrained=sambanovasystems/SN-13B-8k-Instruct,dtype=float16 --num_fewshot 0 --no_cache`