This repository provides an overview of all components from the paper Phased Instruction Fine-Tuning for Large Language Models, ACL 2024 Findings.
```bibtex
@inproceedings{PhasedSFT,
  author    = {Wei Pang and Chuan Zhou and Xiao-Hua Zhou and Xiaojie Wang},
  title     = {Phased Instruction Fine-Tuning for Large Language Models},
  booktitle = {ACL Findings},
  year      = {2024},
  pages     = {},
}
```
```bash
bash run.sh
bash stopall.sh
```
### 1. generation dir: running inference on the 'oasst', 'anthropic', 'koala', 'vicuna', 'sinstruct', and 'wizardlm' test sets

```bash
bash evaluation.sh
```
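As a rough illustration of what the generation step produces, here is a minimal sketch of batch inference with Hugging Face transformers. The checkpoint path, prompt file name, and output format are assumptions for illustration, not the repository's actual interface.

```python
# Minimal sketch of inference over one evaluation set (hypothetical paths and fields).
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "checkpoints/stage3"          # assumed checkpoint location
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

with open("eval_sets/vicuna.jsonl") as f:  # assumed prompt file
    prompts = [json.loads(line)["instruction"] for line in f]

outputs = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Strip the prompt tokens so only the generated answer remains.
    answer = tokenizer.decode(ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    outputs.append({"instruction": prompt, "response": answer})

with open("generation_outputs/vicuna.jsonl", "w") as f:
    for item in outputs:
        f.write(json.dumps(item) + "\n")
```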
### 2. evaluation dir: scoring with gpt-4-0613 and then calculating the Win-Rate metric

```bash
bash run_gpt4_scoring.sh
bash run_win_rate.sh
```
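The win-rate step aggregates GPT-4 judgments of model answers against reference answers. The sketch below shows one common way to compute such a metric (wins plus half of ties over the total); the field names and file layout are assumptions, not necessarily those used by run_win_rate.sh.

```python
# Hypothetical win-rate aggregation: win = 1, tie = 0.5, loss = 0.
import json

def win_rate(judgement_file):
    wins = ties = total = 0
    with open(judgement_file) as f:
        for line in f:
            verdict = json.loads(line)["verdict"]  # assumed field: "win" | "tie" | "loss"
            total += 1
            if verdict == "win":
                wins += 1
            elif verdict == "tie":
                ties += 1
    return (wins + 0.5 * ties) / total if total else 0.0

print(win_rate("evaluation/gpt4_judgements.jsonl"))  # assumed output of the scoring step
```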
### 3. scripts dir: training scripts
### 4. xllm dir: training code and dataloader
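The core idea the training code implements is sequential fine-tuning over staged sub-datasets of increasing difficulty. Below is a minimal sketch using the Hugging Face Trainer; the base model name, data paths, prompt formatting, and hyperparameters are assumptions for illustration, not the repository's actual configuration.

```python
# Minimal sketch of phased (stage-by-stage) instruction fine-tuning (assumed paths/settings).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"          # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(batch):
    # Concatenate instruction and output into one training text (simplified formatting).
    texts = [i + "\n" + o for i, o in zip(batch["instruction"], batch["output"])]
    return tokenizer(texts, truncation=True, max_length=1024)

# Train on the three sub-datasets in order of increasing difficulty,
# carrying the same model from one stage to the next.
for stage in (1, 2, 3):
    dataset = load_dataset("json", data_files=f"data/stage{stage}.jsonl", split="train")
    dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"checkpoints/stage{stage}",
                               num_train_epochs=1, per_device_train_batch_size=4),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
```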
The instruction difficulty within the Alpaca and Alpaca-cleaned data is quantitatively assessed by GPT-4, which assigns scores from 1 to 5, with higher scores denoting greater complexity.

Alpaca-scored: the Alpaca 52k dataset scored by gpt-4-0613 and then split into three stages of increasing difficulty.

Alpaca-clean-scored: the Alpaca-clean 52k dataset, also scored by gpt-4-0613.
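To make the staging concrete, here is a minimal sketch of how a GPT-4-scored instruction file could be split into three sub-datasets of increasing difficulty. The score thresholds, field name, and file paths are assumptions for illustration, not the exact splits shipped in Alpaca-scored.

```python
# Hypothetical split of a GPT-4-scored dataset (scores 1-5) into three phased stages.
import json

def split_into_stages(scored_file, boundaries=(2, 4)):
    """Stage 1: score <= 2, Stage 2: 2 < score <= 4, Stage 3: score > 4 (assumed thresholds)."""
    stages = {1: [], 2: [], 3: []}
    with open(scored_file) as f:
        for line in f:
            example = json.loads(line)
            score = example["difficulty"]      # assumed field holding the GPT-4 score
            if score <= boundaries[0]:
                stages[1].append(example)
            elif score <= boundaries[1]:
                stages[2].append(example)
            else:
                stages[3].append(example)
    return stages

stages = split_into_stages("data/alpaca_scored.jsonl")
for idx, examples in stages.items():
    with open(f"data/stage{idx}.jsonl", "w") as out:
        out.writelines(json.dumps(e) + "\n" for e in examples)
```

Phased fine-tuning then trains on stage 1, stage 2, and stage 3 in order, so the model sees progressively harder instructions; the thresholds above are placeholders, not the paper's exact boundaries.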
The figure above summarizes the paper: as uptraining progresses, the win rate of five LLMs trained on multi-stage sub-datasets of increasing difficulty grows steadily (solid lines), in stark contrast to the win-rate trend of the same five LLMs trained on multi-stage sub-datasets with randomly distributed difficulty (dotted lines).