Phased Instruction Fine-Tuning for Large Language Models

This repository provides the official implementation and an overview of all components from the paper Phased Instruction Fine-Tuning for Large Language Models (ACL 2024 Findings).

Citation

@Inproceedings{PhasedSFT,
    author = {Wei Pang and Chuan Zhou and Xiao-Hua Zhou and Xiaojie Wang},
    title = {Phased Instruction Fine-Tuning for Large Language Models},
    booktitle = {ACL Findings},
    year = {2024},
    pages = {},
}

Code

bash

bash run.sh
bash stopall.sh

Code directories

1. generation dir: run inference on the 'oasst', 'anthropic', 'koala', 'vicuna', 'sinstruct', and 'wizardlm' test sets

bash evaluation.sh

2. evaluation dir: score responses with gpt-4-0613 and then calculate the Win-Rate metric

bash run_gpt4_scoring.sh
bash run_win_rate.sh
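
For reference, below is a minimal sketch of how a Win-Rate could be computed from GPT-4 scores. The JSONL input format and the model_score/baseline_score field names are assumptions for illustration, not the actual schema produced by run_gpt4_scoring.sh.

# Minimal sketch of a win-rate computation; the input format and field names
# ("model_score", "baseline_score") are assumptions, not the repo's actual schema.
import json
import sys

def win_rate(path: str) -> float:
    """Fraction of test prompts where the model's GPT-4 score beats the
    baseline's score; ties are counted as half a win."""
    wins, ties, total = 0, 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            m, b = record["model_score"], record["baseline_score"]
            total += 1
            if m > b:
                wins += 1
            elif m == b:
                ties += 1
    return (wins + 0.5 * ties) / total if total else 0.0

if __name__ == "__main__":
    print(f"Win-Rate: {win_rate(sys.argv[1]):.4f}")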

3. scripts dir: training scripts

4. xllm dir: training code and dataloader
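
As a rough illustration of the phased (stage-by-stage) fine-tuning idea, the sketch below continues training the same model on progressively harder stages using the Hugging Face Trainer. The model name, file paths, prompt format, and hyperparameters are placeholders, not the repository's actual configuration (see the scripts and xllm directories for that).

# Minimal sketch of phased instruction fine-tuning with Hugging Face Trainer:
# the model is fine-tuned on stage 1 (easiest) first, and each later stage
# continues from the previous checkpoint. Paths, model name, and hyperparameters
# are placeholders, not the repo's actual configuration (see scripts/ and xllm/).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL = "meta-llama/Llama-2-7b-hf"                          # placeholder base model
STAGES = ["stage1.jsonl", "stage2.jsonl", "stage3.jsonl"]   # easy -> hard splits

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL)

def tokenize(example):
    # Assumes each record has "instruction" and "output" fields (Alpaca-style).
    text = example["instruction"] + "\n" + example["output"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=1024)

for i, stage_file in enumerate(STAGES, start=1):
    dataset = load_dataset("json", data_files=stage_file, split="train").map(tokenize)
    trainer = Trainer(
        model=model,  # same model object: stage i continues from stage i-1
        args=TrainingArguments(output_dir=f"ckpt_stage{i}", num_train_epochs=1,
                               per_device_train_batch_size=4, learning_rate=2e-5),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(f"ckpt_stage{i}")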

Datasets

The instruction difficulty within the Alpaca and Alpaca-cleaned datasets is quantitatively assessed by GPT-4, which assigns each instruction a score from 1 to 5, with higher scores denoting increased complexity.
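
Below is a minimal sketch of how such a difficulty score could be requested from gpt-4-0613 via the OpenAI API; the prompt wording is an assumption, not the exact prompt used to build the released datasets.

# Minimal sketch of scoring one instruction's difficulty (1-5) with gpt-4-0613
# via the OpenAI API; the prompt text here is an assumption, not the exact
# prompt used to build the released Alpaca-scored datasets.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_difficulty(instruction: str) -> int:
    prompt = (
        "Rate the difficulty of the following instruction on a scale of 1 to 5, "
        "where 5 is the most complex. Reply with a single digit.\n\n"
        f"Instruction: {instruction}"
    )
    response = client.chat.completions.create(
        model="gpt-4-0613",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

print(score_difficulty("Explain quantum entanglement to a five-year-old."))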

Difficulty-stratified instruction datasets

Alpaca-scored: the Alpaca 52K dataset scored by gpt-4-0613, then split into three stages of increasing difficulty (a split sketch follows below).
Alpaca-clean-scored: the Alpaca-clean 52K dataset, likewise scored by gpt-4-0613.
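
The following is a minimal sketch of stratifying a scored instruction file into three stages of increasing difficulty. The "score" field name, the stage boundaries, and the file names are assumptions; the released splits may use different cut points.

# Minimal sketch of stratifying a GPT-4-scored instruction file into three
# stages of increasing difficulty. The "score" field name and the score
# boundaries are assumptions; the released splits may use different cut points.
import json

def split_into_stages(path: str):
    stages = {1: [], 2: [], 3: []}
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # list of Alpaca-style records with a "score" field
    for record in data:
        s = record["score"]
        stage = 1 if s <= 2 else 2 if s <= 3 else 3  # 1-2 easy, 3 medium, 4-5 hard
        stages[stage].append(record)
    for stage, records in stages.items():
        with open(f"stage{stage}.json", "w", encoding="utf-8") as out:
            json.dump(records, out, ensure_ascii=False, indent=2)

split_into_stages("alpaca_data_scored.json")  # placeholder input file name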

Summary of this paper

[Figure: win-rate growth over training stages for five LLMs, phased difficulty ordering (solid line) vs. random ordering (dotted line)]

The figure above summarizes the paper. As training progresses stage by stage (uptraining), it shows the win-rate growth trend (solid line) of five LLMs fine-tuned on multi-stage sub-datasets of increasing difficulty, in stark contrast to the win-rate trend of the same five LLMs fine-tuned on multi-stage sub-datasets with randomly distributed difficulty levels (dotted line).

Alpaca 52K scored by gpt-4-0613

[Figure: score distribution of the Alpaca 52K dataset scored by gpt-4-0613]

Alpaca-clean 52K scored by gpt-4-0613

[Figure: score distribution of the Alpaca-clean 52K dataset scored by gpt-4-0613]
