We provide multiple examples of using LightSeq to accelerate Hugging Face model training.
First, switch to the BERT directory:
cd examples/training/huggingface/bert
Then install the requirements:
pip install torch ninja transformers seqeval datasets
Then you can easily fine-tune BERT on different tasks by running the bash scripts task_ner/run_ner.sh, task_glue/run_glue.sh, task_qa/run_qa.sh, etc.
You can also fine-tune the models using int8 mixed-precision by running task_ner/run_quant_ner.sh.
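For example, assuming you run the scripts from this directory, a plain run followed by an int8 run looks like:
bash task_ner/run_ner.sh
bash task_ner/run_quant_ner.sh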
Before the next training run, switch to the GPT2 directory:
cd examples/training/huggingface/gpt
Then install the requirements:
pip install -r requirements.txt
Then you can easily fine-tune GPT2 by running the bash script run_clm.sh.
You can also fine-tune the models using int8 mixed-precision by running run_quant_clm.sh.
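For example, to launch the fine-tuning and keep a log of the run (the tee redirection and log file name are ordinary shell usage, not something the script requires):
bash run_clm.sh 2>&1 | tee clm_train.log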
Before the next training run, switch to the BART summarization directory:
cd examples/training/huggingface/bart/summarization
Then install the requirements:
pip install -r requirements.txt
Then you can easily fine-tune BART by running the bash script run_summarization.sh.
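For example, to restrict training to the first two GPUs (CUDA_VISIBLE_DEVICES is a standard CUDA environment variable, not a LightSeq-specific flag):
CUDA_VISIBLE_DEVICES=0,1 bash run_summarization.sh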
Before the next training run, switch to the ViT directory:
cd examples/training/huggingface/vit
Then install the requirements:
pip install torch ninja transformers seqeval datasets
Then you can easily fine-tune ViT by running the bash script run_vit.sh.
You can also fine-tune the models using int8 mixed-precision by running run_quant_vit.sh.
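For example, to compare the wall-clock time of the default and int8 runs (time is the standard shell keyword):
time bash run_vit.sh
time bash run_quant_vit.sh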
LightSeq also supports Hugging Face training with GCQ (Gradient Communication Quantization). Taking BERT as an example, first switch to the BERT directory, then you can easily fine-tune BERT with GCQ on different tasks by running the bash scripts task_ner/run_gcq_ner.sh, task_glue/run_gcq_glue.sh, task_qa/run_gcq_qa.sh, etc.
You can use --enable_GCQ to enable GCQ in your multi-machine distributed training.
You can set --GCQ_quantile to a float value between 0.0 and 1.0, which uses that quantile of the gradient bucket as the clip-max value when quantizing gradients. E.g., when setting --GCQ_quantile to 0.99, the clip-max value equals the 0.99 quantile of the gradient bucket.
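As a sketch, a multi-machine launch passing these flags might look like the following (the training script path, addresses, and process counts are illustrative; only --enable_GCQ and --GCQ_quantile come from the options above):
python -m torch.distributed.launch \
    --nproc_per_node=8 --nnodes=2 --node_rank=0 \
    --master_addr=192.168.0.1 --master_port=29500 \
    task_ner/run_ner.py --enable_GCQ --GCQ_quantile 0.99  # plus the usual task arguments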
You can also use multiple NICs for NCCL communication. E.g., if every machine has four NICs (eth0, eth1, eth2, eth3), you can use the following command:
export NCCL_SOCKET_IFNAME=eth0,eth1,eth2,eth3
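NCCL_SOCKET_IFNAME is a standard NCCL environment variable, so it needs to be exported on every machine before the training processes start, e.g. (script name as in the GCQ example above):
export NCCL_SOCKET_IFNAME=eth0,eth1,eth2,eth3
bash task_ner/run_gcq_ner.sh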