Merge commit '4f6007e12597e14e619548c1eef1346a9792a7f7' into feat/course
* commit '4f6007e12597e14e619548c1eef1346a9792a7f7': (22 commits)
  - fix freeze parameters bug (modelscope#325)
  - Add courses (modelscope#324)
  - fix ui bugs (modelscope#322)
  - fix vllm env bug (modelscope#321)
  - Support internlm2 (modelscope#320)
  - Fix cogagent image loading (modelscope#318)
  - update yuan-janus-2b-instruct sh (modelscope#317)
  - Fix hf compatibility and support yuan (modelscope#316)
  - add examples text to image (modelscope#304)
  - Update deepseek moe (modelscope#314)
  - fix share (modelscope#313)
  - Fix modules to save (modelscope#312)
  - Fix the saving behaviour of modules without state dict (modelscope#309)
  - Fix link (modelscope#307)
  - Update docs (modelscope#308)
  - Fix bugs (modelscope#305)
  - fix csv nan bug (modelscope#306)
  - fix_ziya_template_bug (modelscope#303)
  - fix a bug may cause module on gpu throws error (modelscope#302)
  - fix text label (modelscope#301)
  - ...
Showing 183 changed files with 7,269 additions and 469 deletions.
# LLM Human Alignment Training Documentation
## Table of Contents
- [Environment Setup](#environment-setup)
- [Human Alignment Training](#human-alignment-training)

## Environment Setup
GPU devices: A10, 3090, V100, and A100 all work. For GPUs with <=24 GB of memory, at least a dual-GPU setup is required, because human alignment training loads two models on each card and therefore uses one extra inference model's worth of GPU memory compared with fine-tuning.
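To see why, here is a back-of-the-envelope sketch (assuming a 7B model stored in bf16/fp16; actual usage also depends on activations, LoRA state, and sequence length, so treat the numbers as rough):

```python
# Rough memory estimate for DPO with a 7B model.
# Assumptions (not measured values): 2 bytes/param (bf16/fp16),
# LoRA keeps trainable state small, activations not counted.
params = 7e9
bytes_per_param = 2  # bf16/fp16

policy_gb = params * bytes_per_param / 1024**3  # trainable policy model
ref_gb = params * bytes_per_param / 1024**3     # frozen reference model

print(f"policy weights:       ~{policy_gb:.1f} GB")
print(f"reference weights:    ~{ref_gb:.1f} GB")
print(f"total (weights only): ~{policy_gb + ref_gb:.1f} GB")
# ~26 GB of weights alone, before activations and optimizer state,
# which is why a <=24 GB card needs at least 2 GPUs.
```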
```bash
# Set the global pip mirror
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]

# Align the environment (if you hit errors, run the following; the repository is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
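To sanity-check the installation, you can run a short snippet like the following (a minimal sketch; `swift.__version__` is an assumption that may not hold in every version, hence the `getattr` fallback):

```python
# Quick sanity check of the training environment.
import torch
import swift

print("swift:", getattr(swift, "__version__", "unknown"))
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```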

## Human Alignment Training
The shell script below runs a human alignment training job. First, switch to the working directory:

```shell
cd examples/pytorch/llm
```

Run the following command:

```shell
# Experimental environment: 4 * A100
# Memory usage: 4 * 20 GB (2-GPU device_map * 2 DDP)
nproc_per_node=2

PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_dpo.py \
    --model_type mistral-7b \
    --ref_model_type mistral-7b \
    --model_revision master \
    --sft_type lora \
    --tuner_backend swift \
    --dtype AUTO \
    --output_dir output \
    --dataset hh-rlhf \
    --train_dataset_sample -1 \
    --truncation_strategy truncation_left \
    --val_dataset_sample 2000 \
    --num_train_epochs 3 \
    --max_length 1024 \
    --max_prompt_length 512 \
    --check_dataset_strategy none \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 1.0 \
    --warmup_ratio 0.03 \
    --eval_steps 2000 \
    --save_steps 2000 \
    --save_total_limit 2 \
    --logging_steps 10
```
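Note that `--gradient_accumulation_steps $(expr 16 / $nproc_per_node)` keeps the effective global batch size fixed at 16 (batch_size 1 x 8 accumulation steps x 2 DDP processes), independent of the process count.

For intuition about what `llm_dpo.py` optimizes, here is a minimal sketch of the standard DPO loss (Rafailov et al., 2023). The function and variable names are illustrative, not the script's internal API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: push the policy to prefer the chosen answer
    more strongly than the frozen reference model does."""
    # Log-ratios between policy and reference, per answer.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin), averaged over the batch.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up per-sequence log-probabilities.
lp_c = torch.tensor([-12.0, -9.5])   # policy log p(chosen)
lp_r = torch.tensor([-11.0, -10.0])  # policy log p(rejected)
rf_c = torch.tensor([-12.5, -10.0])  # reference log p(chosen)
rf_r = torch.tensor([-10.5, -9.0])   # reference log p(rejected)
print(dpo_loss(lp_c, lp_r, rf_c, rf_r))  # scalar loss
```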

### sh Scripts

The sh scripts can be found [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/dpo).

```bash
# The scripts below must be executed from this directory
cd examples/pytorch/llm
```

**Tips**:

- By default we set `--gradient_checkpointing true` during training to **save GPU memory**; this slightly slows down training.
- If you are using an older GPU such as the **V100**, set `--dtype AUTO` or `--dtype fp16`, since these cards do not support bf16 (a quick runtime check is sketched after this list).
- If your machine has high-end GPUs such as the A100 and you are using a qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces memory usage (A10, 3090, V100 and similar cards do not support training with flash-attn). The models that support flash-attn are listed in [Supported Models](./支持的模型和数据集.md#模型).
- If you need to train offline, use `--model_cache_dir` and set `--check_model_is_latest false`. See [Command-Line Arguments](./命令行参数.md) for the exact meaning of these parameters.
- If you want to push the weights to the ModelScope Hub during training, set `--push_to_hub true`.
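For the dtype tip above, here is a quick runtime check using standard PyTorch APIs:

```python
import torch

# bf16 needs compute capability >= 8.0 (Ampere, e.g. A100/3090);
# a V100 (7.0) reports False here, so fall back to --dtype fp16.
if torch.cuda.is_available():
    print("bf16 supported:", torch.cuda.is_bf16_supported())
    print("compute capability:", torch.cuda.get_device_capability(0))
```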

```bash
# DPO training for mistral-7b, max_length=1024, bs=1
# Recommended experimental environment: V100, A10, 3090; 2, 4, or 8 GPUs
bash scripts/dpo/lora_ddp_mp/dpo.sh
bash scripts/dpo/lora_ddp_mp/infer.sh
```

Since DPO training produces either a full model or adapter weights, the LoRA merging and inference steps are the same as for fine-tuning; please refer to the corresponding steps in the [Fine-Tuning Documentation](./LLM微调文档.md#merge-lora).