Merge commit '4f6007e12597e14e619548c1eef1346a9792a7f7' into feat/course
* commit '4f6007e12597e14e619548c1eef1346a9792a7f7': (22 commits)
  - fix freeze parameters bug (modelscope#325)
  - Add courses (modelscope#324)
  - fix ui bugs (modelscope#322)
  - fix vllm env bug (modelscope#321)
  - Support internlm2 (modelscope#320)
  - Fix cogagent image loading (modelscope#318)
  - update yuan-janus-2b-instruct sh (modelscope#317)
  - Fix hf compatibility and support yuan (modelscope#316)
  - add examples text to image (modelscope#304)
  - Update deepseek moe (modelscope#314)
  - fix share (modelscope#313)
  - Fix modules to save (modelscope#312)
  - Fix the saving behaviour of modules without state dict (modelscope#309)
  - Fix link (modelscope#307)
  - Update docs (modelscope#308)
  - Fix bugs (modelscope#305)
  - fix csv nan bug (modelscope#306)
  - fix_ziya_template_bug (modelscope#303)
  - fix a bug may cause module on gpu throws error (modelscope#302)
  - fix text label (modelscope#301)
  - ...
Showing 183 changed files with 7,269 additions and 469 deletions.
# LLM Human Alignment Training Documentation
## Table of Contents
- [Environment Setup](#environment-setup)
- [Human Alignment Training](#human-alignment-training)

## Environment Setup
GPU devices: A10, 3090, V100, and A100 all work. For GPUs with <=24 GB of memory, at least a dual-GPU setup is required, because human alignment training loads two models on each card and therefore uses one extra inference model's worth of GPU memory compared with fine-tuning.
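To see why, here is a back-of-the-envelope sketch (assuming a 7B model stored in bf16/fp16; actual usage also depends on activations, LoRA state, and sequence length, so treat the numbers as rough):

```python
# Rough memory estimate for DPO with a 7B model.
# Assumptions (not measured values): 2 bytes/param (bf16/fp16),
# LoRA keeps trainable state small, activations not counted.
params = 7e9
bytes_per_param = 2  # bf16/fp16

policy_gb = params * bytes_per_param / 1024**3  # trainable policy model
ref_gb = params * bytes_per_param / 1024**3     # frozen reference model

print(f"policy weights:       ~{policy_gb:.1f} GB")
print(f"reference weights:    ~{ref_gb:.1f} GB")
print(f"total (weights only): ~{policy_gb + ref_gb:.1f} GB")
# ~26 GB of weights alone, before activations and optimizer state,
# which is why a <=24 GB card needs at least 2 GPUs.
```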
```bash
# Set the global pip mirror
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]

# Align the environment (if you hit errors, run the following; the repository is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
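To sanity-check the installation, you can run a short snippet like the following (a minimal sketch; `swift.__version__` is an assumption that may not hold in every version, hence the `getattr` fallback):

```python
# Quick sanity check of the training environment.
import torch
import swift

print("swift:", getattr(swift, "__version__", "unknown"))
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```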

## Human Alignment Training
The shell script below runs a human alignment training job. First, switch to the working directory:

```shell
cd examples/pytorch/llm
```

Run the following command:

```shell
# Experimental environment: 4 * A100
# Memory usage: 4 * 20 GB (2-GPU device_map * 2 DDP)
nproc_per_node=2

PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    llm_dpo.py \
    --model_type mistral-7b \
    --ref_model_type mistral-7b \
    --model_revision master \
    --sft_type lora \
    --tuner_backend swift \
    --dtype AUTO \
    --output_dir output \
    --dataset hh-rlhf \
    --train_dataset_sample -1 \
    --truncation_strategy truncation_left \
    --val_dataset_sample 2000 \
    --num_train_epochs 3 \
    --max_length 1024 \
    --max_prompt_length 512 \
    --check_dataset_strategy none \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 1.0 \
    --warmup_ratio 0.03 \
    --eval_steps 2000 \
    --save_steps 2000 \
    --save_total_limit 2 \
    --logging_steps 10
```
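Note that `--gradient_accumulation_steps $(expr 16 / $nproc_per_node)` keeps the effective global batch size fixed at 16 (batch_size 1 x 8 accumulation steps x 2 DDP processes), independent of the process count.

For intuition about what `llm_dpo.py` optimizes, here is a minimal sketch of the standard DPO loss (Rafailov et al., 2023). The function and variable names are illustrative, not the script's internal API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: push the policy to prefer the chosen answer
    more strongly than the frozen reference model does."""
    # Log-ratios between policy and reference, per answer.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin), averaged over the batch.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up per-sequence log-probabilities.
lp_c = torch.tensor([-12.0, -9.5])   # policy log p(chosen)
lp_r = torch.tensor([-11.0, -10.0])  # policy log p(rejected)
rf_c = torch.tensor([-12.5, -10.0])  # reference log p(chosen)
rf_r = torch.tensor([-10.5, -9.0])   # reference log p(rejected)
print(dpo_loss(lp_c, lp_r, rf_c, rf_r))  # scalar loss
```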

### sh Scripts

The sh scripts can be found [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/dpo).

```bash
# The scripts below must be executed from this directory
cd examples/pytorch/llm
```

**Tips**:

- By default we set `--gradient_checkpointing true` during training to **save GPU memory**; this slightly slows down training.
- If you are using an older GPU such as the **V100**, set `--dtype AUTO` or `--dtype fp16`, since these cards do not support bf16 (a quick runtime check is sketched after this list).
- If your machine has high-end GPUs such as the A100 and you are using a qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces memory usage (A10, 3090, V100 and similar cards do not support training with flash-attn). The models that support flash-attn are listed in [Supported Models](./支持的模型和数据集.md#模型).
- If you need to train offline, use `--model_cache_dir` and set `--check_model_is_latest false`. See [Command-Line Arguments](./命令行参数.md) for the exact meaning of these parameters.
- If you want to push the weights to the ModelScope Hub during training, set `--push_to_hub true`.
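For the dtype tip above, here is a quick runtime check using standard PyTorch APIs:

```python
import torch

# bf16 needs compute capability >= 8.0 (Ampere, e.g. A100/3090);
# a V100 (7.0) reports False here, so fall back to --dtype fp16.
if torch.cuda.is_available():
    print("bf16 supported:", torch.cuda.is_bf16_supported())
    print("compute capability:", torch.cuda.get_device_capability(0))
```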

```bash
# DPO training for mistral-7b, max_length=1024, bs=1
# Recommended experimental environment: V100, A10, 3090; 2, 4, or 8 GPUs
bash scripts/dpo/lora_ddp_mp/dpo.sh
bash scripts/dpo/lora_ddp_mp/infer.sh
```

Since DPO training produces either a full model or adapter weights, the LoRA merging and inference steps are the same as for fine-tuning; please refer to the corresponding steps in the [Fine-Tuning Documentation](./LLM微调文档.md#merge-lora).