# AnimateDiff fine-tuning and inference

SWIFT supports fine-tuning and inference for AnimateDiff, currently in two modes: full-parameter fine-tuning and LoRA fine-tuning.

First, clone and install SWIFT:

```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install ".[aigc]"
```

## Full-parameter training

### Training results

Full-parameter fine-tuning can reproduce the results of the official model [animatediff-motion-adapter-v1-5-2](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2/summary). It requires a fairly large number of short videos; the official ModelScope reproduction used a subset of the original dataset, [WebVid 2.5M](https://maxbain.com/webvid-dataset/). Sample results:
```text
Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, girl, walking, on the street, flowers
```

![image.png](./resources/1.gif)

```text
Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, beautiful house, mountain, snow top
```

![image.png](./resources/2.gif)

Generation quality after training on the 2.5M subset is still somewhat unstable; training on the 10M dataset yields more stable results.
### Run command

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/full
# Experimental environment: A100 * 4
# 200GB GPU memory in total
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --nproc_per_node=4 animatediff_sft.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
--video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
--sft_type full \
--lr_scheduler_type constant \
--trainable_modules .*motion_modules.* \
--batch_size 4 \
--eval_steps 100 \
--gradient_accumulation_steps 16
```

We trained on 4 A100 GPUs, which requires about 200GB of GPU memory in total and roughly 40 hours; the effective batch size is 4 (per-GPU batch) × 4 (GPUs) × 16 (gradient accumulation) = 256. The data format is as follows:
```text
--csv_path points to a csv file in the following format:
name,contentUrl
Travel blogger shoot a story on top of mountains. young man holds camera in forest.,stock-footage-travel-blogger-shoot-a-story-on-top-of-mountains-young-man-holds-camera-in-forest.mp4
```

The name field is the prompt for the short video, and contentUrl is the file name of the video; the check sketched below can verify the two arguments stay consistent.

```text
--video_folder points to a video directory containing all the video files referenced by contentUrl in the csv file
```
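
Before launching a long training run, it can be worth confirming that every video referenced by the csv actually exists under `--video_folder`. The following is a hypothetical pre-flight helper, not part of SWIFT; the paths are placeholders for the values you pass on the command line:

```python
# Hypothetical sanity check (not part of SWIFT): confirm every contentUrl
# in the csv resolves to a file under --video_folder.
import csv
import os

csv_path = '/path/to/results_2M_train.csv'   # value you pass as --csv_path
video_folder = '/path/to/videos'             # value you pass as --video_folder

with open(csv_path, newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))           # expects 'name' and 'contentUrl' columns

missing = [row['contentUrl'] for row in rows
           if not os.path.isfile(os.path.join(video_folder, row['contentUrl']))]
print(f'{len(rows)} clips listed, {len(missing)} missing from {video_folder}')
```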

Inference with a full-parameter checkpoint works as follows:

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/full
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--sft_type full \
--ckpt_dir /output/path/like/checkpoints/iter-xxx \
--eval_human true
```

Pass the output folder produced by training as --ckpt_dir.
## LoRA training

### Run command

Full-parameter training trains the entire Motion-Adapter structure from scratch. Alternatively, you can fine-tune an existing model with a small number of videos by running the command below:

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/lora
# Experimental environment: A100
# 20GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_sft.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
--video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
--motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
--sft_type lora \
--lr_scheduler_type constant \
--trainable_modules .*motion_modules.* \
--batch_size 1 \
--eval_steps 200 \
--dataset_sample_size 10000 \
--gradient_accumulation_steps 16
```

The video data arguments are the same as above.
The inference command is as follows:

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/lora
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
--sft_type lora \
--ckpt_dir /output/path/like/checkpoints/iter-xxx \
--eval_human true
```

Pass the output folder produced by training as --ckpt_dir.
## Parameter list

The parameters supported for training and inference, and their meanings, are listed below:

### Training parameters
```text
motion_adapter_id_or_path: Optional[str] = None # Model id or local path of the motion adapter; set this to continue training from an existing official model
motion_adapter_revision: Optional[str] = None # Model revision of the motion adapter; only used when motion_adapter_id_or_path is a model id
model_id_or_path: str = None # Model id or local path of the SD base model
model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id
dataset_sample_size: int = None # Number of training samples drawn from the dataset; by default the full dataset is used
sft_type: str = field(
    default='lora', metadata={'choices': ['lora', 'full']}) # Training mode: LoRA or full-parameter
output_dir: str = 'output' # Output folder
ddp_backend: str = field(
    default='nccl', metadata={'choices': ['nccl', 'gloo', 'mpi', 'ccl']}) # DDP backend, if training with DDP
seed: int = 42 # Random seed
lora_rank: int = 8 # LoRA parameter
lora_alpha: int = 32 # LoRA parameter
lora_dropout_p: float = 0.05 # LoRA parameter
gradient_checkpointing: bool = False # Whether to enable gradient checkpointing, off by default. Note: the current diffusers version has an issue and does not support setting this to True
batch_size: int = 1 # Batch size
num_train_epochs: int = 1 # Number of epochs
# if max_steps >= 0, override num_train_epochs
learning_rate: Optional[float] = None # Learning rate
weight_decay: float = 0.01 # AdamW parameter
gradient_accumulation_steps: int = 16 # Gradient accumulation steps
max_grad_norm: float = 1. # Gradient clipping norm
lr_scheduler_type: str = 'cosine' # Type of lr_scheduler
warmup_ratio: float = 0.05 # Warmup proportion (0 disables warmup)
eval_steps: int = 50 # Interval between eval steps
save_steps: Optional[int] = None # Interval between save steps
dataloader_num_workers: int = 1 # Number of dataloader workers
push_to_hub: bool = False # Whether to push to the ModelScope hub
# 'user_name/repo_name' or 'repo_name'
hub_model_id: Optional[str] = None # ModelScope hub id
hub_private_repo: bool = True
push_hub_strategy: str = field( # Push strategy: push only the last checkpoint, or every checkpoint
    default='push_best',
    metadata={'choices': ['push_last', 'all_checkpoints']})
# None: use env var `MODELSCOPE_API_TOKEN`
hub_token: Optional[str] = field( # ModelScope hub token
    default=None,
    metadata={
        'help':
        'SDK token can be found in https://modelscope.cn/my/myaccesstoken'
    })
ignore_args_error: bool = False # True: notebook compatibility
text_dropout_rate: float = 0.1 # Drop a proportion of the text prompts to make the model more robust
validation_prompts_path: str = field( # Path of the prompt file used during evaluation; defaults to swift/aigc/configs/validation.txt
    default=None,
    metadata={
        'help':
        'The validation prompts file path, use aigc/configs/validation.txt if None'
    })
trainable_modules: str = field( # Trainable modules; the default value is recommended
    default='.*motion_modules.*',
    metadata={
        'help':
        'The trainable modules, by default, the .*motion_modules.* will be trained'
    })
mixed_precision: bool = True # Mixed-precision training
enable_xformers_memory_efficient_attention: bool = True # Use xformers
num_inference_steps: int = 25 # Number of denoising steps when generating validation samples
guidance_scale: float = 8. # Classifier-free guidance scale
sample_size: int = 256 # Resolution of sampled frames
sample_stride: int = 4 # Maximum length of a training video, in seconds
sample_n_frames: int = 16 # Frames per second
csv_path: str = None # Input dataset
video_folder: str = None # Input dataset
motion_num_attention_heads: int = 8 # Motion adapter parameter
motion_max_seq_length: int = 32 # Motion adapter parameter
num_train_timesteps: int = 1000 # Inference pipeline parameter
beta_start: float = 0.00085 # Inference pipeline parameter
beta_end: float = 0.012 # Inference pipeline parameter
beta_schedule: str = 'linear' # Inference pipeline parameter
steps_offset: int = 1 # Inference pipeline parameter
clip_sample: bool = False # Inference pipeline parameter
use_wandb: bool = False # Whether to use wandb
```
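
lora_rank, lora_alpha, and lora_dropout_p above are the standard LoRA hyperparameters. As a rough, framework-agnostic sketch of what they control (illustrative only, not SWIFT's actual adapter implementation):

```python
# Minimal LoRA sketch: a frozen linear layer plus a trainable low-rank
# update scaled by lora_alpha / lora_rank.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 32,
                 dropout_p: float = 0.05):
        super().__init__()
        self.base = base.requires_grad_(False)   # freeze the pretrained weight
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # the update starts as a no-op
        self.dropout = nn.Dropout(dropout_p)
        self.scale = alpha / rank                # lora_alpha / lora_rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(self.down(self.dropout(x))) * self.scale
```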

### Inference parameters
```text
motion_adapter_id_or_path: Optional[str] = None # Model id or local path of the motion adapter; specify this to use an existing official model
motion_adapter_revision: Optional[str] = None # Model revision of the motion adapter; only used when motion_adapter_id_or_path is a model id
model_id_or_path: str = None # Model id or local path of the SD base model
model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id
sft_type: str = field(
    default='lora', metadata={'choices': ['lora', 'full']}) # Training mode used: LoRA or full-parameter
ckpt_dir: Optional[str] = field(
    default=None, metadata={'help': '/path/to/your/vx_xxx/checkpoint-xxx'}) # Output folder produced by training
eval_human: bool = False # Whether to evaluate with human-typed prompts; False: evaluate on val_dataset
seed: int = 42 # Random seed
# other
ignore_args_error: bool = False # True: notebook compatibility
validation_prompts_path: str = None # Prompt file used for validation when eval_human=False, one prompt per line
output_path: str = './generated' # Output directory for generated gifs
enable_xformers_memory_efficient_attention: bool = True # Use xformers
num_inference_steps: int = 25 # Number of denoising steps
guidance_scale: float = 8. # Classifier-free guidance scale
sample_size: int = 256 # Resolution of generated frames
sample_stride: int = 4 # Maximum length of a training video, in seconds
sample_n_frames: int = 16 # Frames per second
motion_num_attention_heads: int = 8 # Motion adapter parameter
motion_max_seq_length: int = 32 # Motion adapter parameter
num_train_timesteps: int = 1000 # Inference pipeline parameter
beta_start: float = 0.00085 # Inference pipeline parameter
beta_end: float = 0.012 # Inference pipeline parameter
beta_schedule: str = 'linear' # Inference pipeline parameter
steps_offset: int = 1 # Inference pipeline parameter
clip_sample: bool = False # Inference pipeline parameter
```
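
The scheduler fields above (num_train_timesteps, beta_start, beta_end, beta_schedule, steps_offset, clip_sample) match the arguments of the DDIM scheduler in diffusers, and num_inference_steps / guidance_scale are the usual pipeline call arguments. For orientation only, a rough standalone equivalent using the public diffusers API might look like the sketch below; the Hugging Face model ids are assumptions for illustration, and SWIFT's animatediff_infer.py remains the supported entry point:

```python
# Illustrative sketch using the public diffusers API (not SWIFT's infer script).
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Assumed upstream Hugging Face ids, shown only for illustration.
adapter = MotionAdapter.from_pretrained('guoyww/animatediff-motion-adapter-v1-5-2')
pipe = AnimateDiffPipeline.from_pretrained(
    'SG161222/Realistic_Vision_V5.1_noVAE',
    motion_adapter=adapter,
    torch_dtype=torch.float16).to('cuda')
# Mirror the scheduler parameters listed above.
pipe.scheduler = DDIMScheduler.from_pretrained(
    'SG161222/Realistic_Vision_V5.1_noVAE', subfolder='scheduler',
    beta_start=0.00085, beta_end=0.012, beta_schedule='linear',
    steps_offset=1, clip_sample=False)

frames = pipe('masterpiece, bestquality, girl, walking, on the street',
              num_frames=16, num_inference_steps=25, guidance_scale=8.).frames[0]
export_to_gif(frames, 'generated.gif')
```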