# AnimateDiff fine-tuning and inference

SWIFT supports fine-tuning and inference for AnimateDiff, currently in two modes: full-parameter fine-tuning and LoRA fine-tuning.

First, clone and install SWIFT:

```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install ".[aigc]"
```

## Full-parameter training

### Training results

Full-parameter fine-tuning can reproduce the results of the official model [animatediff-motion-adapter-v1-5-2](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2/summary). It requires a fairly large number of short videos; the official ModelScope reproduction used a subset of the original dataset, [WebVid 2.5M](https://maxbain.com/webvid-dataset/). Sample results:
```text
Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, girl, walking, on the street, flowers
```

![image.png](./resources/1.gif)

```text
Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, beautiful house, mountain, snow top
```

![image.png](./resources/2.gif)

Generation quality after training on the 2.5M subset is still somewhat unstable; training on the 10M dataset yields more stable results.
### Run command

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/full
# Experimental environment: A100 * 4
# 200GB GPU memory in total
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --nproc_per_node=4 animatediff_sft.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
--video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
--sft_type full \
--lr_scheduler_type constant \
--trainable_modules .*motion_modules.* \
--batch_size 4 \
--eval_steps 100 \
--gradient_accumulation_steps 16
```

We trained on 4 A100 GPUs, which requires about 200GB of GPU memory in total and roughly 40 hours; the effective batch size is 4 (per-GPU batch) × 4 (GPUs) × 16 (gradient accumulation) = 256. The data format is as follows:
```text
--csv_path points to a csv file in the following format:
name,contentUrl
Travel blogger shoot a story on top of mountains. young man holds camera in forest.,stock-footage-travel-blogger-shoot-a-story-on-top-of-mountains-young-man-holds-camera-in-forest.mp4
```

The name field is the prompt for the short video, and contentUrl is the file name of the video; the check sketched below can verify the two arguments stay consistent.

```text
--video_folder points to a video directory containing all the video files referenced by contentUrl in the csv file
```
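
Before launching a long training run, it can be worth confirming that every video referenced by the csv actually exists under `--video_folder`. The following is a hypothetical pre-flight helper, not part of SWIFT; the paths are placeholders for the values you pass on the command line:

```python
# Hypothetical sanity check (not part of SWIFT): confirm every contentUrl
# in the csv resolves to a file under --video_folder.
import csv
import os

csv_path = '/path/to/results_2M_train.csv'   # value you pass as --csv_path
video_folder = '/path/to/videos'             # value you pass as --video_folder

with open(csv_path, newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))           # expects 'name' and 'contentUrl' columns

missing = [row['contentUrl'] for row in rows
           if not os.path.isfile(os.path.join(video_folder, row['contentUrl']))]
print(f'{len(rows)} clips listed, {len(missing)} missing from {video_folder}')
```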

Inference with a full-parameter checkpoint works as follows:

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/full
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--sft_type full \
--ckpt_dir /output/path/like/checkpoints/iter-xxx \
--eval_human true
```

Pass the output folder produced by training as --ckpt_dir.
## LoRA training

### Run command

Full-parameter training trains the entire Motion-Adapter structure from scratch. Alternatively, you can fine-tune an existing model with a small number of videos by running the command below:

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/lora
# Experimental environment: A100
# 20GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_sft.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
--video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
--motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
--sft_type lora \
--lr_scheduler_type constant \
--trainable_modules .*motion_modules.* \
--batch_size 1 \
--eval_steps 200 \
--dataset_sample_size 10000 \
--gradient_accumulation_steps 16
```

The video data arguments are the same as above.
The inference command is as follows:

```shell
# This file is located in swift/examples/pytorch/animatediff/scripts/lora
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
--model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
--motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
--sft_type lora \
--ckpt_dir /output/path/like/checkpoints/iter-xxx \
--eval_human true
```

Pass the output folder produced by training as --ckpt_dir.
## Parameter list

The parameters supported for training and inference, and their meanings, are listed below:

### Training parameters
```text
motion_adapter_id_or_path: Optional[str] = None # Model id or local path of the motion adapter; set this to continue training from an existing official model
motion_adapter_revision: Optional[str] = None # Model revision of the motion adapter; only used when motion_adapter_id_or_path is a model id
model_id_or_path: str = None # Model id or local path of the SD base model
model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id
dataset_sample_size: int = None # Number of training samples drawn from the dataset; by default the full dataset is used
sft_type: str = field(
    default='lora', metadata={'choices': ['lora', 'full']}) # Training mode: LoRA or full-parameter
output_dir: str = 'output' # Output folder
ddp_backend: str = field(
    default='nccl', metadata={'choices': ['nccl', 'gloo', 'mpi', 'ccl']}) # DDP backend, if training with DDP
seed: int = 42 # Random seed
lora_rank: int = 8 # LoRA parameter
lora_alpha: int = 32 # LoRA parameter
lora_dropout_p: float = 0.05 # LoRA parameter
gradient_checkpointing: bool = False # Whether to enable gradient checkpointing, off by default. Note: the current diffusers version has an issue and does not support setting this to True
batch_size: int = 1 # Batch size
num_train_epochs: int = 1 # Number of epochs
# if max_steps >= 0, override num_train_epochs
learning_rate: Optional[float] = None # Learning rate
weight_decay: float = 0.01 # AdamW parameter
gradient_accumulation_steps: int = 16 # Gradient accumulation steps
max_grad_norm: float = 1. # Gradient clipping norm
lr_scheduler_type: str = 'cosine' # Type of lr_scheduler
warmup_ratio: float = 0.05 # Warmup proportion (0 disables warmup)
eval_steps: int = 50 # Interval between eval steps
save_steps: Optional[int] = None # Interval between save steps
dataloader_num_workers: int = 1 # Number of dataloader workers
push_to_hub: bool = False # Whether to push to the ModelScope hub
# 'user_name/repo_name' or 'repo_name'
hub_model_id: Optional[str] = None # ModelScope hub id
hub_private_repo: bool = True
push_hub_strategy: str = field( # Push strategy: push only the last checkpoint, or every checkpoint
    default='push_best',
    metadata={'choices': ['push_last', 'all_checkpoints']})
# None: use env var `MODELSCOPE_API_TOKEN`
hub_token: Optional[str] = field( # ModelScope hub token
    default=None,
    metadata={
        'help':
        'SDK token can be found in https://modelscope.cn/my/myaccesstoken'
    })
ignore_args_error: bool = False # True: notebook compatibility
text_dropout_rate: float = 0.1 # Drop a proportion of the text prompts to make the model more robust
validation_prompts_path: str = field( # Path of the prompt file used during evaluation; defaults to swift/aigc/configs/validation.txt
    default=None,
    metadata={
        'help':
        'The validation prompts file path, use aigc/configs/validation.txt if None'
    })
trainable_modules: str = field( # Trainable modules; the default value is recommended
    default='.*motion_modules.*',
    metadata={
        'help':
        'The trainable modules, by default, the .*motion_modules.* will be trained'
    })
mixed_precision: bool = True # Mixed-precision training
enable_xformers_memory_efficient_attention: bool = True # Use xformers
num_inference_steps: int = 25 # Number of denoising steps when generating validation samples
guidance_scale: float = 8. # Classifier-free guidance scale
sample_size: int = 256 # Resolution of sampled frames
sample_stride: int = 4 # Maximum length of a training video, in seconds
sample_n_frames: int = 16 # Frames per second
csv_path: str = None # Input dataset
video_folder: str = None # Input dataset
motion_num_attention_heads: int = 8 # Motion adapter parameter
motion_max_seq_length: int = 32 # Motion adapter parameter
num_train_timesteps: int = 1000 # Inference pipeline parameter
beta_start: float = 0.00085 # Inference pipeline parameter
beta_end: float = 0.012 # Inference pipeline parameter
beta_schedule: str = 'linear' # Inference pipeline parameter
steps_offset: int = 1 # Inference pipeline parameter
clip_sample: bool = False # Inference pipeline parameter
use_wandb: bool = False # Whether to use wandb
```
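
lora_rank, lora_alpha, and lora_dropout_p above are the standard LoRA hyperparameters. As a rough, framework-agnostic sketch of what they control (illustrative only, not SWIFT's actual adapter implementation):

```python
# Minimal LoRA sketch: a frozen linear layer plus a trainable low-rank
# update scaled by lora_alpha / lora_rank.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 32,
                 dropout_p: float = 0.05):
        super().__init__()
        self.base = base.requires_grad_(False)   # freeze the pretrained weight
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # the update starts as a no-op
        self.dropout = nn.Dropout(dropout_p)
        self.scale = alpha / rank                # lora_alpha / lora_rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(self.down(self.dropout(x))) * self.scale
```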

### Inference parameters
```text
motion_adapter_id_or_path: Optional[str] = None # Model id or local path of the motion adapter; specify this to use an existing official model
motion_adapter_revision: Optional[str] = None # Model revision of the motion adapter; only used when motion_adapter_id_or_path is a model id
model_id_or_path: str = None # Model id or local path of the SD base model
model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id
sft_type: str = field(
    default='lora', metadata={'choices': ['lora', 'full']}) # Training mode used: LoRA or full-parameter
ckpt_dir: Optional[str] = field(
    default=None, metadata={'help': '/path/to/your/vx_xxx/checkpoint-xxx'}) # Output folder produced by training
eval_human: bool = False # Whether to evaluate with human-typed prompts; False: evaluate on val_dataset
seed: int = 42 # Random seed
# other
ignore_args_error: bool = False # True: notebook compatibility
validation_prompts_path: str = None # Prompt file used for validation when eval_human=False, one prompt per line
output_path: str = './generated' # Output directory for generated gifs
enable_xformers_memory_efficient_attention: bool = True # Use xformers
num_inference_steps: int = 25 # Number of denoising steps
guidance_scale: float = 8. # Classifier-free guidance scale
sample_size: int = 256 # Resolution of generated frames
sample_stride: int = 4 # Maximum length of a training video, in seconds
sample_n_frames: int = 16 # Frames per second
motion_num_attention_heads: int = 8 # Motion adapter parameter
motion_max_seq_length: int = 32 # Motion adapter parameter
num_train_timesteps: int = 1000 # Inference pipeline parameter
beta_start: float = 0.00085 # Inference pipeline parameter
beta_end: float = 0.012 # Inference pipeline parameter
beta_schedule: str = 'linear' # Inference pipeline parameter
steps_offset: int = 1 # Inference pipeline parameter
clip_sample: bool = False # Inference pipeline parameter
```
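
The scheduler fields above (num_train_timesteps, beta_start, beta_end, beta_schedule, steps_offset, clip_sample) match the arguments of the DDIM scheduler in diffusers, and num_inference_steps / guidance_scale are the usual pipeline call arguments. For orientation only, a rough standalone equivalent using the public diffusers API might look like the sketch below; the Hugging Face model ids are assumptions for illustration, and SWIFT's animatediff_infer.py remains the supported entry point:

```python
# Illustrative sketch using the public diffusers API (not SWIFT's infer script).
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Assumed upstream Hugging Face ids, shown only for illustration.
adapter = MotionAdapter.from_pretrained('guoyww/animatediff-motion-adapter-v1-5-2')
pipe = AnimateDiffPipeline.from_pretrained(
    'SG161222/Realistic_Vision_V5.1_noVAE',
    motion_adapter=adapter,
    torch_dtype=torch.float16).to('cuda')
# Mirror the scheduler parameters listed above.
pipe.scheduler = DDIMScheduler.from_pretrained(
    'SG161222/Realistic_Vision_V5.1_noVAE', subfolder='scheduler',
    beta_start=0.00085, beta_end=0.012, beta_schedule='linear',
    steps_offset=1, clip_sample=False)

frames = pipe('masterpiece, bestquality, girl, walking, on the street',
              num_frames=16, num_inference_steps=25, guidance_scale=8.).frames[0]
export_to_gif(frames, 'generated.gif')
```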