PPO full parameters #931
Comments
When training with full parameters, is a ref_model passed in?
1. "To train the full-parameter model, we inspected and modified the code and did not find any obvious logical restriction in the PPO part." Could you share what you modified? @Anonymousplendid 2. I found an exception raised in the CustomPPOTrainer init: "PPOTrainer is incompatible with DeepSpeed." But I haven't figured out where the incompatibility actually is. @hiyouga could you explain?
We did not pass a ref_model. I traced how the reference model is created, and it seems fine?
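For context, the reference model in PPO is typically just a frozen copy of the initial policy, used to compute the KL penalty against the starting distribution. A minimal, library-free sketch of that pattern (the `Policy` class and names here are illustrative toys, not this project's actual code; in TRL the analogous helper is `create_reference_model`):

```python
import copy

class Policy:
    """Toy stand-in for a language-model policy."""
    def __init__(self, weights):
        self.weights = dict(weights)   # trainable parameters
        self.requires_grad = True      # whether updates are applied

def make_reference_model(policy):
    """Create a frozen deep copy of the policy.

    The reference shares the initial weights but is never updated,
    so the KL penalty measures drift from the starting policy.
    """
    ref = copy.deepcopy(policy)
    ref.requires_grad = False
    return ref

policy = Policy({"w": 1.0})
ref = make_reference_model(policy)

# Training updates the policy but must leave the reference untouched.
policy.weights["w"] = 2.0
assert ref.weights["w"] == 1.0
```

If no ref_model is passed explicitly, a frozen copy like this is a reasonable default, which may be why the creation path "seems fine" on inspection.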
It is already stated in the original post.
I found a new problem. The RM part seems mostly fine, but in the PPO part, accelerate does not support multiple models under ZeRO-3, and it errors with "weight must be 2-D".
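The "weight must be 2-D" error is characteristic of a linear layer being called on a ZeRO-3-partitioned parameter: ZeRO stage 3 shards each weight into a flattened 1-D partition, so an un-gathered weight is no longer a 2-D matrix. One commonly reported workaround (an assumption here, not something confirmed in this thread) is to run the PPO stage under ZeRO-2, which shards optimizer state and gradients but keeps parameters whole, e.g. a DeepSpeed config along these lines:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" }
}
```

This trades memory savings for compatibility; whether it fits depends on model size and GPU memory.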
…uga#931 hiyouga#936 hiyouga#1011 Refactor llmtuner, support full-parameter RLHF
To train the full-parameter model, we inspected and modified the code and did not find any obvious logical restriction in the PPO part. The LoRA-specific restrictions are: 1. an explicit manual check; 2. when using separate actor and critic models, only the LoRA adapters are saved, so switching roles means swapping adapters rather than swapping whole models. These are the only two restrictions we found so far. The training code does run, but the training curves look problematic; this is discussed under other issues. Are there any other LoRA-specific restrictions in PPO?
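The second restriction above describes the LoRA workflow: actor and critic share one frozen base model, and switching roles activates a different adapter instead of loading a different full model, which is exactly what breaks once you train full parameters. A library-free sketch of that pattern (class and method names are illustrative; in practice this corresponds to something like peft's `set_adapter`):

```python
class AdapterModel:
    """Toy base model with switchable LoRA-style adapters."""
    def __init__(self, base):
        self.base = base          # shared, frozen base weight
        self.adapters = {}        # name -> adapter weight
        self.active = None

    def add_adapter(self, name, weight):
        self.adapters[name] = weight

    def set_adapter(self, name):
        # Swapping adapters is cheap; the base weights never move.
        self.active = name

    def forward(self, x):
        # Output = base behavior + active low-rank correction.
        delta = self.adapters[self.active]
        return self.base * x + delta * x

m = AdapterModel(base=2.0)
m.add_adapter("actor", 0.5)
m.add_adapter("critic", -0.5)

m.set_adapter("actor")
assert m.forward(1.0) == 2.5
m.set_adapter("critic")
assert m.forward(1.0) == 1.5
```

With full-parameter training there is no small adapter to swap: actor and critic become two complete model copies, so both the save logic and the role-switching logic built around adapters need to change.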