PPO full parameters #931
Comments
When training with full parameters, is a ref_model passed in?
1. "To train the full-parameter model, we inspected and modified the code and did not find any obvious logical restriction in the PPO part." Could you share what you modified? @Anonymousplendid 2. I found an exception raised in the CustomPPOTrainer init: "PPOTrainer is incompatible with DeepSpeed." But I haven't figured out where the incompatibility actually is. @hiyouga could you explain?
We did not pass a ref_model. I traced how the reference model is created, and it seems fine?
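For context, the reference model in PPO is typically just a frozen copy of the initial policy, used to compute the KL penalty against the starting distribution. A minimal, library-free sketch of that pattern (the `Policy` class and names here are illustrative toys, not this project's actual code; in TRL the analogous helper is `create_reference_model`):

```python
import copy

class Policy:
    """Toy stand-in for a language-model policy."""
    def __init__(self, weights):
        self.weights = dict(weights)   # trainable parameters
        self.requires_grad = True      # whether updates are applied

def make_reference_model(policy):
    """Create a frozen deep copy of the policy.

    The reference shares the initial weights but is never updated,
    so the KL penalty measures drift from the starting policy.
    """
    ref = copy.deepcopy(policy)
    ref.requires_grad = False
    return ref

policy = Policy({"w": 1.0})
ref = make_reference_model(policy)

# Training updates the policy but must leave the reference untouched.
policy.weights["w"] = 2.0
assert ref.weights["w"] == 1.0
```

If no ref_model is passed explicitly, a frozen copy like this is a reasonable default, which may be why the creation path "seems fine" on inspection.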
It is already stated in the original post.
I found a new problem. The RM part seems mostly fine, but in the PPO part, accelerate does not support multiple models under ZeRO-3, and it errors with "weight must be 2-D".
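The "weight must be 2-D" error is characteristic of a linear layer being called on a ZeRO-3-partitioned parameter: ZeRO stage 3 shards each weight into a flattened 1-D partition, so an un-gathered weight is no longer a 2-D matrix. One commonly reported workaround (an assumption here, not something confirmed in this thread) is to run the PPO stage under ZeRO-2, which shards optimizer state and gradients but keeps parameters whole, e.g. a DeepSpeed config along these lines:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" }
}
```

This trades memory savings for compatibility; whether it fits depends on model size and GPU memory.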
…uga#931 hiyouga#936 hiyouga#1011 Refactor llmtuner, support full-parameter RLHF
To train the full-parameter model, we inspected and modified the code and did not find any obvious logical restriction in the PPO part. The LoRA-specific restrictions are: 1. an explicit manual check; 2. when using separate actor and critic models, only the LoRA adapters are saved, so switching roles means swapping adapters rather than swapping whole models. These are the only two restrictions we found so far. The training code does run, but the training curves look problematic; this is discussed under other issues. Are there any other LoRA-specific restrictions in PPO?
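The second restriction above describes the LoRA workflow: actor and critic share one frozen base model, and switching roles activates a different adapter instead of loading a different full model, which is exactly what breaks once you train full parameters. A library-free sketch of that pattern (class and method names are illustrative; in practice this corresponds to something like peft's `set_adapter`):

```python
class AdapterModel:
    """Toy base model with switchable LoRA-style adapters."""
    def __init__(self, base):
        self.base = base          # shared, frozen base weight
        self.adapters = {}        # name -> adapter weight
        self.active = None

    def add_adapter(self, name, weight):
        self.adapters[name] = weight

    def set_adapter(self, name):
        # Swapping adapters is cheap; the base weights never move.
        self.active = name

    def forward(self, x):
        # Output = base behavior + active low-rank correction.
        delta = self.adapters[self.active]
        return self.base * x + delta * x

m = AdapterModel(base=2.0)
m.add_adapter("actor", 0.5)
m.add_adapter("critic", -0.5)

m.set_adapter("actor")
assert m.forward(1.0) == 2.5
m.set_adapter("critic")
assert m.forward(1.0) == 1.5
```

With full-parameter training there is no small adapter to swap: actor and critic become two complete model copies, so both the save logic and the role-switching logic built around adapters need to change.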