PPO full parameters #931

Closed
Anonymousplendid opened this issue Sep 16, 2023 · 5 comments
Labels
solved This problem has been already solved

Comments

@Anonymousplendid

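To train the full-parameter model, we read through and modified the code, and found no obvious logical restriction in the PPO part. The LoRA-specific restrictions are: 1. a manual (hard-coded) restriction; 2. when the actor and critic are used, what is saved is the LoRA adapter, so the code swaps the LoRA adapter rather than swapping the model. These are the only two restrictions we have found so far. The training code runs, but the training curve has some problems, which are discussed under other issues. Are there any other LoRA-related restrictions in PPO?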
training_reward
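For readers following along, the second restriction can be pictured with a toy sketch. It is not the project's code: `ToyModel`, `score_with_adapter_swap`, and `score_with_separate_model` are hypothetical names, and a single float stands in for a weight matrix. It only contrasts "one shared base with swappable LoRA deltas" against "two independent full models".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyModel:
    """Hypothetical stand-in for a network: one base weight plus named LoRA deltas."""
    base_weight: float
    adapters: Dict[str, float] = field(default_factory=dict)  # adapter name -> delta
    active: Optional[str] = None

    def set_adapter(self, name: Optional[str]) -> None:
        # Select which delta is applied; the base weight is never modified.
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name

    def forward(self, x: float) -> float:
        delta = self.adapters[self.active] if self.active is not None else 0.0
        return (self.base_weight + delta) * x


def score_with_adapter_swap(shared: ToyModel, x: float) -> float:
    """LoRA-style path: switch to the reward adapter, score, switch back."""
    shared.set_adapter("reward")
    try:
        return shared.forward(x)
    finally:
        shared.set_adapter("policy")


def score_with_separate_model(reward_model: ToyModel, x: float) -> float:
    """Full-parameter path: the reward model is its own object, nothing is swapped."""
    return reward_model.forward(x)
```

In the full-parameter case there is no adapter to swap, so any code path that assumes the swap has to be replaced by a call into a separately loaded model, at the cost of holding both sets of weights in memory.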

@hiyouga hiyouga added the pending This problem is yet to be addressed label Sep 16, 2023
@hiyouga
Owner

hiyouga commented Sep 16, 2023

Did you pass in a ref_model when training with full parameters?

@TyrionZK

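1. "To train the full-parameter model, we read through and modified the code, and found no obvious logical restriction in the PPO part." What exactly did you modify? @Anonymousplendid

2. I found an exception in the CustomPPOTrainer initialization function, "PPOTrainer is incompatible with DeepSpeed.", but I have not yet figured out where the incompatibility lies. @hiyouga, could you explain?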

@Anonymousplendid
Author

Anonymousplendid commented Sep 18, 2023

Did you pass in a ref_model when training with full parameters?

No ref_model was passed in. I looked into how the ref model is created, and it seems fine?
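As background on why the reference model matters here, below is a simplified sketch of the KL-shaped per-token reward commonly used in PPO-based RLHF. It is an illustration under assumptions, not this repository's implementation: the function name `kl_shaped_rewards`, the default `beta`, and the plain-list inputs are all hypothetical.

```python
from typing import List


def kl_shaped_rewards(
    score: float,
    logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.1,  # assumed KL coefficient, chosen only for illustration
) -> List[float]:
    """Per-token reward: a KL penalty on every token, plus the score on the last one."""
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("logprobs and ref_logprobs must be non-empty and equal length")
    # Penalize each token by how far the policy log-prob drifts from the reference.
    rewards = [-beta * (lp - ref_lp) for lp, ref_lp in zip(logprobs, ref_logprobs)]
    # The reward-model score is credited to the final generated token.
    rewards[-1] += score
    return rewards
```

If the reference log-probs come from a frozen copy of the initial policy, the penalty grows as the policy drifts. If they accidentally come from the model being trained, the penalty is zero everywhere and nothing holds the policy near its starting point, which is one possible cause of an odd reward curve worth ruling out.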

@Anonymousplendid
Author

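1. "To train the full-parameter model, we read through and modified the code, and found no obvious logical restriction in the PPO part." What exactly did you modify? @Anonymousplendid

2. I found an exception in the CustomPPOTrainer initialization function, "PPOTrainer is incompatible with DeepSpeed.", but I have not yet figured out where the incompatibility lies. @hiyouga, could you explain?

This is already stated in the original post.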

@Anonymousplendid
Author

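I found a new problem. The RM part does not seem to have any major issue. In the PPO part, accelerate does not support multiple models under ZeRO-3, and it raises the error "weight must be 2-D".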

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Nov 1, 2023
@hiyouga hiyouga closed this as completed Nov 1, 2023
hiyouga added a commit that referenced this issue Nov 16, 2023
Refactor llmtuner, support full-parameter RLHF
sangttruong pushed a commit to painkillernhat/LLaMA-Factory that referenced this issue May 9, 2024