[bug-fix] enable finetuning option (set optimizer params correctly) #927
Conversation
logan.eo seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
It looks like you’ve only changed the example config file, not the actual bug?
@StellaAthena |
@taegyeongeo who or what is “logan.eo” and why are they an author of this PR?
@StellaAthena |
@taegyeongeo Apologies that we haven't merged this yet... we've been very busy of late and haven't been able to test it rigorously. You say
To be clear, does this mean that you also need to update DeeperSpeed to use this?
No, @taegyeongeo is just saying that we're using DeepSpeed's existing `load_module_only` option. This doesn't require DeeperSpeed changes.
LGTM. Thanks for this fix!
Following your option to fine-tune the original 20B checkpoint, I still got the error "Empty ds_version in checkpoint" as in #767.
@WaveLi123 |
Yes, it is a DeepSpeed version mismatch. How can I add "ds_version" to the 20B checkpoint? Any code reference? Thanks a lot.
@WaveLi123 Just add it to
Simply adding `"ds_version": "0.3.15"` to 20B.yml does not work. Got the error: `TypeError: __init__() got an unexpected keyword argument 'ds_version'`
This won't work. You need to add it to the model's checkpoint itself. @taegyeongeo -- do you have a snippet you can share? I'd love to be able to point gpt-neox 1.0 users to a small snippet that just loads the checkpoint and adds the `ds_version` field.
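For what it's worth, a minimal sketch of what such a snippet might look like, assuming the missing key belongs in the saved model-states shard; the path and version string below are illustrative assumptions, not taken from this thread:

```python
# Illustrative sketch only -- the checkpoint path and version string are
# assumptions. The idea: load a saved model-states shard, add the missing
# ds_version key, and write the shard back.
import torch

ckpt_path = "checkpoints/global_step150000/mp_rank_00_model_states.pt"  # assumed layout

state = torch.load(ckpt_path, map_location="cpu")
state["ds_version"] = "0.3.15"  # assumed version; match your installed DeepSpeed
torch.save(state, ckpt_path)
```

For a model-parallel checkpoint there is one `mp_rank_*_model_states.pt` shard per partition, so the same edit would need to be applied to each shard.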
The neox repo has a problem with the "finetune" option (#767).
This option resets the hyperparams in the optimizer/lr_scheduler, but it doesn't set the model parameters correctly.
If you use finetune on the main branch, it doesn't sync the module with the optimizer params.
To sync the module with the optimizer params, only one parameter needs to change.
I added one line of code to use the load_module_only option in DeepSpeed's load_checkpoint function.
In DeepSpeed's load_checkpoint function, load_module_only enables syncing the optimizer params with the module params (see the sketch below).
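For illustration, a minimal sketch of the kind of call this describes, assuming a DeepSpeed engine; `model_engine`, `load_dir`, and `tag` are placeholder names, not the actual variables in the PR:

```python
# Sketch only -- model_engine, load_dir, and tag are assumed placeholder names.
# Passing load_module_only=True makes DeepSpeed's load_checkpoint restore just
# the module weights, so the freshly initialized optimizer and lr scheduler
# stay consistent with those weights when finetuning.
load_path, client_state = model_engine.load_checkpoint(
    load_dir,
    tag=tag,
    load_optimizer_states=False,     # start finetuning with fresh optimizer state
    load_lr_scheduler_states=False,  # and a fresh LR schedule
    load_module_only=True,           # the one-line change described above
)
```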
I checked this code with a 6B model and found that finetuning works correctly (verifying output response quality and valid_loss).
I'm committing this to contribute to the project; please review my code.