
[bug-fix] enable finetuning option (set optimizer params correctly) #927

Merged
merged 2 commits into EleutherAI:main
May 9, 2023

Conversation

taegyeongeo
Contributor

@taegyeongeo commented May 6, 2023

The neox repo has a problem with the "finetune" option (#767): this option resets the hyperparams in the optimizer/lr_scheduler, but it doesn't set the model parameters correctly.

If you use finetune on the main branch, the module is not synced with the optimizer params.

To sync the module with the optimizer params, only one parameter needs to change:

        checkpoint_name, state_dict = model.load_checkpoint(
            neox_args.load,
            load_optimizer_states=load_optim_and_scheduler,
            load_lr_scheduler_states=load_optim_and_scheduler,
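            # when optimizer/scheduler states are skipped (i.e. finetuning),
            # load only the module weights and let DeepSpeed refresh the
            # optimizer's fp32 master params from them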
            load_module_only=not load_optim_and_scheduler,
            tag=tag,
        )

I added one line of code to use the load_module_only option in DeepSpeed's load_checkpoint function:

        if load_module_only:
            deepspeed_states = ['module']
            if self.optimizer is not None and self.fp16_enabled():
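                # copy the just-loaded fp16 module weights into the optimizer's
                # fp32 master copies, so training resumes from the loaded weights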
                self.optimizer.refresh_fp32_params()

In DeepSpeed's load_checkpoint function, load_module_only enables syncing the optimizer params with the module params.

I checked this code with a 6B model and found that finetuning works correctly (verifying output response quality and valid_loss).

I'm submitting this to contribute to the project; please review my code.
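
For context, a minimal sketch of the relevant config keys when finetuning (the checkpoint path below is a placeholder; all other settings come from your usual training config):

        {
          # placeholder path to the pretrained checkpoint directory
          "load": "/path/to/pretrained/checkpoints",
          # load module weights only and reset optimizer/lr-scheduler state
          "finetune": true,
        }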

@taegyeongeo requested a review from a team as a code owner May 6, 2023 00:36
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


logan.eo seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@StellaAthena
Member

It looks like you've only changed the example config file, not fixed the actual bug?

@taegyeongeo
Contributor Author

@StellaAthena
Sorry, I didn't commit "megatron/checkpoint.py". I've pushed it now.

@StellaAthena
Member

@taegyeongeo who or what is “logan.eo” and why are they an author of this PR?

@taegyeongeo
Contributor Author

taegyeongeo commented May 8, 2023

@StellaAthena
Oh, it conflicted with my GitHub Enterprise account. logan.eo is me.

@StellaAthena
Member

@taegyeongeo Apologies that we haven't merged this yet... we've been very busy of late and haven't been able to test it rigorously.

You say

I added one line of code to use the load_module_only option in DeepSpeed's load_checkpoint function

To be clear, does this mean that you also need to update DeeperSpeed to use this?

@Quentin-Anthony
Member

To be clear, does this mean that you also need to update DeeperSpeed to use this?

No, @taegyeongeo is just saying that we're using DeepSpeed's existing load_module_only here: https://github.com/microsoft/DeepSpeed/blob/58c4d230920f10b9a0c33891b6cb88afc1a6a5f4/deepspeed/runtime/engine.py#L2542

This doesn't require DeeperSpeed changes.
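
For reference, a minimal sketch of calling DeepSpeed's load_checkpoint directly with this flag (the engine setup and paths here are assumptions, not from this thread):

        # `engine` is an already-initialized DeepSpeed engine (from deepspeed.initialize)
        load_path, client_state = engine.load_checkpoint(
            "/path/to/checkpoints",          # checkpoint root directory (placeholder)
            load_optimizer_states=False,     # skip optimizer state, as when finetuning
            load_lr_scheduler_states=False,  # skip LR scheduler state as well
            load_module_only=True,           # weights only; fp32 master params get refreshed
        )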

@Quentin-Anthony
Member

@Quentin-Anthony left a comment

LGTM. Thanks for this fix!

@Quentin-Anthony merged commit befd133 into EleutherAI:main May 9, 2023
0 of 3 checks passed
@WaveLi123

Following your approach to finetune the original 20B checkpoint, I still got the error "Empty ds_version in checkpoint", as in #767.

@taegyeongeo
Contributor Author

taegyeongeo commented May 10, 2023

@WaveLi123
I know about this problem. For my test, I disabled the "ds_version" check lines in DeepSpeed. If you want to test without editing the DeepSpeed code, you can just add "ds_version" to the 20B checkpoint. It's not an issue with this code; the checkpoint just isn't compatible with the current DeepSpeed version.

@WaveLi123

Yes, it is a DeepSpeed version mismatch. How do I add "ds_version" to the 20B checkpoint? Any code reference? Thanks a lot.

@StellaAthena
Member

@WaveLi123 Just add it to 20B.yml

@WaveLi123

WaveLi123 commented May 12, 2023

@WaveLi123 Just add it to 20B.yml

Simply adding "ds_version": "0.3.15" to 20B.yml does not work. I got this error: TypeError: __init__() got an unexpected keyword argument 'ds_version'

@Quentin-Anthony
Member

@WaveLi123 Just add it to 20B.yml

This won't work. You need to add it to the model's state_dict object itself.

@taegyeongeo -- Do you have a snippet you can share? I'd love to be able to point gpt-neox 1.0 users to a small snippet that just loads the checkpoint and adds the ds_version attribute (arbitrarily based on an arg).
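
A minimal sketch of what such a snippet might look like, assuming the ds_version field is read from the mp_rank_*_model_states.pt shards under the checkpoint tag directory (the argument names and the default version string are arbitrary placeholders):

        import argparse
        import glob
        import os

        import torch

        parser = argparse.ArgumentParser()
        parser.add_argument("--checkpoint_dir", required=True,
                            help="tag directory containing mp_rank_*_model_states.pt")
        parser.add_argument("--ds_version", default="0.3.15",
                            help="version string to stamp into the checkpoint")
        args = parser.parse_args()

        # patch every model-states shard that is missing a ds_version field
        for path in sorted(glob.glob(os.path.join(args.checkpoint_dir, "mp_rank_*_model_states.pt"))):
            state_dict = torch.load(path, map_location="cpu")
            if not state_dict.get("ds_version"):
                state_dict["ds_version"] = args.ds_version
                torch.save(state_dict, path)
                print(f"added ds_version={args.ds_version} to {path}")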
