Is your feature request related to a problem? Please describe.
Currently GPT-NeoX doesn't support partitioned model initialization when using ZeRO-3, which causes out-of-memory (OOM) errors in most cases.
Describe the solution you'd like
A simple fix like the following, inside `get_model`, will do the trick:
```python
if neox_args.zero_stage == 3:
    # Partition parameters across data-parallel ranks as they are created,
    # so the full model never has to fit on a single device.
    with deepspeed.zero.Init():
        model = GPT2ModelPipe(
            neox_args=neox_args,
            num_tokentypes=0,
            parallel_output=True,
            topology=mpu.get_topology(),
            use_cache=use_cache,
        )
```
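For intuition, the pattern behind `deepspeed.zero.Init` can be illustrated with a toy sketch that does not use DeepSpeed or PyTorch at all: while the context manager is active, every parameter created under it keeps only the shard belonging to the current rank instead of the full buffer. The `Parameter`, `ZeroInit`, `rank`, and `world_size` names here are invented for the illustration and are not DeepSpeed APIs.

```python
import contextlib


class Parameter:
    """Toy stand-in for a framework parameter holding a flat list of floats."""

    def __init__(self, values):
        self.values = list(values)
        self.full_numel = len(self.values)


class ZeroInit(contextlib.AbstractContextManager):
    """Toy analogue of deepspeed.zero.Init: while active, each newly created
    Parameter keeps only this rank's shard instead of the full buffer."""

    _active = None  # currently active context, if any

    def __init__(self, rank, world_size):
        self.rank, self.world_size = rank, world_size

    def __enter__(self):
        ZeroInit._active = self
        return self

    def __exit__(self, *exc):
        ZeroInit._active = None
        return False


# Patch Parameter construction so allocation is sharded inside the context.
_orig_init = Parameter.__init__


def _sharded_init(self, values):
    _orig_init(self, values)
    ctx = ZeroInit._active
    if ctx is not None:
        per_rank = -(-self.full_numel // ctx.world_size)  # ceil division
        start = ctx.rank * per_rank
        self.values = self.values[start:start + per_rank]


Parameter.__init__ = _sharded_init
```

Used like the real thing, construction inside the context allocates only a fraction of each parameter per rank, which is why wrapping `GPT2ModelPipe` in `deepspeed.zero.Init()` avoids the OOM at instantiation time:

```python
with ZeroInit(rank=0, world_size=4):
    p = Parameter(range(8))
# p.values is [0, 1]: rank 0's shard of the 8-element parameter,
# while p.full_numel still records the logical size 8.
```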
Describe alternatives you've considered
Another thing I have in mind is figuring out a way to properly test this. I have tested it on a 175B model and it works; please let me know if other testing is needed.
R0n12 changed the title from "Large model instantiation using DeepSpeed.zero.Init for extremely large model under ZeRO-3" to "Large model instantiation using DeepSpeed.zero.Init under ZeRO-3" on Mar 18, 2024.
Additional context
Related issue: huggingface/accelerate#922