Is your feature request related to a problem? Please describe.
Currently GPT-NeoX doesn't support partitioned model initialization when using ZeRO-3, which causes out-of-memory (OOM) errors in most cases.
Describe the solution you'd like
A simple fix like the following, inside `get_model`, will do the trick:
```python
if neox_args.zero_stage == 3:
    # Partition parameters across data-parallel ranks as they are created,
    # so the full model never has to fit on a single device.
    with deepspeed.zero.Init():
        model = GPT2ModelPipe(
            neox_args=neox_args,
            num_tokentypes=0,
            parallel_output=True,
            topology=mpu.get_topology(),
            use_cache=use_cache,
        )
```
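For intuition, the pattern behind `deepspeed.zero.Init` can be illustrated with a toy sketch that does not use DeepSpeed or PyTorch at all: while the context manager is active, every parameter created under it keeps only the shard belonging to the current rank instead of the full buffer. The `Parameter`, `ZeroInit`, `rank`, and `world_size` names here are invented for the illustration and are not DeepSpeed APIs.

```python
import contextlib


class Parameter:
    """Toy stand-in for a framework parameter holding a flat list of floats."""

    def __init__(self, values):
        self.values = list(values)
        self.full_numel = len(self.values)


class ZeroInit(contextlib.AbstractContextManager):
    """Toy analogue of deepspeed.zero.Init: while active, each newly created
    Parameter keeps only this rank's shard instead of the full buffer."""

    _active = None  # currently active context, if any

    def __init__(self, rank, world_size):
        self.rank, self.world_size = rank, world_size

    def __enter__(self):
        ZeroInit._active = self
        return self

    def __exit__(self, *exc):
        ZeroInit._active = None
        return False


# Patch Parameter construction so allocation is sharded inside the context.
_orig_init = Parameter.__init__


def _sharded_init(self, values):
    _orig_init(self, values)
    ctx = ZeroInit._active
    if ctx is not None:
        per_rank = -(-self.full_numel // ctx.world_size)  # ceil division
        start = ctx.rank * per_rank
        self.values = self.values[start:start + per_rank]


Parameter.__init__ = _sharded_init
```

Used like the real thing, construction inside the context allocates only a fraction of each parameter per rank, which is why wrapping `GPT2ModelPipe` in `deepspeed.zero.Init()` avoids the OOM at instantiation time:

```python
with ZeroInit(rank=0, world_size=4):
    p = Parameter(range(8))
# p.values is [0, 1]: rank 0's shard of the 8-element parameter,
# while p.full_numel still records the logical size 8.
```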
Describe alternatives you've considered
Another thing I have in mind is figuring out a way to properly test this. I have tested it on a 175B model and it works; please let me know if other testing is needed.
R0n12 changed the title from "Large model instantiation using DeepSpeed.zero.Init for extremely large model under ZeRO-3" to "Large model instantiation using DeepSpeed.zero.Init under ZeRO-3" on Mar 18, 2024.
Additional context
Related issue: huggingface/accelerate#922