I've been trying out DeepSpeed for MoE inference, but I found that the code at lines 269-272 of DeepSpeed/deepspeed/inference/engine.py might be buggy.
As depicted in the figure below, at line 269, if dist.get_world_size() is smaller than moe_ep_size (say there are 2 GPUs and 8 experts, with moe_ep_size=8), num_ep_groups would be 0. In that case, the if branch at line 270 would never be reached.
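For clarity, here is a minimal sketch of the arithmetic I believe is happening. The function name and surrounding structure are my reconstruction, not the actual engine code; only the integer division and the loop bound mirror the lines cited above:

```python
# Hypothetical reconstruction of the expert-parallel group setup around
# deepspeed/inference/engine.py lines 269-272 (names mirror the source,
# structure is assumed for illustration).
def create_ep_groups(world_size: int, moe_ep_size: int):
    groups = []
    # Integer division: with world_size=2 and moe_ep_size=8 this is 0,
    # so the loop body (the "if" branch at line 270) never executes.
    num_ep_groups = world_size // moe_ep_size
    for i in range(num_ep_groups):
        ranks = list(range(i * moe_ep_size, (i + 1) * moe_ep_size))
        groups.append(ranks)
    return groups

print(create_ep_groups(2, 8))  # -> []  (no expert-parallel group is created)
print(create_ep_groups(8, 8))  # -> [[0, 1, 2, 3, 4, 5, 6, 7]]
```

With 2 GPUs and moe_ep_size=8, the result is an empty list, which is exactly the silent no-op I'm asking about.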
I wonder whether this is a redundant design, or whether DeepSpeed simply does not support the case where #devices < #experts (i.e., one device holds multiple experts) in MoE inference, even though I'm fairly sure this case is supported in training.
Besides, I'd like to know whether there are any off-the-shelf, open-source pre-trained MoE models that can be used with DeepSpeed inference to test expert parallelism.
Thanks. Any replies would be appreciated! @tjruwase