When testing the codebase on an AWS instance, if /job/hostfile is not present, you need to add num_gpus to the config to get training working.
Could (should?) we autodetect the number of GPUs if nothing is specified?
If not, we should add a more informative error message. This is the current traceback if num_gpus isn't specified and /job/hostfile isn't present:
Traceback (most recent call last):
  File "./deepy.py", line 67, in <module>
    old_style_args, conf = ConfigMonster().consume_args(extra_conf=extra_conf)
  File "/home/ubuntu/gpt-neox/megatron/config_monster.py", line 379, in consume_args
    ds_runner_conf, megatron_conf, ds_config_conf = self.derive_params_and_split(conf)
  File "/home/ubuntu/gpt-neox/megatron/config_monster.py", line 318, in derive_params_and_split
    world_size = ((num_gpus / pp_size) / mp_size)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
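For discussion, here is a rough sketch of what the fallback could look like. It is not the code in config_monster.py: the helper names `count_local_gpus` and `resolve_num_gpus` are hypothetical, and it assumes that counting the lines printed by `nvidia-smi -L` is an acceptable way to detect GPUs on a single node. If num_gpus is unset and nothing is detected, it raises an error that names the missing setting instead of failing later on the division.

```python
import shutil
import subprocess
from typing import Callable, Optional


def count_local_gpus() -> int:
    """Count GPUs on this machine by listing them with `nvidia-smi -L`.

    Returns 0 if nvidia-smi is not installed or exits with an error.
    """
    if shutil.which("nvidia-smi") is None:
        return 0
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
    except (subprocess.CalledProcessError, OSError):
        return 0
    # Each device is printed on its own line, e.g. "GPU 0: <name> (UUID: ...)".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


def resolve_num_gpus(
    num_gpus: Optional[int],
    detect: Callable[[], int] = count_local_gpus,  # injectable for testing
) -> int:
    """Return the configured num_gpus, or fall back to autodetection.

    Hypothetical helper: intended for the case where no hostfile is present.
    """
    if num_gpus is not None:
        return num_gpus
    detected = detect()
    if detected < 1:
        raise ValueError(
            "num_gpus is not set in the config, no hostfile was found, and no "
            "GPUs could be autodetected. Please add num_gpus to the config."
        )
    return detected
```

Something like `num_gpus = resolve_num_gpus(conf.get("num_gpus"))` before the world_size calculation would then either produce a usable integer or fail with a message that points at the config. The `conf.get` call here is only illustrative; the actual config object may be accessed differently.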
fixed