fixed case when ntasks_per_node is used, leading to missing SLURM_NTASKS env var #1069

Merged

AIproj (Contributor) commented Nov 1, 2023

On some clusters, a script that specifies #SBATCH --ntasks-per-node instead of #SBATCH --ntasks does not get the SLURM_NTASKS environment variable set, so os.environ["SLURM_NTASKS"] does not exist. This causes an error in arguments.py, which accesses the missing key of os.environ. A simple fix is to note that NTASKS = NNODES x SLURM_NTASKS_PER_NODE and to use the right-hand side if the SLURM_NTASKS key does not exist in os.environ.
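
For readers who want to see the shape of the fallback described above, here is a minimal sketch. It is not the exact diff from this PR: the helper name `resolve_ntasks` is hypothetical, and reading the node count from `SLURM_NNODES` is an assumption (the description only says NNODES).

```python
import os


def resolve_ntasks(env=None):
    """Return the total task count, falling back to nodes * tasks-per-node.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise outside a SLURM job.
    """
    env = os.environ if env is None else env

    # Preferred path: SLURM exported the total task count directly.
    if "SLURM_NTASKS" in env:
        return int(env["SLURM_NTASKS"])

    # Fallback path: only per-node information is available.
    # Assumption: the node count is exposed as SLURM_NNODES.
    return int(env["SLURM_NNODES"]) * int(env["SLURM_NTASKS_PER_NODE"])
```

With 2 nodes and 4 tasks per node and no SLURM_NTASKS set, the helper returns 8, the same value SLURM_NTASKS would have carried.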

CLAassistant commented Nov 1, 2023

CLA assistant check
All committers have signed the CLA.

@Quentin-Anthony Quentin-Anthony merged commit 41f019e into EleutherAI:main Nov 1, 2023
2 of 5 checks passed
R0n12 pushed a commit to R0n12/gpt-neox-fork that referenced this pull request Nov 2, 2023
3 participants