Miscellaneous docker QoL improvements #91

Merged (4 commits), Jan 26, 2021

@leogao2 (Contributor) commented Jan 24, 2021

From @StellaAthena:

  1. Store the hostfile in /job/hostfile so that the code automatically finds it and we don't need to provide it as an argument.
  2. Install DeeperSpeed instead of DeepSpeed
  3. The script lands you in ~/app/, which contains the contents of the GPT-NeoX repository. Would it be possible to have the GPT-NeoX repo be one layer deeper, so that it drops you in ~/app/ which contains the directory gpt-neox? That would be very pleasant but isn't necessary.

Addresses items 1 and 2, and fixes the XY problem behind item 3.
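
For context on item 1, a DeepSpeed hostfile lists one host per line in the form `<hostname> slots=<number of GPUs>`. The following is only a minimal sketch of generating such a file; the `write_hostfile` helper and the worker names are hypothetical and are not taken from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: write_hostfile and the host names below are hypothetical.
# Prints one hostfile line per host: "<hostname> slots=<gpus per host>".
write_hostfile() {
  local slots="$1"
  shift
  local host
  for host in "$@"; do
    printf '%s slots=%s\n' "$host" "$slots"
  done
}

# Example usage (assumed names): two workers with 8 GPUs each.
#   write_hostfile 8 worker-0 worker-1 > hostfile
```

Redirecting the output to a file named `hostfile` gives something that can then be copied to the location the launcher checks by default.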

@leogao2 leogao2 requested a review from a team as a code owner January 24, 2021 23:21
@StellaAthena (Member) left a comment

This creates a job directory and puts the hostfile in it, but the directory is not in app! I believe this is fixed by changing line 10 to kubectl cp $PWD/hostfile $MASTER_ID:/app/job.

@leogao2 (Contributor, Author) commented Jan 26, 2021

I double-checked the DeepSpeed documentation: /job/hostfile is the correct path.
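
As a rough illustration of the copy step with the path settled on here, the sketch below places a local hostfile at /job/hostfile inside the master pod. The `copy_hostfile` function and the `KUBECTL` override are hypothetical helpers added for illustration, and `$MASTER_ID` is assumed to hold the master pod's name as in the review comment above; this is not the script from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: copy_hostfile and the KUBECTL override are hypothetical.
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

copy_hostfile() {
  local pod="$1"                    # name of the master pod
  local src="${2:-$PWD/hostfile}"   # local hostfile, defaults to ./hostfile
  # Create /job in the pod if it is missing, then copy the hostfile into it.
  "$KUBECTL" exec "$pod" -- mkdir -p /job
  "$KUBECTL" cp "$src" "$pod:/job/hostfile"
}

# Example usage (assumes MASTER_ID is set):
#   copy_hostfile "$MASTER_ID"
```

With the file at /job/hostfile, the launcher should find it without a `--hostfile` argument, which is the behaviour item 1 asks for.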

@StellaAthena StellaAthena self-requested a review January 26, 2021 17:48
@StellaAthena StellaAthena merged commit b87875f into main Jan 26, 2021
@StellaAthena StellaAthena deleted the leogao2-patch-1 branch January 26, 2021 17:49