Expand to all 8 CoreWeave Machines #68

Closed
StellaAthena opened this issue Jan 16, 2021 · 1 comment · Fixed by #77
Labels
feature request New feature or request

@StellaAthena
Member

Right now we are running on a single 8-core machine, but we have access to 8 of them. We need to figure out how to set up cross-machine distributed training using Kubernetes.

@StellaAthena added the feature request (New feature or request) label on Jan 16, 2021
@StellaAthena added this to To do in 1T or BUST via automation on Jan 16, 2021
@StellaAthena moved this from To do to In progress in 1T or BUST on Jan 23, 2021
@StellaAthena self-assigned this on Jan 23, 2021
@StellaAthena
Member Author

StellaAthena commented Jan 23, 2021

Big progress! @leogao2 and I got it running in parallel on two nodes. There were some finicky bits, and we haven't run on all 8 nodes yet, but now that we know how to get it running, we should hopefully make progress more quickly.

@StellaAthena linked a pull request on Jan 23, 2021 that will close this issue
@StellaAthena moved this from In progress to Done in 1T or BUST on Jan 23, 2021