Implement distributed training using Kubernetes #77

Merged: 17 commits, Jan 23, 2021
Remove command line argument
leogao2 committed Jan 23, 2021
commit ee1739e80a57a4bd9ba6e03b2673f1c28ad75b56
2 changes: 1 addition & 1 deletion deploy_k8s.sh
@@ -8,4 +8,4 @@ export MASTER_ID=$(kubectl get pods | grep eleuther-neox | awk '{print $1}' | he
 echo $MASTER_ID
 kubectl cp $PWD/hosts $MASTER_ID:/app
 #echo 'git remote set-url origin https://github.com/EleutherAI/gpt-neox/ && git pull' | kubectl exec --stdin --tty $MASTER_ID -- /bin/bash
-echo "$@" | kubectl exec --stdin --tty $MASTER_ID -- /bin/bash
+kubectl exec --stdin --tty $MASTER_ID -- /bin/bash
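
Note on the change: with this commit the script no longer pipes its command line arguments (`"$@"`) into the pod's shell; it simply opens an interactive `/bin/bash` on the master pod. If one-off commands are still wanted, one possible approach is a small wrapper like the sketch below. The function name `run_in_pod` is hypothetical and not part of `deploy_k8s.sh`; it assumes `kubectl` is on the `PATH` and that the pod image ships `/bin/bash`.

```shell
# Hypothetical helper, not part of deploy_k8s.sh.
# Usage: run_in_pod POD [CMD...]
#   No CMD: open an interactive shell in POD (same call as the line added in this commit).
#   With CMD: run it non-interactively through `bash -c` instead of piping text into a TTY session.
run_in_pod() {
    local pod="$1"
    shift
    if [ "$#" -eq 0 ]; then
        kubectl exec --stdin --tty "$pod" -- /bin/bash
    else
        # "$*" joins the remaining arguments into one command string for bash -c.
        kubectl exec "$pod" -- /bin/bash -c "$*"
    fi
}
```

For example, `run_in_pod "$MASTER_ID"` would drop into a shell, while `run_in_pod "$MASTER_ID" git pull` would run a single command and return. Because the arguments are joined into one string, quoting inside the command is the caller's responsibility.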