
gpt3small is broken #71

Closed
StellaAthena opened this issue Jan 18, 2021 · 6 comments · Fixed by #89
Labels
bug Something isn't working

Comments

@StellaAthena
Member

gpt3small seems to have been left behind in some of our updates, and neither scripts/train_gpt3small.sh nor scripts/train_gpt3small_pipeline.sh run.

@StellaAthena StellaAthena added the bug Something isn't working label Jan 18, 2021
@srulikbd
Contributor

scripts/train_gpt3small.sh is running in my Colab (I only reduced the train batch size because of OOM errors).

@StellaAthena
Member Author

Interesting. Can you provide a link to the file?

@StellaAthena StellaAthena self-assigned this Jan 20, 2021
@srulikbd
Contributor

https://colab.research.google.com/drive/1IThG90kOdndybKuScNEZ1QG9nwnuj2l1?usp=sharing
There is a problem with the pipeline code, and I'm trying to fix it. The error is:
num_stages (2) must divide distributed world size (1)
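The error reflects a divisibility requirement in pipeline-parallel training: the processes in the distributed world are split into equal groups of num_stages ranks, so the stage count must divide the world size. On a single Colab GPU the world size is 1, so a 2-stage pipeline cannot be laid out. Below is a minimal Python sketch of that check, assuming equal-sized pipeline groups; pipeline_layout is a hypothetical helper for illustration, not the project's or any library's actual API.

```python
def pipeline_layout(world_size: int, num_stages: int) -> int:
    """Hypothetical helper: return the data-parallel degree of a pipeline run.

    Assumes the world is split into equal groups of `num_stages` ranks,
    which is why `num_stages` must divide `world_size`.
    """
    if world_size <= 0 or num_stages <= 0:
        raise ValueError("world_size and num_stages must be positive")
    if world_size % num_stages != 0:
        # Same condition the error message above describes.
        raise ValueError(
            f"num_stages ({num_stages}) must divide "
            f"distributed world size ({world_size})"
        )
    # Each group of `num_stages` ranks holds one full copy of the model.
    return world_size // num_stages


# Single GPU (as in Colab): only a 1-stage "pipeline" is possible.
print(pipeline_layout(1, 1))  # 1
# 8 GPUs split into 2 stages -> 4 data-parallel replicas.
print(pipeline_layout(8, 2))  # 4
```

With one GPU, calling pipeline_layout(1, 2) raises the same kind of error reported above, so a single-GPU run would need num_stages set to 1 (or the pipeline script skipped).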

@StellaAthena
Member Author

StellaAthena commented Jan 21, 2021 via email

@srulikbd
Contributor

OK, that's what I suspected. Thanks!
How can I help right now?
I saw that we need Kubernetes skills, so I'm learning them.
Is there anything else a newcomer can do?

@StellaAthena
Member Author

StellaAthena commented Jan 21, 2021 via email

@StellaAthena StellaAthena linked a pull request Jan 25, 2021 that will close this issue

2 participants