Python version update #1122
On next pass: I can confirm that I get the same 28 pytest failures when running inside Docker whether I'm on main or this branch, so the test suite gives me no signal either way. For starting training and/or preparing data, this branch seems to work fine.
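To check that the failures really are the same set on both branches, and not just the same count, one option is to diff the failing test IDs. The sketch below is only an illustration: it assumes the output of each pytest run was saved as text and still contains pytest's short summary lines of the form `FAILED <test id> - <message>`. The function names `failed_tests` and `compare_runs` are hypothetical and not part of this repo.

```python
def failed_tests(pytest_output: str) -> set[str]:
    """Collect test IDs from 'FAILED <id> - <message>' short-summary lines."""
    prefix = "FAILED "
    ids = set()
    for line in pytest_output.splitlines():
        if line.startswith(prefix):
            # Drop the prefix, then cut off the trailing ' - <message>' if present.
            ids.add(line[len(prefix):].split(" - ", 1)[0].strip())
    return ids


def compare_runs(main_output: str, branch_output: str) -> dict[str, list[str]]:
    """Compare failing test IDs between a run on main and a run on the branch."""
    main_failed = failed_tests(main_output)
    branch_failed = failed_tests(branch_output)
    return {
        "new_on_branch": sorted(branch_failed - main_failed),
        "fixed_on_branch": sorted(main_failed - branch_failed),
        "shared": sorted(main_failed & branch_failed),
    }
```

If `new_on_branch` comes back empty, the branch introduces no failures that main does not already have, which is a stronger statement than matching totals.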
If someone feels like doing work for me, they should do the following:
Tell me if it crashes, and give me your errors if it does.
Got this, got an apex build problem, and fixed it. The problem now is that apex takes forever to build; I need to fork apex and retain only the fused adamw that we use, and then it should be good. Because the version bumps are sufficiently large, it probably makes sense to test this on a live cluster before merging the branch, to make sure nothing very important is broken.
3e1592c to 74dde40
1) This, empirically, works, as tested by running the build and kicking off training. 2) Apex documentation says it is incorrect syntax and deprecated. 3) It takes so long to compile that it is probably, all by itself, something that needs fixing. 4) I will probably pull the fused adamw out of apex. 5) It has been building for twenty minutes so I am going to go do something else.
Prevents possible build issues with apex especially across divergent pip versions
This reverts commit 40c7656.
3c983a8 to 21cd88f
Tested and works for me. Thanks!
Python 3.10 support was added in EleutherAI#1122.
Don't know if this is ready or not; in my local testing it fails some of the pytest tests, but it's plausible, even likely, that it was doing so before. Bumps the image to Ubuntu 22.04 and uses the system Python (3.10); if debugging is extensive, it probably makes more sense to bump the Python version all the way to 3.12 and bump the PyTorch version too, so we don't have to do this again for as long as possible.
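For anyone testing the rebuilt image, a quick sanity check of the interpreter and key packages could look like the sketch below. It is a minimal illustration, not something this PR ships: the minimum version `(3, 10)` reflects the system Python mentioned above, while the package list and the name `environment_report` are assumptions chosen for the example.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)          # system Python on the Ubuntu 22.04 image, per the description
PACKAGES = ("torch", "apex")  # assumption: packages worth checking after the rebuild


def environment_report(min_python=MIN_PYTHON, packages=PACKAGES):
    """Report the running Python version and which packages cannot be found."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec locates a top-level package without importing it.
        "missing": [name for name in packages
                    if importlib.util.find_spec(name) is None],
    }
```

Running `print(environment_report())` inside the container should show `python_ok` as true and an empty `missing` list if the build went through; it says nothing about whether the installed packages actually work at runtime.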