Python version update #1122

Merged: 19 commits merged into EleutherAI:main on Feb 23, 2024

segyges (Contributor) commented Jan 14, 2024

I don't know yet whether this is ready. In my local testing it fails some of the pytest tests, but it's plausible, even likely, that it was failing them before. This bumps the image to Ubuntu 22.04 and uses the system Python (3.10). If debugging turns out to be extensive, it probably makes more sense to bump Python all the way to 3.12 and raise the PyTorch version too, so we don't have to do this again for as long as possible.
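A quick way to confirm that a rebuilt container actually picked up the Ubuntu 22.04 system Python is a small version check. This is a minimal sketch, not part of the PR: the helper name `version_ge` is hypothetical, and it assumes GNU `sort` with the `-V` (version sort) option, as shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2.
# GNU sort -V orders dotted version strings numerically (3.8 < 3.10 < 3.12).
version_ge() {
  # Whichever version is smaller sorts first; $1 >= $2 iff $2 sorts first.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Inside the container: compare the interpreter against the expected 3.10.
py_version="$(python3 -c 'import platform; print(platform.python_version())')"
if version_ge "$py_version" "3.10"; then
  echo "OK: Python $py_version"
else
  echo "Unexpected Python '$py_version' (expected >= 3.10)" >&2
fi
```

A plain string comparison would get this wrong (`"3.8" > "3.10"` lexically), which is why the sketch leans on version sort.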

segyges (Contributor, Author) commented Jan 14, 2024

On a second pass: I can confirm that I get the same 28 pytest failures inside Docker whether I'm on main or on this branch, so the test suite gives no signal either way.
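Matching failure counts suggest, but do not prove, that the same tests fail on both branches. A minimal sketch for comparing the actual sets, assuming each run's failing test IDs were saved one per line; the helper name `new_failures` and the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print tests that fail in the second run but not the first.
# $1 = file of failing test IDs from main, $2 = the same for this branch.
new_failures() {
  # comm needs sorted input; -13 hides lines unique to $1 and lines common to both.
  comm -13 <(sort -u "$1") <(sort -u "$2")
}

# One way to produce an input file (run once per checkout); pytest's -rf
# short summary prints lines like "FAILED path::test_name - message":
#   pytest -rf | grep '^FAILED' | awk '{print $2}' > branch_failures.txt
```

Running it in both directions (main vs. branch, then branch vs. main) lists regressions and fixes respectively; empty output both ways means the same tests fail on each.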

For starting training and/or preparing data, this branch seems to work fine.

segyges (Contributor, Author) commented Jan 17, 2024

If someone feels like doing work for me, they should do the following:

  1. Have a system with a GPU, up-to-date NVIDIA drivers, and the NVIDIA Container Toolkit installed
  2. Clone this branch
  3. In the gpt-neox folder, run the following:
docker compose run -d gpt-neox # starts the container in the background
docker compose exec gpt-neox bash # you will be in the container now
cd gpt-neox
python deepy.py train.py ./configs/pythia/14M.yml ./configs/docker/pythia-paths.yml

Tell me if it crashes, and send me the errors if it does.
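To make reporting errors easier, the training command can be run with its combined output saved to a file that can be attached to a reply. A minimal sketch, assuming bash (it uses `PIPESTATUS`); the helper `run_and_log` and the log file name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, mirror its stdout and stderr to a log
# file, and return the command's own exit status (not tee's).
run_and_log() {
  local log="$1"
  shift
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] is the exit status of the first command in the pipeline.
  return "${PIPESTATUS[0]}"
}

# Usage inside the container, from the gpt-neox folder:
#   run_and_log train_14M.log python deepy.py train.py ./configs/pythia/14M.yml ./configs/docker/pythia-paths.yml || echo "crashed; see train_14M.log"
```

Without the `PIPESTATUS` step, the function would return `tee`'s status, which is almost always 0, and a crash would look like a success.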

segyges (Contributor, Author) commented Jan 19, 2024

Ran the steps above and hit an apex build problem, which I fixed. The remaining issue is that apex now takes a very long time to build; the plan is to fork apex and keep only the fused AdamW that we actually use, after which it should be fine.

Because the version bumps are large, it probably makes sense to test on a live cluster before merging the branch, to make sure nothing important is broken.

StellaAthena linked an issue on Feb 4, 2024 that may be closed by this pull request.
Quentin-Anthony marked this pull request as ready for review on Feb 23, 2024 at 00:50.
Quentin-Anthony added the merge-queue label (This PR is next on the queue to merge) on Feb 23, 2024.

Quentin-Anthony (Member) left a comment:

Tested and works for me. Thanks!

Quentin-Anthony merged commit a7638a8 into EleutherAI:main on Feb 23, 2024.
2 checks passed

Labels: merge-queue (This PR is next on the queue to merge)
Projects: None yet
Development: successfully merging this pull request may close the issue "Update to current versions of python and pytorch".
2 participants