Add s3 checkpoint syncing #1010

Merged 10 commits on Sep 23, 2023
Make s3 imports try-except and separate requirements to s3 file
Quentin-Anthony committed Sep 23, 2023
commit 3d76d4f311fe13fb9a49e3cb83ae7c1c49d904d7
10 changes: 8 additions & 2 deletions megatron/checkpointing.py
@@ -27,8 +27,14 @@
 import sys
 import numpy as np

-import boto3
-import hf_transfer
+try:
+    import boto3
+except ModuleNotFoundError:
+    print("For s3 checkpointing, please install boto3 either using requirements/requirements-s3.txt or https://github.com/boto/boto3")
+try:
+    import hf_transfer
+except ModuleNotFoundError:
+    print("For s3 checkpointing, please install hf_transfer either using requirements/requirements-s3.txt or https://github.com/huggingface/hf_transfer")
 import torch
 from glob import glob

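Note on the guarded imports above: when a package is missing, the except branch only prints a hint, so the name (boto3 or hf_transfer) stays undefined and a later use would raise NameError. A minimal sketch of one possible variant follows, which binds the name to None and raises a descriptive error at use time. The helpers optional_import and require are hypothetical names introduced for illustration; they are not part of this commit or of megatron/checkpointing.py.

```python
import importlib


def optional_import(module_name, hint):
    """Import module_name if installed; otherwise print a hint and return None.

    Hypothetical helper for illustration, not part of the commit.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        print(f"For s3 checkpointing, please install {module_name} using {hint}")
        return None


def require(module, module_name):
    """Return module, or raise a clear error if the optional import failed."""
    if module is None:
        raise RuntimeError(
            f"{module_name} is required for s3 checkpointing but is not installed"
        )
    return module


# Module-level bindings: each is either the imported module or None.
boto3 = optional_import("boto3", "requirements/requirements-s3.txt")
hf_transfer = optional_import("hf_transfer", "requirements/requirements-s3.txt")
```

Code that actually needs S3 access could then call require(boto3, "boto3") just before use, so the failure names the missing package instead of surfacing as a NameError. Whether that trade-off is worth the extra helper is a design choice; the commit's simpler print-only approach keeps the import section short.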
2 changes: 2 additions & 0 deletions requirements/requirements-s3.txt
@@ -0,0 +1,2 @@
+hf-transfer>=0.1.3
+boto3
1 change: 0 additions & 1 deletion requirements/requirements.txt
@@ -3,7 +3,6 @@ git+https://github.com/EleutherAI/DeeperSpeed.git#egg=deepspeed
 ftfy>=6.0.1
 git+https://github.com/EleutherAI/lm_dataformat.git@4eec05349977071bf67fc072290b95e31c8dd836
 huggingface_hub>=0.11.0
-hf-transfer>=0.1.3
 lm_eval>=0.3.0
 mpi4py>=3.0.3
 numpy>=1.22.0