Cull pp = 0 model branch #269

Merged 26 commits on Apr 30, 2021
Commits
cc2c900
fix wandb group stuff
sdtblck Apr 28, 2021
f92f8c9
fix checkpointing if deepspeed_activation_checkpointing = true
sdtblck Apr 28, 2021
62e4528
get rid of all codepaths where pp = 0, rearrange layout
sdtblck Apr 28, 2021
32b7bd1
refactor checkpointing
sdtblck Apr 28, 2021
56987c2
rename megatron_args to neox_args + remove unused argument
sdtblck Apr 28, 2021
8b6d515
remove unused FP16 code (deepspeed handles this)
sdtblck Apr 28, 2021
b58c48d
remove unused gradient clipping code (deepspeed handles this)
sdtblck Apr 28, 2021
d622349
remove apex dependency in training.py
sdtblck Apr 28, 2021
4e2d64a
removed unused megatron/memory.py
sdtblck Apr 28, 2021
a7b7b18
update requirements + dockerfile
sdtblck Apr 28, 2021
5e9dc55
Merge branch 'main' into cull-model-branch
sdtblck Apr 28, 2021
0b8fee9
get pipe to normal conversion working properly
sdtblck Apr 28, 2021
c80212e
Merge remote-tracking branch 'origin/cull-model-branch' into cull-mod…
sdtblck Apr 28, 2021
871e679
fix eval_helper
sdtblck Apr 28, 2021
77fe200
fix Dockerfile
sdtblck Apr 28, 2021
243c60a
get rid of megatron/data/dataset_utils.py
sdtblck Apr 28, 2021
f19e14a
update random.py
sdtblck Apr 28, 2021
e5212b1
remove some duplicate code
sdtblck Apr 28, 2021
de042f3
revert config changes
sdtblck Apr 28, 2021
3cf01de
revert changes to checkpointing.py
sdtblck Apr 28, 2021
6f5079f
test model update after gpt2 model remove
Apr 29, 2021
1dae917
adding more test configs
Apr 29, 2021
3c59574
Merge branch 'testcases_continued' into cull-model-branch
kipgparker Apr 29, 2021
df76402
remove MegatronModule + all custom saving logic (shit's cursed)
sdtblck Apr 29, 2021
09b5d06
delete deepspeed lmao
sdtblck Apr 30, 2021
ac00dbd
revert changes to small config
sdtblck Apr 30, 2021
fix wandb group stuff
sdtblck committed Apr 28, 2021
commit cc2c90010f30b8ac1749b2c5347e99b56afe3156
6 changes: 5 additions & 1 deletion deepy.py
@@ -17,7 +17,6 @@
import deepspeed
from deepspeed.launcher.runner import main
import requests

import logging

logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))
@@ -43,8 +42,13 @@ def get_wandb_api_key():


neox_args = NeoXArgs.consume_deepy_args()
if neox_args.wandb_group is not None:
    # concat the wandb group name with a uid to make sure it's unique
    import wandb
    neox_args.wandb_group += "_" + wandb.util.generate_id()
neox_args.print()
deepspeed_main_args = neox_args.get_deepspeed_main_args()


if __name__ == '__main__':
    main(deepspeed_main_args)
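
The hunk above appends a random id to `wandb_group` in the launcher script, so the suffix is presumably generated once per launch rather than once per worker process. The same idea can be sketched without the wandb dependency. This is a minimal illustration only: `make_unique_group` and `id_length` are hypothetical names that are not part of the PR, and the standard-library `secrets` module stands in for `wandb.util.generate_id()`.

```python
import secrets
import string
from typing import Optional

# Lowercase letters and digits; an assumed alphabet for the illustrative suffix.
_ALPHABET = string.ascii_lowercase + string.digits


def make_unique_group(group: Optional[str], id_length: int = 8) -> Optional[str]:
    """Return `group` with a random suffix appended, or None if no group is set.

    Hypothetical helper mirroring the `if neox_args.wandb_group is not None`
    guard in deepy.py; it is not the code used in the PR.
    """
    if group is None:
        # No group configured: nothing to make unique.
        return None
    suffix = "".join(secrets.choice(_ALPHABET) for _ in range(id_length))
    return f"{group}_{suffix}"


if __name__ == "__main__":
    # Two launches with the same configured group get different group names.
    print(make_unique_group("my_experiment"))
    print(make_unique_group("my_experiment"))
```

Computing the suffix before the arguments are handed to the launcher means every process started from one launch would see the same group string, while separate launches with the same configured name do not collide.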