forked from NVIDIA/Megatron-LM
Merge branch 'pipeline_parallel_main' into 'main'

Pipeline parallelism and inter-layer model parallelism implementation

See merge request ADLR/megatron-lm!159
Showing 67 changed files with 3,078 additions and 1,085 deletions.
@@ -32,4 +32,3 @@ python pretrain_bert.py \
       --eval-interval 1000 \
       --eval-iters 10 \
       --fp16
-
@@ -0,0 +1,46 @@
#!/bin/bash

GPUS_PER_NODE=8
# Change for multinode config
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

DATA_PATH=<Specify path and file prefix>_text_sentence
CHECKPOINT_PATH=<Specify path>

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"

python -m torch.distributed.launch $DISTRIBUTED_ARGS \
       pretrain_bert.py \
       --tensor-model-parallel-size 2 \
       --pipeline-model-parallel-size 2 \
       --num-layers 24 \
       --hidden-size 1024 \
       --num-attention-heads 16 \
       --batch-size 2 \
       --num-microbatches-in-minibatch 2 \
       --seq-length 512 \
       --max-position-embeddings 512 \
       --train-iters 1000000 \
       --save $CHECKPOINT_PATH \
       --load $CHECKPOINT_PATH \
       --data-path $DATA_PATH \
       --vocab-file bert-vocab.txt \
       --data-impl mmap \
       --split 949,50,1 \
       --distributed-backend nccl \
       --lr 0.0001 \
       --lr-decay-style linear \
       --min-lr 1.0e-5 \
       --lr-decay-iters 990000 \
       --weight-decay 1e-2 \
       --clip-grad 1.0 \
       --warmup .01 \
       --log-interval 100 \
       --save-interval 10000 \
       --eval-interval 1000 \
       --eval-iters 10 \
       --fp16
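The launch above splits 8 GPUs across tensor- and pipeline-model-parallel groups. As a rough sketch (assuming Megatron-LM's usual decomposition, where the data-parallel degree is the world size divided by the product of the two model-parallel sizes), the degrees combine like this:

```shell
#!/bin/bash
# Sketch of how the parallelism degrees in the script above combine.
# Assumption: data-parallel size = world size / (tensor MP size * pipeline MP size).
GPUS_PER_NODE=8
NNODES=1
TENSOR_MP_SIZE=2        # --tensor-model-parallel-size
PIPELINE_MP_SIZE=2      # --pipeline-model-parallel-size

WORLD_SIZE=$((GPUS_PER_NODE * NNODES))
DATA_PARALLEL_SIZE=$((WORLD_SIZE / (TENSOR_MP_SIZE * PIPELINE_MP_SIZE)))

echo "world=$WORLD_SIZE tensor=$TENSOR_MP_SIZE pipeline=$PIPELINE_MP_SIZE data=$DATA_PARALLEL_SIZE"
```

With the values in this script, each of the 2 pipeline stages holds 2 tensor-parallel ranks, leaving 2 data-parallel replicas of the model.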
@@ -38,6 +38,3 @@ python pretrain_gpt2.py \
       --eval-interval 1000 \
       --eval-iters 10 \
       --fp16
-
-
-set +x
@@ -0,0 +1,50 @@
#! /bin/bash

# Runs the "345M" parameter model

GPUS_PER_NODE=8
# Change for multinode config
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

DATA_PATH=<Specify path and file prefix>_text_document
CHECKPOINT_PATH=<Specify path>

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"

python -m torch.distributed.launch $DISTRIBUTED_ARGS \
       pretrain_gpt2.py \
       --tensor-model-parallel-size 2 \
       --pipeline-model-parallel-size 2 \
       --num-layers 24 \
       --hidden-size 1024 \
       --num-attention-heads 16 \
       --batch-size 4 \
       --num-microbatches-in-minibatch 2 \
       --seq-length 1024 \
       --max-position-embeddings 1024 \
       --train-iters 500000 \
       --lr-decay-iters 320000 \
       --save $CHECKPOINT_PATH \
       --load $CHECKPOINT_PATH \
       --data-path $DATA_PATH \
       --vocab-file gpt2-vocab.json \
       --merge-file gpt2-merges.txt \
       --data-impl mmap \
       --split 949,50,1 \
       --distributed-backend nccl \
       --lr 0.00015 \
       --lr-decay-style cosine \
       --min-lr 1.0e-5 \
       --weight-decay 1e-2 \
       --clip-grad 1.0 \
       --warmup .01 \
       --checkpoint-activations \
       --log-interval 100 \
       --save-interval 10000 \
       --eval-interval 1000 \
       --eval-iters 10 \
       --fp16
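With pipelining, each optimizer step processes several microbatches. A hedged sketch of the resulting effective batch size, under the assumption (not confirmed by this diff) that `--batch-size` is the per-microbatch size on each data-parallel rank:

```shell
#!/bin/bash
# Hedged sketch: effective global batch for the GPT-2 script above.
# Assumption: global batch = batch-size per microbatch
#             * microbatches per minibatch * data-parallel replicas.
BATCH_SIZE=4                 # --batch-size
NUM_MICROBATCHES=2           # --num-microbatches-in-minibatch
DATA_PARALLEL_SIZE=2         # 8 GPUs / (tensor 2 * pipeline 2)

GLOBAL_BATCH=$((BATCH_SIZE * NUM_MICROBATCHES * DATA_PARALLEL_SIZE))
echo "effective global batch = $GLOBAL_BATCH samples per step"
```

Under those assumptions the script above would consume 16 sequences of length 1024 per optimizer step.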