
veGiantModel

veGiantModel is a PyTorch-based, high-efficiency training library developed by the Applied Machine Learning team at ByteDance. This repository supports ongoing research into making giant-model training (for models such as GPT, BERT, and T5) easy, efficient, and effective. veGiantModel builds on top of Megatron and DeepSpeed, and improves communication efficiency by integrating the high-performance communication library BytePS and by providing customized pipeline partitioning.

initialization

import veGiantModel

pipeline_parallel_size = 1
model_parallel_size = 2
# Set up the distributed process groups; "env://" reads MASTER_ADDR,
# MASTER_PORT, RANK, and WORLD_SIZE from the environment.
veGiantModel.initialize.init_distribute(pipeline_parallel_size, model_parallel_size, init_method="env://")
mp_size = veGiantModel.distributed.get_model_parallel_world_size()
dp_size = veGiantModel.distributed.get_data_parallel_world_size()
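The data-parallel degree is not passed explicitly: it is whatever remains of the world size after the pipeline- and model-parallel groups are carved out. A minimal sketch of that arithmetic (the helper name data_parallel_size is hypothetical, not part of the veGiantModel API):

def data_parallel_size(world_size: int, pp_size: int, mp_size: int) -> int:
    # Hypothetical helper: every (pp_size * mp_size)-rank block forms one
    # data-parallel replica, so the world size must divide evenly.
    assert world_size % (pp_size * mp_size) == 0
    return world_size // (pp_size * mp_size)

# On an 8-GPU node with pipeline_parallel_size=1 and model_parallel_size=2,
# the data-parallel degree is 4.
print(data_parallel_size(8, pp_size=1, mp_size=2))  # 4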

modules

import torch
import torch.nn as nn

from veGiantModel.module import ColumnParallelLinear, RowParallelLinear

class PositionWiseFeedForward(nn.Module):
    """FeedForward neural network applied at each position.

    Config and Activation come from the surrounding model code.
    """

    def __init__(self, config: Config):
        super().__init__()
        self.config = config

        if self.config.use_mp_linear_in_ffn:
            assert ColumnParallelLinear is not None
            assert RowParallelLinear is not None
            # fc1 splits dim_ff across model-parallel ranks; fc2 reduces
            # the partial outputs back to dim.
            self.fc1 = ColumnParallelLinear(config.dim, config.dim_ff, use_ft=False)
            self.fc2 = RowParallelLinear(config.dim_ff, config.dim, use_ft=False)
        else:
            self.fc1 = nn.Linear(config.dim, config.dim_ff)
            self.fc2 = nn.Linear(config.dim_ff, config.dim)
        self.act = Activation(config.act)
        self.dropout = nn.Dropout(config.p_drop_hidden)

    def forward(self, x) -> torch.Tensor:
        # (bsz, seq_len, dim) -> (bsz, seq_len, dim_ff / model_parallel_size) -> (bsz, seq_len, dim)
        fc1_out = self.act(self.fc1(x))
        if self.config.dropout_in_ffn:
            fc1_out = self.dropout(fc1_out)
        fc2_out = self.fc2(fc1_out)
        if self.config.use_ffn_output_dropout:
            fc2_out = self.dropout(fc2_out)
        return fc2_out
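The column-then-row pairing is the standard Megatron-style split: fc1 partitions the dim_ff features across model-parallel ranks, the element-wise activation applies to each shard independently, and fc2 produces partial outputs that are summed (all-reduced) back to the full dim. A single-process sketch of why the composition is exact (illustrative only; the real layers shard weights across ranks and communicate):

import torch

# Chunking fc1 by output features and fc2 by the matching input features
# reproduces the serial result exactly.
dim, dim_ff, mp = 8, 16, 2
x = torch.randn(4, dim)
w1 = torch.randn(dim_ff, dim)  # fc1 weight (column-parallel split)
w2 = torch.randn(dim, dim_ff)  # fc2 weight (row-parallel split)

shards1 = w1.chunk(mp, dim=0)  # each "rank" holds dim_ff / mp output features
shards2 = w2.chunk(mp, dim=1)  # each "rank" holds the matching dim_ff / mp inputs

# Each rank computes a partial fc2 output; the sum emulates the all-reduce.
partials = [torch.relu(x @ s1.t()) @ s2.t() for s1, s2 in zip(shards1, shards2)]
parallel_out = sum(partials)

serial_out = torch.relu(x @ w1.t()) @ w2.t()
assert torch.allclose(parallel_out, serial_out, atol=1e-5)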

Examples

GPT Pretraining

The examples/gpt/pretrain_gpt2_distributed.sh script runs 345M-parameter GPT pretraining on a single node with 8 GPUs. It largely follows the Megatron GPT script, with a few notable differences, and shows that an existing Megatron/DeepSpeed training job can adopt veGiantModel with few changes.
