Refactor everything outside of core to be out of the main megatron namespace.
jaredcasper committed Mar 26, 2024
1 parent dc7fa88 commit 38644dd
Showing 159 changed files with 478 additions and 605 deletions.
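
The renames in this commit are mechanical and follow three patterns, summarized below (a sketch compiled from the hunks that follow; only modules that actually appear in the diff are listed):

```python
# Training-loop utilities move from the top-level megatron package
# into megatron.training:
from megatron.training import get_args, get_timers, get_tokenizer, print_rank_0
from megatron.training.arguments import core_transformer_config_from_args
from megatron.training.checkpointing import load_checkpoint, save_checkpoint
from megatron.training.initialize import initialize_megatron
from megatron.training.utils import get_ltor_masks_and_position_ids, unwrap_model

# Pre-mcore model and data code moves under megatron.legacy:
from megatron.legacy.model import GPTModel
from megatron.legacy.data.data_samplers import MegatronPretrainingSampler

# Text-generation and deployment code moves under megatron.inference
# (and examples/deploy is renamed to examples/inference):
from megatron.inference.text_generation import generate_and_post_process

# megatron.core itself is untouched, e.g.:
from megatron.core import mpu
```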
12 changes: 6 additions & 6 deletions README.md
@@ -157,7 +157,7 @@ The [`examples/pretrain_bert.sh`](./examples/pretrain_bert.sh) script runs singl

The logging, checkpoint-saving, and evaluation interval options are specified. Note that the `--data-path` now includes the additional `_text_sentence` suffix added in preprocessing, but does not include the file extensions.

Further command line arguments are described in the source file [`arguments.py`](./megatron/arguments.py).
Further command line arguments are described in the source file [`arguments.py`](./megatron/training/arguments.py).

To run `examples/pretrain_bert.sh`, make any desired modifications including setting the environment variables for `CHECKPOINT_PATH`, `VOCAB_FILE`, and `DATA_PATH`. Make sure to set these variables to their paths in the container. Then launch the container with Megatron and necessary paths mounted (as explained in [Setup](#setup)) and run the example script.

@@ -167,7 +167,7 @@ The `examples/pretrain_gpt.sh` script runs single GPU 345M parameter GPT pretrai

It follows largely the same format as the previous BERT script with a few notable differences: the tokenization scheme used is BPE (which requires a merge table and a `json` vocabulary file) instead of WordPiece, the model architecture allows for longer sequences (note that the max position embedding must be greater than or equal to the maximum sequence length), and the `--lr-decay-style` has been set to cosine decay. Note that the `--data-path` now includes the additional `_text_document` suffix added in preprocessing, but does not include the file extensions.
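
The length constraint mentioned above is simple to check up front. A minimal sketch (function and values are illustrative, not Megatron's actual validation code):

```python
def check_sequence_config(max_position_embeddings: int, seq_length: int) -> None:
    """Sketch of the constraint noted above: position embeddings must be
    able to index every position in the longest training sequence."""
    assert max_position_embeddings >= seq_length, (
        "max position embedding must be >= the maximum sequence length"
    )

# Illustrative values in the style of the 345M GPT example script:
check_sequence_config(max_position_embeddings=1024, seq_length=1024)
```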

Further command line arguments are described in the source file [`arguments.py`](./megatron/arguments.py).
Further command line arguments are described in the source file [`arguments.py`](./megatron/training/arguments.py).

`examples/pretrain_gpt.sh` can be launched the same way as described for BERT. Set the env vars and make any other modifications, launch the container with appropriate mounts, and run the script.

@@ -290,7 +290,7 @@ python preprocess_data.py \
--workers 5 # works well for 10 CPU cores. Scale up accordingly.
</pre>
2. Use a custom samples mapping function in place of `megatron/data/realm_dataset_utils.get_block_samples_mapping` if required. To do this, you will need to implement a new function in C++ inside of `megatron/data/helpers.cpp`. The samples mapping data structure is used to select the data that will constitute every training sample in advance of the training loop.
2. Use a custom samples mapping function in place of `megatron/legacy/data/realm_dataset_utils.get_block_samples_mapping` if required. To do this, you will need to implement a new function in C++ inside of `megatron/core/datasets/helpers.cpp`. The samples mapping data structure is used to select the data that will constitute every training sample in advance of the training loop.
The samples mapping holds all of the metadata required to construct each sample from one or more indexed datasets. In REALM, it contains the start and end sentence indices, the document index (to find the correct title for a body), and a unique ID for every block; see the sketch after this list.
3. Pretrain a BERT language model using `pretrain_bert.py`, with the sequence length equal to the block size in token ids. This model should be trained on the same indexed dataset that is used to supply the blocks for the information retrieval task.
In REALM, this is an uncased bert base model trained with the standard hyperparameters.
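
The per-block metadata described in step 2 can be pictured as a flat record (a sketch only; the field names are illustrative, and the real mapping is a packed numpy array built in C++ for speed):

```python
from dataclasses import dataclass

@dataclass
class BlockSampleRecord:
    """Illustrative shape of one entry in a REALM samples mapping
    (hypothetical field names, not the actual dtype from helpers.cpp)."""
    start_idx: int  # index of the first sentence in the block
    end_idx: int    # index one past the last sentence in the block
    doc_idx: int    # document index, used to look up the correct title
    block_idx: int  # unique ID for the block

# The mapping is precomputed before the training loop, so building a
# training sample later is a cheap lookup into these records.
sample = BlockSampleRecord(start_idx=10, end_idx=14, doc_idx=3, block_idx=42)
```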
@@ -384,7 +384,7 @@ You can also use CURL or any other tools to query the server directly:
curl 'http://localhost:5000/api' -X 'PUT' -H 'Content-Type: application/json; charset=UTF-8' -d '{"prompts":["Hello world"], "tokens_to_generate":1}'
</pre>

See [megatron/text_generation_server.py](megatron/text_generation_server.py) for more API options.
See [megatron/inference/text_generation_server.py](megatron/inference/text_generation_server.py) for more API options.
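
The same request can be issued from Python (a sketch using the third-party `requests` library, assuming the server above is listening on `localhost:5000`):

```python
import requests

# Mirrors the curl example above: PUT a JSON body with a list of prompts
# and a token budget to the server's /api endpoint.
response = requests.put(
    "http://localhost:5000/api",
    json={"prompts": ["Hello world"], "tokens_to_generate": 1},
    headers={"Content-Type": "application/json; charset=UTF-8"},
)
print(response.json())
```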

### Detoxify GPT via Self-generation
We include an example in `examples/detxoify_lm/` to detoxify language models by leveraging the generative power of language models.
@@ -531,10 +531,10 @@ The Llama-2 [family of models](https://ai.meta.com/llama/) are an open-source se
The Llama-2 checkpoints can be loaded into Megatron for inference and finetuning. See documentation [here](docs/llama2.md).

# Model Optimization and Deployment
Megatron-Core (MCore) `GPTModel` family supports advanced quantization algorithms and high-performance deployment through TensorRT-LLM.
Megatron-Core (MCore) `GPTModel` family supports advanced quantization algorithms and high-performance inference through TensorRT-LLM.

## Quantization and TensorRT-LLM Deployment
See [Megatron Model Optimization and Deployment](examples/deploy/README.md) for `llama2` and `nemotron3` examples.
See [Megatron Model Optimization and Deployment](examples/inference/README.md) for `llama2` and `nemotron3` examples.

# Datasets
We do not host any datasets for GPT or BERT training, however, we detail their collection so that our results may be reproduced.
14 changes: 7 additions & 7 deletions examples/detxoify_lm/finetune_gpt.py
@@ -10,19 +10,19 @@
import sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__),
os.path.pardir, os.path.pardir)))
from megatron import get_args
from megatron import get_timers
from megatron import get_tokenizer
from megatron import print_rank_0
from megatron.training import get_args
from megatron.training import get_timers
from megatron.training import get_tokenizer
from megatron.training import print_rank_0
from megatron.core import mpu
from megatron.core.datasets.blended_megatron_dataset_builder import BlendedMegatronDatasetBuilder
from megatron.core.datasets.blended_megatron_dataset_config import GPTDatasetConfig
from megatron.core.datasets.gpt_dataset import GPTDataset
from megatron.model import GPTModel
from megatron.legacy.model import GPTModel
from megatron.core.enums import ModelType
from megatron.training import pretrain
from megatron.utils import get_ltor_masks_and_position_ids
from megatron.utils import average_losses_across_data_parallel_group
from megatron.training.utils import get_ltor_masks_and_position_ids
from megatron.training.utils import average_losses_across_data_parallel_group

def model_provider(pre_process=True, post_process=True):
"""Build the model."""
26 changes: 13 additions & 13 deletions examples/detxoify_lm/generate_samples_gpt.py
@@ -9,24 +9,24 @@
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__),
os.path.pardir, os.path.pardir)))
import torch
from megatron import get_args
from megatron import get_tokenizer
from megatron import print_rank_0
from megatron.checkpointing import load_checkpoint
from megatron.training import get_args
from megatron.training import get_tokenizer
from megatron.training import print_rank_0
from megatron.training.checkpointing import load_checkpoint
from megatron.core import mpu
from megatron.initialize import initialize_megatron
from megatron.model import GPTModel
from megatron.training.initialize import initialize_megatron
from megatron.legacy.model import GPTModel
from megatron.training import get_model
from megatron.text_generation import generate_and_post_process
from megatron.arguments import core_transformer_config_from_args
from megatron.inference.text_generation import generate_and_post_process
from megatron.training.arguments import core_transformer_config_from_args
from megatron.core.models.gpt import GPTModel
from typing import Union
import megatron.model
import megatron.legacy.model
from megatron.core.transformer.spec_utils import import_module
from megatron.arguments import core_transformer_config_from_args
from megatron.training.arguments import core_transformer_config_from_args
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_with_transformer_engine_spec, get_gpt_layer_local_spec

def model_provider(pre_process=True, post_process=True) -> Union[GPTModel, megatron.model.GPTModel]:
def model_provider(pre_process=True, post_process=True) -> Union[GPTModel, megatron.legacy.model.GPTModel]:
"""Builds the model.
If `use_mcore_models` is set to True, it returns the mcore GPT model; otherwise it returns the legacy GPT model.
@@ -37,7 +37,7 @@ def model_provider(pre_process=True, post_process=True) -> Union[GPTModel, megat
Returns:
Union[GPTModel, megatron.model.GPTModel]: The returned model
Union[GPTModel, megatron.legacy.model.GPTModel]: The returned model
"""
args = get_args()

@@ -83,7 +83,7 @@ def model_provider(pre_process=True, post_process=True) -> Union[GPTModel, megat
else:
assert(args.context_parallel_size == 1), "Context parallelism is only supported with Megatron Core!"

model = megatron.model.GPTModel(
model = megatron.legacy.model.GPTModel(
config,
num_tokentypes=0,
parallel_output=True,
6 changes: 3 additions & 3 deletions examples/deploy/README.md → examples/inference/README.md
@@ -42,7 +42,7 @@ following checkpoint formats with some remedy:

| GPTModel | sharded | remedy arguments |
|-----------------------------------|---------|-----------------------------------------|
| megatron.model | | `--ammo-load-classic-megatron-to-mcore` |
| megatron.legacy.model | | `--ammo-load-classic-megatron-to-mcore` |
| TE-Fused (default mcore gpt spec) | | `--ammo-convert-te-to-local-spec` |
| TE-Fused (default mcore gpt spec) | x | |

@@ -76,7 +76,7 @@ cd ..

Now launch the PTQ + TensorRT-LLM export script,
```
bash examples/deploy/ptq_trtllm_nemotron3_8b ./nemotron-3-8b-base-4k None
bash examples/inference/ptq_trtllm_nemotron3_8b ./nemotron-3-8b-base-4k None
```
By default, `cnn_dailymail` is used for calibration. The `GPTModel` will have quantizers for simulating the
quantization effect. The checkpoint will be saved optionally (with quantizers as additional states) and can
@@ -112,7 +112,7 @@ The script expects `${CHECKPOINT_DIR}` (`./nemotron-3-8b-base-4k`) to have the f
> that we support.
```sh
bash examples/deploy/ptq_trtllm_llama_7b.sh ${CHECKPOINT_DIR}
bash examples/inference/ptq_trtllm_llama_7b.sh ${CHECKPOINT_DIR}
```

The script expects `${CHECKPOINT_DIR}` to have the following structure:
@@ -73,7 +73,7 @@ python -c "import ammo.torch.quantization.extensions as ext; print(ext.cuda_ext)
launch_config="--nproc_per_node=${TP}"

# Launch multi-process with torchrun
torchrun ${launch_config} examples/deploy/text_generation_ptq.py ${options} ${additional_options} --load ${CHECKPOINT_LOAD_DIR}
torchrun ${launch_config} examples/inference/text_generation_ptq.py ${options} ${additional_options} --load ${CHECKPOINT_LOAD_DIR}

# This script is using mpi4py which will fork multiple processes.
python examples/deploy/trtllm_text_generation.py ${trtllm_options}
python examples/inference/trtllm_text_generation.py ${trtllm_options}
@@ -68,8 +68,8 @@ python -c "import ammo.torch.quantization.extensions as ext; print(ext.cuda_ext)
launch_config="--nproc_per_node=${TP}"

# Launch multi-process with torchrun
torchrun ${launch_config} examples/deploy/text_generation_ptq.py ${options} ${additional_options} --load ${CHECKPOINT_LOAD_DIR}
torchrun ${launch_config} examples/inference/text_generation_ptq.py ${options} ${additional_options} --load ${CHECKPOINT_LOAD_DIR}

# This script is using mpi4py which will fork multiple processes.
python examples/deploy/trtllm_text_generation.py ${trtllm_options}
python examples/inference/trtllm_text_generation.py ${trtllm_options}

@@ -13,16 +13,16 @@
from datasets import load_dataset

# [ModelOpt]: changing the default model provider to the AMMO version
from megatron import get_args, print_rank_0
from megatron.checkpointing import load_checkpoint, save_checkpoint
from megatron.training import get_args, print_rank_0
from megatron.training.checkpointing import load_checkpoint, save_checkpoint
from megatron.core import mpu
from megatron.core.dist_checkpointing import load
from megatron.deploy.arguments import add_ammo_args
from megatron.deploy.gpt.model_provider import model_provider
from megatron.initialize import initialize_megatron
from megatron.text_generation import generate_and_post_process
from megatron.inference.arguments import add_ammo_args
from megatron.inference.gpt.model_provider import model_provider
from megatron.training.initialize import initialize_megatron
from megatron.inference.text_generation import generate_and_post_process
from megatron.training import get_model
from megatron.utils import unwrap_model
from megatron.training.utils import unwrap_model

QUANT_CFG_CHOICES = {
"int8": atq.INT8_DEFAULT_CFG,
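For orientation, the calibration flow this script drives looks roughly like the sketch below (an assumed AMMO API: `atq.quantize` taking a model, a quantization config, and a calibration forward loop; the real script adds distributed setup and checkpoint saving):

```python
import ammo.torch.quantization as atq

def forward_loop(model):
    """Run calibration batches through the model so the inserted
    quantizers can collect activation ranges (cnn_dailymail by default)."""
    for batch in calibration_batches:  # assumed iterable of input batches
        model(batch)

# Assumed entry point: inserts quantizers into supported modules,
# calibrates them via the forward loop, and returns the quantized model.
model = atq.quantize(model, QUANT_CFG_CHOICES["int8"], forward_loop)
```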
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -4,10 +4,10 @@

from typing import Union

from megatron import get_args, print_rank_0
from megatron.arguments import core_transformer_config_from_args
from megatron.core.deploy.gpt.model_specs import get_gpt_layer_ammo_spec
from megatron.core.deploy.gpt.state_dict_hooks import (
from megatron.training import get_args, print_rank_0
from megatron.training.arguments import core_transformer_config_from_args
from megatron.core.inference.gpt.model_specs import get_gpt_layer_ammo_spec
from megatron.core.inference.gpt.state_dict_hooks import (
mcore_gpt_load_classic_state_dict_pre_hook,
mcore_gpt_load_te_state_dict_pre_hook,
)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -6,7 +6,7 @@

import torch

from megatron import get_args
from megatron.training import get_args
from megatron.core import mpu, InferenceParams
from .communication import (
send_to_next_pipeline_rank,
@@ -5,9 +5,9 @@
import torch
import torch.nn.functional as F

from megatron import get_args, get_tokenizer
from megatron.training import get_args, get_tokenizer
from megatron.core import mpu
from megatron.utils import get_ltor_masks_and_position_ids
from megatron.training.utils import get_ltor_masks_and_position_ids
from .communication import (
copy_from_last_to_first_pipeline_stage,
broadcast_from_last_pipeline_stage,
File renamed without changes.
@@ -6,7 +6,7 @@
import torch


from megatron import get_tokenizer, get_args
from megatron.training import get_tokenizer, get_args
from .communication import broadcast_int_list, broadcast_tensor


@@ -5,9 +5,9 @@
import threading
from flask import Flask, request, jsonify, current_app
from flask_restful import Resource, Api
from megatron import get_args
from megatron.text_generation import generate_and_post_process
from megatron.text_generation import beam_search_and_post_process
from megatron.training import get_args
from megatron.inference.text_generation import generate_and_post_process
from megatron.inference.text_generation import beam_search_and_post_process


GENERATE_NUM = 0
File renamed without changes.
File renamed without changes.
@@ -4,11 +4,11 @@
import numpy as np
import torch

from megatron import get_args, get_tokenizer, print_rank_0
from megatron.training import get_args, get_tokenizer, print_rank_0
from megatron.core import mpu, tensor_parallel
from megatron.data.dataset_utils import create_masked_lm_predictions, \
from megatron.legacy.data.dataset_utils import create_masked_lm_predictions, \
pad_and_convert_to_numpy
from megatron.data.data_samplers import MegatronPretrainingSampler
from megatron.legacy.data.data_samplers import MegatronPretrainingSampler

def make_attention_mask(source_block, target_block):
"""
@@ -7,7 +7,7 @@
import torch
import numpy as np
from torch.utils.data import Dataset
from megatron import get_args
from megatron.training import get_args
from megatron.core import mpu


@@ -26,7 +26,7 @@
import numpy as np
import torch

from megatron import (
from megatron.training import (
get_args,
print_rank_0
)
@@ -535,8 +535,8 @@ def build_dataset(name, data_prefix, max_num_samples,
max_seq_length_dec, dataset_type='standard_bert',
indexed_dataset=None):

from megatron.data.ict_dataset import ICTDataset
from megatron.data.multimodal_dataset import MultiModalDataset
from megatron.legacy.data.ict_dataset import ICTDataset
from megatron.legacy.data.multimodal_dataset import MultiModalDataset

if dataset_type == DSET_TYPE_BERT or dataset_type == DSET_TYPE_T5:
raise ValueError("The Megatron-LM BERT and T5 datasets are deprecated.")
@@ -4,10 +4,10 @@
import numpy as np
from torch.utils.data import Dataset

from megatron import get_tokenizer
from megatron import get_args
from megatron.data.dataset_utils import get_indexed_dataset_
from megatron.data.realm_dataset_utils import get_block_samples_mapping
from megatron.training import get_tokenizer
from megatron.training import get_args
from megatron.legacy.data.dataset_utils import get_indexed_dataset_
from megatron.legacy.data.realm_dataset_utils import get_block_samples_mapping

def make_attention_mask(source_block, target_block):
"""
File renamed without changes.
File renamed without changes.
@@ -9,9 +9,9 @@
import torch
from torch.utils.data import Dataset

from megatron import print_rank_0, get_args, get_tokenizer
from megatron.training import print_rank_0, get_args, get_tokenizer
from megatron.core import tensor_parallel
from megatron.data.biencoder_dataset_utils import make_attention_mask
from megatron.legacy.data.biencoder_dataset_utils import make_attention_mask

def get_open_retrieval_wiki_dataset():
args = get_args()
@@ -4,10 +4,10 @@
import numpy as np
import torch

from megatron import print_rank_0
from megatron.training import print_rank_0
from megatron.core import mpu, tensor_parallel
from megatron.data.dataset_utils import create_masked_lm_predictions, pad_and_convert_to_numpy
from megatron import get_args, get_tokenizer, print_rank_0
from megatron.legacy.data.dataset_utils import create_masked_lm_predictions, pad_and_convert_to_numpy
from megatron.training import get_args, get_tokenizer, print_rank_0


def get_one_epoch_dataloader(dataset, micro_batch_size=None):
@@ -24,7 +24,7 @@ def get_one_epoch_dataloader(dataset, micro_batch_size=None):
sampler = torch.utils.data.SequentialSampler(dataset)
# importantly, drop_last must be False to get all the data.
assert False, 'DistributedBatchSampler deprecated, change the implementation'
from megatron.data.samplers import DistributedBatchSampler
from megatron.legacy.data.samplers import DistributedBatchSampler
batch_sampler = DistributedBatchSampler(sampler,
batch_size=global_batch_size,
drop_last=False,
@@ -6,7 +6,7 @@
import numpy as np
import torch

from megatron import get_args
from megatron.training import get_args
from megatron.core import mpu


@@ -5,10 +5,10 @@
import torch
import torchvision.transforms as T
from torchvision import datasets
from megatron import get_args
from megatron.data.image_folder import ImageFolder
from megatron.data.autoaugment import ImageNetPolicy
from megatron.data.data_samplers import RandomSeedDataset
from megatron.training import get_args
from megatron.legacy.data.image_folder import ImageFolder
from megatron.legacy.data.autoaugment import ImageNetPolicy
from megatron.legacy.data.data_samplers import RandomSeedDataset
from PIL import Image, ImageFilter, ImageOps


File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -3,11 +3,11 @@
import torch
from torch.nn import LayerNorm

from megatron.model.enums import AttnMaskType
from megatron.model.fused_layer_norm import MixedFusedLayerNorm
from megatron.model.fused_softmax import FusedScaleMaskSoftmax
from megatron.model.utils import attention_mask_func
from megatron.fused_kernels import load
from megatron.legacy.model.enums import AttnMaskType
from megatron.legacy.model.fused_layer_norm import MixedFusedLayerNorm
from megatron.legacy.model.fused_softmax import FusedScaleMaskSoftmax
from megatron.legacy.model.utils import attention_mask_func
from megatron.legacy.fused_kernels import load

def test_load_fused_kernels():
try:
File renamed without changes.