- [2024/7] The first version of AICB is released. This version supports simulating workloads of GPT series, LLaMA, and Moe using the training frameworks of Megatron and DeepSpeed.
- Lastest News
- Table of Contents
- AICB Overview
- Setup
- Usage
- Tutorial
- Projects using AICB
AICB (Artificial Intelligence Communication Benchmark), is a novel benchmark suite for evaluating the communication system of a realistic and emulated GPU cluster from the pespectives of the emerging training and inference applications. Different from exisiting network benchmarks, AICB is designed to produce the communication workloads with precise patterns that are aligned to real-world applications. Taking the Large Language Model (LLM) training as an example, the workloads vary with the complicated combinations of models, parallel frameworks, and parameters of the models, parallel frameworks, and the collective communication libraries. In general, the scenarios suitable for using AICB include but not limited to 1) benchmarking and tuning of the communication system of a GPU cluster, 2) investigating and analyzing the communication patterns of specific application settings, 3) tools, e.g. simulators, that need workloads which are well described.
There are a lot of parameters that influence the communication and computation patterns, which are (1) model parameters (e.g., hidden_size, num_layers, seq_len, etc.) and (2) framework parameters (e.g., world size, parallelization strategies (TP, PP, DP, SP), zero level, reduce_bucket_size/allgather_bucket_size, etc.). For the sake of generality, we cover those typical settings using a smallest set of benchmarks rather than traversing all the combinations. To this end, we propose the benchmark suite as listed in the following table. Users can directly run all the selected workloads selected in AICB, or run part of the workloads, or even generate their own workloads. For more detailed information, please refer to AICB_workload spec v1.0.
id | Name | Parameter_size | Hidden_size | Num_of_layers | Attention_heads | Sequence_length | FFN_hidden_size |
---|---|---|---|---|---|---|---|
1 | GPT_7B | 7B | 4096 | 32 | 32 | 2048 | 16384 |
2 | GPT_13B | 13B | 5120 | 40 | 32 | 2048 | 20480 |
3 | GPT_22B | 22B | 6144 | 48 | 64 | 2048 | 24576 |
4 | GPT_175B | 175B | 12288 | 96 | 96 | 2048 | 49152 |
5 | GPT_13B | 13B | 5120 | 40 | 32 | 2048 | 20480 |
6 | LLaMA_7B | 7B | 4096 | 32 | 32 | 4096 | 11008 |
7 | LLaMA_7B | 7B | 4096 | 32 | 32 | 4096 | 11008 |
8 | LLaMA_65B | 65B | 8192 | 80 | 64 | 4096 | 28672 |
9 | LLaMA_65B | 65B | 8192 | 80 | 64 | 4096 | 28672 |
10 | Mistral_8*7B | 56B | 4096 | 32 | 32 | 1024 | 14336 |
You can follow the instrucitons below to quickly set up the environtments and run AICB.
-
Installation from source code
a. To initiate actual communication tasks, ensure that the runtime environment has all necessary dependencies, such as CUDA and PyTorch, already installed. For specific usage examples, see Physical Execution
b. To generate workload traffic patterns for large model parallel framework training, you can use a CPU-only environment. For specific usage examples, see Generate Workloads
-
Installation from deb package (for Ubuntu systems)
Currently, you can install the deb package on an NV-built NGC container NGC's PyTorch container to start running AICB.
docker pull nvcr.io/nvidia/pytorch:xx.xx-py3 docker run --gpus all -it --rm -v /path/to/AICBench:/workspace/AICBench nvcr.io/nvidia/pytorch:xx.xx-py3 dpkg -i /download/AICB_v1.0.deb sh megatron_workload_with_aiob.sh -m 7
-
Composing a Docker image from Dockfile
You can launch an instance of the Docker container with Dockerfile for quick start:
docker build -t image:latest . docker run --gpus all -it --rm image:latest
After installation, we provide three main usage scenarios for AICB:
- Running on physical GPU clusters
- Generating workload descrption files for simulation
- Customized parameters.
There is a tutorial including all the details, please refer to the tutorial.
For running AICB on a physical machine, we provide both scripts for quick start and methods for executing custom cases.
When running on a physical machine, additional configuration of environment variables required by PyTorch is necessary.
--nnodes Number of nodes: $WORLD_SIZE
--node_rank Rank of the node: $RANK
--nproc_per_node Number of GPUs per node: $NUM_GPUS
--master_addr Master address: $MASTER_ADDR
--master_port Master port: $MASTER_PORT
You can directly execute all the test cases provided in our AICB workload specification v1.0 in physical GPU cluster by utilizing the run_suites script. This script ensures that all parallel framworks are covered, allowing you to validate and analyze the performance and behavior of various workloads efficiently.
For the Megatron parallel framework
, you can quickly start using the scripts/megatron_gpt.sh script file.
sh scripts/megatron_gpt.sh \
-m 7 --world_size 8 --tp_num 2 --pp_num 1 \
--comm_frame Megatron --global_batch 16 \
--micro_batch 1 --seq_length 2048
For Moe
, you can quickly start it using the scripts/megatron_gpt.sh script file.
sh scripts/megatron_gpt.sh \
-m moe --world_size 8 --tp_num 2 --pp_num 1
--moe_enabled --expert_parallel_size 1 \
--comm_frame Megatron --global_batch 16 \
--num_moe_experts 2 --moe_router_topk 2 \
--micro_batch 1 --grouped_gemm
For the DeepSpeed
parallel framework, you can quickly start it using the scripts/deepspeed_llama.sh script file. Currently, the DeepSpeed framework does not support --aiob_enable
or --comp_filepath
, but you can choose to use fixed computation times (please refer to the tutorial).
sh scripts/deepspeed_llama.sh \
--zero_stage 3 -m 65 --epoch_num 100 \
--reduce_bucket_size=1000000000 --allgather_bucket_size=500000000 \
--param_persistence_threshold=1000000 \
To mirror the real-world workloads with both computation and communicaiton, we developed a sub-module, AIOB, that is used to generate computation patterns. In AICB, we can enable AIOB to embed the computation time into the workloads.
For the Megatron parallel framework, the --aiob_enable
option allows for capturing the computation time of each operation in the actual model.
If we do not set --aiob_enable
, only fixed computation times can be applied. (Please refer to the tutorial)
- Running workloads with computation times generated by AIOB. After running, we can get an extra computation desrcription file describing the computation times for the main computation kernels in the directory of
results/aiob_outputs
. Note that the computation times are obtained through the execution of computation kernels on the specific GPU. The following commands does not generate the computation descrition file, but also run the workload in the real GPU cluster.
sh scripts/megatron_gpt.sh \
-m 7 --world_size 8 --tp_num 2 --pp_num 1 \
--comm_frame Megatron --global_batch 16 \
--micro_batch 1 --seq_length 2048 \
--swiglu --use_flash_attn --aiob_enable
- Running workload with computation time through an existing computation decription file.
Users can defined their own computation times or directly use the files we provided.
By specifying the computation description file with the
--comp_filepath
option, you can embed computation times before running the workload on a physical machine.
sh scripts/megatron_gpt.sh \
-m 7 --world_size 8 --tp_num 2 --pp_num 1 \
--comm_frame Megatron --global_batch 16 --micro_batch 1 \
--seq_length 2048 --swiglu --use_flash_attn \
--aiob_enable \
--comp_filepath workload/aiob_inputs/Example.txt
In addition to running the AICB in the GPU clusters, AICB also generates the workload description files which can be used for simulation or further analysis. In this release, we provide scripts for quickly generating workloads for SimAI.
You can generate all the workload description files with generate_suite as specified in our AICB workload spec v1.0. Once these files are created, you can execute them using the SimAI to test and analyze various scenarios.
Here, you can use the script scripts/megatron_workload.sh and the parameter --model_size
(7/13/22/175/moe) to generate the corresponding workload description file. For the computation part of the model, you can choose to enable AIOB by using the --aiob_enable
option. If AIOB is not used, the Workload will be filled with a fixed computation time by default.
- Generating workload description files with computation times generated by AIOB.
sh ./scripts/megatron_workload_with_aiob.sh \
-m 7 --world_size 4096 \
--tp_num 2 --pp_num 1 \
--comm_frame Megatron --global_batch 8192 \
--micro_batch 1 --seq_length 4096 \
--swiglu --use_flash_attn --aiob_enable
- Generating workload description files with computation time through an existing computation decription file.
sh ./scripts/megatron_workload_with_aiob.sh -m 7 \
--world_size 4096 --tp_num 2 --pp_num 1 \
--comm_frame Megatron --global_batch 8192 \
--micro_batch 1 --seq_length 4096 --swiglu \
--use_flash_attn --aiob_enable \
--comp_filepath workload/aiob_inputs/Example.txt
For the Moe, you can also use scripts/megatron_workload_with_aiob.sh to generate the corresponding model's workload file.
sh scripts/workload_moe.sh \
-mmoe --world_size 4096 --tp_num 2 --pp_num 1 --sp --expert_parallel_size 1 \
--num_moe_experts 2 --moe_router_topk 2 \
--comm_frame Megatron --global_batch 8192 \
--micro_batch 1 --seq_length 1024 --swiglu \
--use_flash_attn --aiob_enable \
--comp_filepath workload/aiob_inputs/Example.txt
For the DeepSpeed
parallel framework, you can use scripts/workload_deepspeed.sh to generate the corresponding workload description file.
sh ./scripts/deepspeed_llama.sh -m 7
In addition to quick start, you can also customize the model parameters in detail to run on physical clusters or generate the required workloads for simulation and analysis. For more detailed parameter descriptions and more Example, please refer to the tutorial.
The current entry file for running custom cases is aicb.py. By using this file, you can flexibly choose more parameters for tuning.
# Megatron Example
torchrun \
--nnodes $WORLD_SIZE \
--node_rank $RANK \
--nproc_per_node gpu \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
./aicb.py --comm_frame=Megatron --world_size=$((WORLD_SIZE*8)) --tp_num=$tp_num \
--micro_batch=$batch_size --global_batch=$((WORLD_SIZE*8*batch_size/tp_num)) --epoch_num=$epoch_num \
--num_layers=$num_layers --hidden_size=$hidden_size --ffn_hidden_size=$ffn_hidden_size --num_attention_heads=$num_attention_heads \
$sp_enable --seq_len=$seq_len --vocab_size=$vocab_size --aiob_enable=$enable
Similarly, when generating workloads, you can also customize the model training parameters and modifying the generated files to generate your own workload file for simulation. This can be achieved by using the following files: generate custom description file
Here is an example:
python -m workload_generator.AIOB_simAI_workload_generator \
--world_size=32 --global_batch=64 --micro_batch=1 \
--num_layers=8 --num_attention_heads=176 --hidden_size=5120 \
--tp_num=2 --seq_length=4096 --swiglu --ffn_hidden_size=16384 \
--moe_router_topk=4 --enable_sequence_parallel --expert_parallel_size=16 \
--num_moe_experts=64 --moe_grouped_gemm --moe_enabled
We provide a tutorial for users to quickly get started with AICB. the tutorial
Below are some of the projects where we have directly used AICB:
- AICB is part of the SimAI project which is led by Alibaba Cloud. Researchers who use AICB can cite our paper "SimAI: Unifying Architecture Design and Performance Tunning for Large-Scale Large Language Model Training with Scalability and Precision" (NSDI’25).