FastMoE: A Fast Mixture-of-Expert Training System

He, Jiaao; Qiu, Jiezhong; Zeng, Aohan; Yang, Zhilin; Zhai, Jidong; Tang, Jie

arXiv:2103.13262v1 (cs)

[Submitted on 24 Mar 2021]

Abstract: Mixture-of-Expert (MoE) models show strong potential for scaling language models to trillions of parameters. However, training a trillion-scale MoE model requires algorithm and system co-design to obtain a well-tuned, high-performance distributed training system. Unfortunately, the only existing platform that meets these requirements depends heavily on Google's hardware (TPU) and software (Mesh TensorFlow) stack, and is not open and available to the public, especially to the GPU and PyTorch communities.
In this paper, we present FastMoE, a distributed MoE training system based on PyTorch that runs on common accelerators. The system provides a hierarchical interface for both flexible model design and easy adaptation to different applications, such as Transformer-XL and Megatron-LM. Unlike a direct implementation of MoE models in PyTorch, FastMoE is highly optimized for training speed through sophisticated high-performance acceleration techniques. The system supports placing different experts on multiple GPUs across multiple nodes, so the number of experts can grow linearly with the number of GPUs. The source code of FastMoE is available at this https URL under the Apache-2 license.
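
The expert placement described in the abstract can be illustrated with a small, library-free sketch. It is not FastMoE's API or its implementation. It assumes a top-k softmax gate and a simple block placement in which each GPU holds the same number of experts, so the total expert count is `experts_per_gpu * num_gpus`. The names `top_k_gate`, `expert_location`, and `route` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(scores, k):
    """Return (expert_id, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that they sum to 1 over the chosen experts.
    """
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def expert_location(expert_id, experts_per_gpu):
    """Map a global expert id to (gpu_rank, local_index).

    Assumed block placement: experts 0..E-1 live on GPU 0, E..2E-1 on GPU 1, etc.
    """
    return divmod(expert_id, experts_per_gpu)


def route(token_scores, k, experts_per_gpu, num_gpus):
    """Group (token_id, local_expert, weight) triples by destination GPU."""
    num_experts = experts_per_gpu * num_gpus  # grows linearly with num_gpus
    per_gpu = {rank: [] for rank in range(num_gpus)}
    for token_id, scores in enumerate(token_scores):
        if len(scores) != num_experts:
            raise ValueError("each token needs one gate score per expert")
        for expert_id, weight in top_k_gate(scores, k):
            rank, local = expert_location(expert_id, experts_per_gpu)
            per_gpu[rank].append((token_id, local, weight))
    return per_gpu
```

In a real distributed system, the per-GPU groups produced by a step like `route` would be exchanged between workers with a collective communication operation before the experts run. That exchange is omitted here.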

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2103.13262 [cs.LG]
	(or arXiv:2103.13262v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.13262

Submission history

From: Jiaao He
[v1] Wed, 24 Mar 2021 15:27:15 UTC (1,514 KB)
