PanGu-{\Sigma}: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun

Computer Science > Computation and Language

arXiv:2303.10845 (cs)

[Submitted on 20 Mar 2023]

Title:PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Authors:Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, Weichao Wang, Pengfei Li, Xiaoda Zhang, Alexander Podolskiy, Grigory Arshinov, Andrey Bout, Irina Piontkovskaya, Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao

View PDF

Abstract:The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors and MindSpore framework, and present the language model with 1.085T parameters named PanGu-{\Sigma}. With parameter inherent from PanGu-{\alpha}, we extend the dense Transformer model to sparse one with Random Routed Experts (RRE), and efficiently train the model over 329B tokens by using Expert Computation and Storage Separation(ECSS). This resulted in a 6.3x increase in training throughput through heterogeneous computing. Our experimental findings show that PanGu-{\Sigma} provides state-of-the-art performance in zero-shot learning of various Chinese NLP downstream tasks. Moreover, it demonstrates strong abilities when fine-tuned in application data of open-domain dialogue, question answering, machine translation and code generation.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2303.10845 [cs.CL]
	(or arXiv:2303.10845v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.10845

Submission history

From: Xiaozhe Ren [view email]
[v1] Mon, 20 Mar 2023 03:39:27 UTC (3,511 KB)

Computer Science > Computation and Language

Title:PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators