I-BERT: Integer-only BERT Quantization

Kim, Sehoon; Gholami, Amir; Yao, Zhewei; Mahoney, Michael W.; Keutzer, Kurt

arXiv:2101.01321v1 (cs)

[Submitted on 5 Jan 2021 (this version), latest version 8 Jun 2021 (v3)]

Abstract: Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for many edge processors, and deploying these models on resource-constrained edge applications and devices has been a challenge. While quantization can be a viable solution, previous work on quantizing Transformer-based models uses floating-point arithmetic during inference, which limits model deployment on many edge processors. In this work, we propose a novel integer-only quantization scheme for Transformer-based models that quantizes the entire inference process. In particular, we demonstrate how to approximate nonlinear operations in Transformer architectures, e.g., GELU, Softmax, and Layer Normalization, with lightweight integer computations. We use those approximations in our method, I-BERT, which performs end-to-end integer-only inference without any floating-point calculation. We test our approach on GLUE downstream tasks using RoBERTa-Base and RoBERTa-Large. In both cases, with an 8-bit integer-only quantization scheme, I-BERT achieves accuracy similar to the full-precision baseline.
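
As a rough illustration of what "integer-only inference" can mean for a single linear operation, the Python sketch below quantizes two small vectors to 8-bit integers, takes their dot product in integer arithmetic, and rescales the accumulator with an integer multiplier and a bit shift instead of a floating-point multiply. This is a generic symmetric-quantization scheme written under stated assumptions (per-tensor scales chosen by hand, a 16-bit fixed-point multiplier); it is not the authors' I-BERT code, and every function name and numeric value here is hypothetical. It does not cover the paper's integer approximations of GELU, Softmax, or Layer Normalization.

```python
# Illustrative sketch only: generic symmetric 8-bit quantization with an
# integer-only dot product. All names and values are hypothetical; this is
# NOT the I-BERT implementation.

def quantize(values, scale):
    """Offline step: map reals to int8 via q = clamp(round(x / scale))."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def to_fixed_point(real_multiplier, shift=16):
    """Offline step: approximate a real multiplier as m / 2**shift, m integer."""
    return round(real_multiplier * (1 << shift)), shift

def int_dot_requant(qx, qw, m, shift):
    """Inference step using integers only: accumulate, rescale, clamp."""
    acc = sum(a * b for a, b in zip(qx, qw))        # wide integer accumulator
    out = (acc * m + (1 << (shift - 1))) >> shift   # multiply, then rounding shift
    return max(-127, min(127, out))

# Assumed example data and hand-picked scales (max |value| / 127).
x = [0.5, -1.0, 0.25]
w = [0.2, 0.1, -0.4]
s_x, s_w, s_out = 1.0 / 127, 0.4 / 127, 0.5 / 127

qx = quantize(x, s_x)
qw = quantize(w, s_w)
m, shift = to_fixed_point(s_x * s_w / s_out)

q_out = int_dot_requant(qx, qw, m, shift)

# Dequantize only to compare against the floating-point reference.
approx = q_out * s_out
exact = sum(a * b for a, b in zip(x, w))
print(q_out, approx, exact)
```

The floating-point work (choosing scales, deriving the fixed-point multiplier) happens once ahead of time; the function that would run at inference, `int_dot_requant`, touches only integers. Replacing the scale ratio with an integer multiplier and shift is one common way to avoid floating-point hardware, and the small gap between `approx` and `exact` is the quantization error.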

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2101.01321 [cs.CL]
	(or arXiv:2101.01321v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2101.01321

Submission history

From: Sehoon Kim
[v1] Tue, 5 Jan 2021 02:42:58 UTC (391 KB)
[v2] Thu, 11 Feb 2021 09:11:11 UTC (772 KB)
[v3] Tue, 8 Jun 2021 07:53:22 UTC (1,218 KB)
