Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding

Zhao, Liang; Feng, Xiaocheng; Feng, Xiachong; Xu, Dongliang; Yang, Qing; Liu, Hongtao; Qin, Bing; Liu, Ting

arXiv:2312.17044 (cs)

[Submitted on 28 Dec 2023 (v1), last revised 2 Apr 2024 (this version, v4)]

Abstract: The Transformer has taken the field of natural language processing (NLP) by storm since its introduction. Further, large language models (LLMs) built upon it have captured worldwide attention due to their superior abilities. Nevertheless, all Transformer-based models, including these powerful LLMs, suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones; that is, they cannot perform length extrapolation. Hence, a plethora of methods have been proposed to enhance the length extrapolation of Transformers, in which the positional encoding (PE) is recognized as the major factor. In this survey, we present these advances towards length extrapolation in a unified notation from the perspective of PE. Specifically, we first introduce extrapolatable PEs, including absolute and relative PEs. Then, we dive into extrapolation methods based on them, covering position interpolation and randomized position methods. Finally, several challenges and future directions in this area are highlighted. Through this survey, we aim to enable the reader to gain a deep understanding of existing methods and provide stimuli for future research.

Comments:	Work in progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2312.17044 [cs.CL]
	(or arXiv:2312.17044v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.17044

Submission history

From: Liang Zhao
[v1] Thu, 28 Dec 2023 14:42:24 UTC (1,141 KB)
[v2] Fri, 29 Dec 2023 02:29:17 UTC (1,141 KB)
[v3] Mon, 1 Apr 2024 03:03:54 UTC (6,589 KB)
[v4] Tue, 2 Apr 2024 04:56:52 UTC (6,589 KB)
