EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Li, Yuhui; Wei, Fangyun; Zhang, Chao; Zhang, Hongyang

arXiv:2401.15077 (cs)

[Submitted on 26 Jan 2024 (v1), last revised 4 Feb 2024 (this version, v2)]

Abstract: Autoregressive decoding makes the inference of Large Language Models (LLMs) time-consuming. In this paper, we reconsider speculative sampling and derive two key observations. Firstly, autoregression at the feature (second-to-top-layer) level is more straightforward than at the token level. Secondly, the inherent uncertainty in feature (second-to-top-layer) level autoregression constrains its performance. Based on these insights, we introduce EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a simple yet highly efficient speculative sampling framework. By incorporating a token sequence advanced by one time step, EAGLE effectively resolves the uncertainty, enabling precise second-to-top-layer feature prediction with minimal overhead. We conducted comprehensive evaluations of EAGLE, including all models from the Vicuna and LLaMA2-Chat series, the MoE model Mixtral 8x7B Instruct, and tasks in dialogue, code generation, mathematical reasoning, and instruction following. For LLaMA2-Chat 70B, EAGLE achieved a latency speedup ratio of 2.7x-3.5x and doubled throughput while maintaining the distribution of the generated text.
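
The abstract's claim that the distribution of the generated text is maintained relies on the accept/reject rule common to speculative sampling methods. The following is a minimal sketch of that generic verification step for a single drafted token, not EAGLE's implementation: the function name `speculative_accept` and its parameters are hypothetical, probabilities are plain Python lists, and EAGLE's feature-level drafting is not shown.

```python
import random


def speculative_accept(p, q, draft_token, rng=random):
    """Verify one drafted token with the standard speculative sampling rule.

    p: target-model probabilities over the vocabulary (assumed to sum to 1).
    q: draft-model probabilities over the same vocabulary (assumed to sum to 1).
    draft_token: index that was sampled from q, so q[draft_token] > 0.
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p(x) / q(x)).
    ratio = p[draft_token] / q[draft_token]
    if rng.random() < min(1.0, ratio):
        return draft_token, True

    # On rejection, resample from the residual max(0, p - q).
    # rng.choices normalizes the weights, so no explicit division is needed.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual)[0]
    return token, False
```

Under these assumptions, the returned token is distributed according to `p` regardless of how well `q` approximates it; a better draft distribution only raises the acceptance rate, which is where the speedup comes from.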

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2401.15077 [cs.LG]
	(or arXiv:2401.15077v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.15077

Submission history

From: Yuhui Li
[v1] Fri, 26 Jan 2024 18:59:01 UTC (2,194 KB)
[v2] Sun, 4 Feb 2024 17:18:34 UTC (2,228 KB)