LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

Jiang, Huiqiang; Wu, Qianhui; Luo, Xufang; Li, Dongsheng; Lin, Chin-Yew; Yang, Yuqing; Qiu, Lili

arXiv:2310.06839 (cs)

[Submitted on 10 Oct 2023 (v1), last revised 12 Aug 2024 (this version, v2)]

Abstract: In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key information in the input prompt. Inspired by these findings, we propose LongLLMLingua for prompt compression towards improving LLMs' perception of the key information to simultaneously address the three challenges. Our extensive evaluation across various long context scenarios demonstrates that LongLLMLingua not only enhances performance but also significantly reduces costs and latency. For instance, in the NaturalQuestions benchmark, LongLLMLingua boosts performance by up to 21.4% with around 4x fewer tokens in GPT-3.5-Turbo, leading to substantial cost savings. It achieves a 94.0% cost reduction in the LooGLE benchmark. Moreover, when compressing prompts of about 10k tokens at ratios of 2x-6x, LongLLMLingua can accelerate end-to-end latency by 1.4x-2.6x. Our code is available at this https URL.
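
To make the compression ratios in the abstract concrete, the sketch below works out how many prompt tokens remain after compressing a roughly 10k-token prompt at 2x, 4x, and 6x. It is plain arithmetic, not the LongLLMLingua method: the function names are hypothetical, and the calculation covers input tokens only, so it should not be read as the paper's cost or latency accounting.

```python
def compressed_tokens(original_tokens: int, ratio: float) -> int:
    """Approximate prompt length after compressing by `ratio` (e.g. 4 means 4x)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return round(original_tokens / ratio)


def input_token_reduction(ratio: float) -> float:
    """Fraction of input tokens removed at a given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    original = 10_000  # assumed prompt size, matching the "about 10k tokens" in the abstract
    for ratio in (2, 4, 6):
        kept = compressed_tokens(original, ratio)
        saved = input_token_reduction(ratio)
        print(f"{ratio}x: about {kept} tokens kept, {saved:.1%} of input tokens removed")
```

Note that the input-token reduction grows more slowly than the ratio itself: going from 2x to 6x raises the share of removed tokens from one half to five sixths.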

Comments:	Accepted at ACL 2024
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2310.06839 [cs.CL]
	(or arXiv:2310.06839v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.06839

Submission history

From: Huiqiang Jiang
[v1] Tue, 10 Oct 2023 17:59:58 UTC (1,417 KB)
[v2] Mon, 12 Aug 2024 03:53:35 UTC (1,981 KB)
