LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

Jiang, Huiqiang; Wu, Qianhui; Luo, Xufang; Li, Dongsheng; Lin, Chin-Yew; Yang, Yuqing; Qiu, Lili

arXiv:2310.06839v1 (cs)

[Submitted on 10 Oct 2023 (this version), latest version 12 Aug 2024 (v2)]

Abstract: In long context scenarios, large language models (LLMs) face three main challenges: higher computational and financial cost, longer latency, and inferior performance. Some studies reveal that the performance of LLMs depends on both the density and the position of the key information (i.e., the information relevant to the question) in the input prompt. Inspired by these findings, we propose LongLLMLingua, a prompt compression method that improves LLMs' perception of the key information in order to address all three challenges simultaneously. We evaluate it on a wide range of long context scenarios, including single- and multi-document QA, few-shot learning, summarization, synthetic tasks, and code completion. The experimental results show that prompts compressed by LongLLMLingua achieve higher performance at much lower cost, and the end-to-end latency of the system is also reduced. For example, on the NaturalQuestions benchmark, LongLLMLingua improves performance by up to 17.1% over the original prompt while feeding ~4x fewer tokens to GPT-3.5-Turbo. It yields cost savings of $28.5 and $27.4 per 1,000 samples on the LongBench and ZeroScrolls benchmarks, respectively. Additionally, when compressing prompts of ~10k tokens at a compression rate of 2x-10x, LongLLMLingua achieves a 1.4x-3.8x end-to-end speedup. Our code is available at this https URL.
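
The headline figures above are ratios of token counts. As a minimal sketch of that arithmetic (the helper names and the per-token price are hypothetical illustrations, not taken from the paper or its code), "compression rate" and per-1,000-sample input-cost savings could be computed as:

```python
# Illustrative arithmetic only. Function names are hypothetical, and the
# price used below is a placeholder assumption, not a figure from the paper.


def compression_rate(original_tokens: int, compressed_tokens: int) -> float:
    """Ratio of original to compressed prompt length (4.0 means ~4x fewer tokens)."""
    if compressed_tokens <= 0:
        raise ValueError("compressed_tokens must be positive")
    return original_tokens / compressed_tokens


def compressed_length(original_tokens: int, rate: float) -> int:
    """Token budget left after compressing at a given rate, rounded down."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return int(original_tokens // rate)


def input_cost_savings(original_tokens: int, compressed_tokens: int,
                       usd_per_1k_tokens: float, samples: int = 1000) -> float:
    """Input-token cost saved over `samples` prompts at a flat per-token price.

    Assumption: only input tokens are priced; output tokens and the cost of
    running the compressor itself are ignored.
    """
    saved_tokens = (original_tokens - compressed_tokens) * samples
    return saved_tokens / 1000 * usd_per_1k_tokens


if __name__ == "__main__":
    # A ~10k-token prompt at the 2x and 10x ends of the range in the abstract.
    for rate in (2, 10):
        print(rate, compressed_length(10_000, rate))
    # Hypothetical price of $0.0015 per 1k input tokens (assumption).
    print(round(input_cost_savings(10_000, 2_500, 0.0015), 2))
```

Real savings also depend on the model's actual pricing, on output tokens, and on the cost of the compression step, none of which this sketch models.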

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2310.06839 [cs.CL]
	(or arXiv:2310.06839v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.06839

Submission history

From: Huiqiang Jiang
[v1] Tue, 10 Oct 2023 17:59:58 UTC (1,417 KB)
[v2] Mon, 12 Aug 2024 03:53:35 UTC (1,981 KB)