LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Jiang, Huiqiang; Wu, Qianhui; Lin, Chin-Yew; Yang, Yuqing; Qiu, Lili

arXiv:2310.05736 (cs)

[Submitted on 9 Oct 2023 (v1), last revised 6 Dec 2023 (this version, v2)]

Abstract: Large language models (LLMs) have been used in a wide range of applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs are becoming increasingly lengthy, even exceeding tens of thousands of tokens. To accelerate model inference and reduce cost, this paper presents LLMLingua, a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction-tuning-based method for distribution alignment between language models. We conduct experiments and analysis over four datasets from different scenarios, namely GSM8K, BBH, ShareGPT, and Arxiv-March23, showing that the proposed approach yields state-of-the-art performance and allows for up to 20x compression with little performance loss. Our code is available at this https URL.
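
To make the idea of a compression ratio and a token budget concrete, here is a minimal toy sketch. It is not the LLMLingua algorithm: the abstract describes a budget controller, iterative token-level compression, and alignment with a language model, none of which are reproduced here. The function name `compress` and the unigram-frequency score are assumptions made purely for illustration, standing in for the per-token scores a small language model might provide.

```python
import math
from collections import Counter


def compress(tokens, ratio):
    """Keep about len(tokens) / ratio tokens, preferring rarer ones.

    Hypothetical helper for illustration only; the scoring below is a
    toy stand-in, not the method from the paper.
    """
    total = len(tokens)
    # Token budget implied by the target compression ratio (at least 1).
    budget = max(1, round(total / ratio))
    counts = Counter(tokens)
    # Toy score: negative log of the token's frequency within the prompt.
    scores = [-math.log(counts[t] / total) for t in tokens]
    # Take the highest-scoring positions, breaking ties by position,
    # then restore the original order.
    ranked = sorted(range(total), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:budget])
    return [tokens[i] for i in keep]


prompt = "the answer is the sum of the two numbers so the answer is 7".split()
short = compress(prompt, ratio=2)
print(len(prompt), "->", len(short), "tokens")
```

Under this sketch, a 20x ratio on a 10,000-token prompt would correspond to a budget of 500 tokens; how well task performance survives at that budget is what the paper's experiments evaluate.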

Comments:	Accepted at EMNLP 2023
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2310.05736 [cs.CL]
	(or arXiv:2310.05736v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.05736

Submission history

From: Huiqiang Jiang
[v1] Mon, 9 Oct 2023 14:10:21 UTC (511 KB)
[v2] Wed, 6 Dec 2023 17:02:25 UTC (255 KB)
