
feat: Tokenizer Aware Prompt Builder #6593

Open · sjrl opened this issue Dec 19, 2023 · 9 comments
Labels
2.x Related to Haystack v2.0 · P3 Low priority, leave it in the backlog · topic:LLM · type:feature New feature or request

Comments

@sjrl
Contributor

sjrl commented Dec 19, 2023

Is your feature request related to a problem? Please describe.
For RAG QA we often want to fully utilize the model's context window by inserting as many retrieved documents as possible. However, it is not easy for a user to know ahead of time how many documents they can pass to the LLM without overflowing the context window. Currently this can only be worked out by trial and error, and often there is no single "correct" top_k: documents in a database can vary greatly in length, so some queries might cause an overflow while others do not, depending on the retrieved documents.

Describe the solution you'd like
Therefore, we would like to create a type of Prompt Builder that can truncate some of the variables inserted into the prompt (e.g. truncate the documents but none of the instructions). This basically amounts to calculating a dynamic top_k based on the token count of the retrieved documents. To perform this truncation, the Prompt Builder would need to be tokenizer aware.

This would allow users to set a relatively large top_k, confident that the least relevant documents are dropped if they would cause the context window to be exceeded. It would also provide a more consistent search experience, since we would no longer risk cutting off instructions, which often come after the inserted documents in the prompt.
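For illustration, here is a rough sketch of the "dynamic top_k" behaviour described above. This is plain Python rather than a proposed Haystack API; the gpt2 tokenizer, the fit_documents helper, and the reserved answer budget are assumptions for the example:

```python
# Sketch only: greedily keep retrieved documents (sorted by relevance) until the
# prompt would exceed the model's context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the target model's tokenizer


def fit_documents(static_prompt: str, doc_contents: list[str],
                  context_window: int, answer_budget: int = 512) -> list[str]:
    """Return the longest prefix of documents whose tokens still fit the budget."""
    budget = context_window - len(tokenizer.encode(static_prompt)) - answer_budget
    kept, used = [], 0
    for content in doc_contents:
        n_tokens = len(tokenizer.encode(content))
        if used + n_tokens > budget:
            break  # dynamic top_k: stop before the context window overflows
        kept.append(content)
        used += n_tokens
    return kept
```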

@sjrl sjrl added 2.x Related to Haystack v2.0 type:feature New feature or request topic:LLM topic:promptnode and removed topic:promptnode labels Dec 19, 2023
@mathislucka
Member

mathislucka commented Dec 22, 2023

Can we reformulate the issue to something like:

"Provide tokenization options to limit document or text length in pipelines"

I could see multiple places where this applies and multiple strategies too.

For documents in a prompt we could:

  • start truncating from the end
  • truncate every document a little so that all of them fit

But it's not only prompts, the same would apply to a ranker too.
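To illustrate the two strategies from the list above, hypothetical helpers (not an existing Haystack API; the names and the gpt2 tokenizer are assumptions) could look roughly like this:

```python
# Sketch of the two truncation strategies for documents going into a prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")


def truncate_from_end(texts: list[str], max_tokens: int) -> list[str]:
    """Keep documents in order and cut off once the token budget is spent."""
    kept, used = [], 0
    for text in texts:
        ids = tokenizer.encode(text)
        if used + len(ids) > max_tokens:
            if max_tokens - used > 0:
                kept.append(tokenizer.decode(ids[: max_tokens - used]))  # partial last doc
            break
        kept.append(text)
        used += len(ids)
    return kept


def truncate_each(texts: list[str], max_tokens: int) -> list[str]:
    """Trim every document to an equal share of the budget so all of them fit."""
    per_doc = max_tokens // max(len(texts), 1)
    return [tokenizer.decode(tokenizer.encode(text)[:per_doc]) for text in texts]
```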

@mathislucka mathislucka added the P2 Medium priority, add to the next sprint if no P1 available label Dec 22, 2023
@sjrl
Contributor Author

sjrl commented Dec 22, 2023

> But it's not only prompts, the same would apply to a ranker too.

Just out of curiosity, what scenario do you have in mind where it would be relevant to have this in the Ranker?

@mathislucka
Member

I was actually too fast with that :D

Ranker only gets one document at a time.

@masci masci added P3 Low priority, leave it in the backlog and removed P2 Medium priority, add to the next sprint if no P1 available labels Feb 16, 2024
@medsriha
Member

medsriha commented May 2, 2024

I'm going to give this a shot. I'll draft something under PromptTokenAwareBuilder, and report back for feedback.

@CarlosFerLo
Contributor

Wouldn't it be nicer to just add a new component that crops the context to the number of tokens needed, depending on how you want to do it: dropping whole docs or just a piece of each? This way, we don't need to change all the components one by one to adopt this; we just add this component to the pipeline.

@mathislucka
Member

Yes, I came around on that too.

My thoughts were something like:

DocumentsTokenTruncater (probably not a great name) that accepts a list of documents and can truncate them according to different strategies (e.g. truncate left, right, or each).

A TextTokenTruncater could be added for other use cases.

The only problem with that approach would be document metadata that you want to use in the prompt.

Generally, I feel like this is less important now since the context length of most models has increased so much.
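For reference, a sketch of what such a component could look like in Haystack 2.x terms. The class name follows the suggestion above; the strategy names, tokenizer handling, and metadata copying are assumptions rather than an agreed design:

```python
# Sketch of a standalone truncation component (Haystack 2.x @component style).
from typing import List, Literal

from haystack import Document, component
from transformers import AutoTokenizer


@component
class DocumentsTokenTruncater:
    def __init__(self, model: str = "gpt2", max_tokens: int = 3000,
                 strategy: Literal["right", "each"] = "right"):
        self.tokenizer = AutoTokenizer.from_pretrained(model)
        self.max_tokens = max_tokens
        self.strategy = strategy

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]):
        if self.strategy == "each":
            # every document gets an equal share of the budget
            per_doc = self.max_tokens // max(len(documents), 1)
            budgets = [per_doc] * len(documents)
        else:  # "right": spend the budget front to back, drop what doesn't fit
            budgets, remaining = [], self.max_tokens
            for doc in documents:
                n = len(self.tokenizer.encode(doc.content or ""))
                budgets.append(min(n, remaining))
                remaining = max(remaining - n, 0)
        truncated = [
            Document(
                content=self.tokenizer.decode(
                    self.tokenizer.encode(doc.content or "")[:budget]),
                meta=doc.meta,  # metadata is copied so it can still be used in the prompt
            )
            for doc, budget in zip(documents, budgets) if budget > 0
        ]
        return {"documents": truncated}
```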

@sjrl
Contributor Author

sjrl commented Jun 7, 2024

> This way, we don't need to change all the components one by one to adopt this; we just add this component to the pipeline.

One other (small) issue I foresee when using a separate component is that it's not easy to know how much to truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens are used up by the PromptBuilder's prompt itself, and that's not easily possible unless the functionality is added to the PromptBuilder.

> Generally, I feel like this is less important now since the context length of most models has increased so much.

And yeah, I agree; this has become less urgent now that context lengths are so large.

@mathislucka
Member

> One other (small) issue I foresee when using a separate component is that it's not easy to know how much to truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens are used up by the PromptBuilder's prompt itself, and that's not easily possible unless the functionality is added to the PromptBuilder.

People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.

@CarlosFerLo
Contributor

> One other (small) issue I foresee when using a separate component is that it's not easy to know how much to truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens are used up by the PromptBuilder's prompt itself, and that's not easily possible unless the functionality is added to the PromptBuilder.

> People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.

We could just add a count-tokens method to the prompt template that accepts a tokenizer and returns the number of tokens in the prompt after removing all the Jinja stuff.
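A rough sketch of that idea (count_template_tokens is a hypothetical helper, not an existing Haystack method; it assumes a Jinja template and a Hugging Face tokenizer):

```python
# Render the template with all variables blanked out so only the static prompt
# text remains, then count its tokens.
from jinja2 import Environment, meta
from transformers import AutoTokenizer


def count_template_tokens(template_str: str, tokenizer) -> int:
    env = Environment()
    # find all variables used in the template ({{ documents }}, {{ question }}, ...)
    variables = meta.find_undeclared_variables(env.parse(template_str))
    # render with every variable empty, leaving only the fixed instructions
    rendered = env.from_string(template_str).render({var: "" for var in variables})
    return len(tokenizer.encode(rendered))


tokenizer = AutoTokenizer.from_pretrained("gpt2")
template = "Answer the question using the documents.\n{{ documents }}\nQuestion: {{ question }}"
overhead = count_template_tokens(template, tokenizer)  # tokens taken up by the template itself
```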
