
feat: Tokenizer Aware Prompt Builder #6593

Open · sjrl opened this issue Dec 19, 2023 · 9 comments
Labels
2.x Related to Haystack v2.0 · P3 Low priority, leave it in the backlog · topic:LLM · type:feature New feature or request

Comments

@sjrl
Contributor

sjrl commented Dec 19, 2023

Is your feature request related to a problem? Please describe.
For RAG QA we often want to fully utilize the model's context window by inserting as many retrieved documents as possible. However, it is not easy for a user to know ahead of time how many documents they can pass to the LLM without overflowing the context window. Currently this can only be worked out by trial and error, and often there is no single "correct" top_k: documents in a database can vary greatly in length, so some queries might cause an overflow while others do not, depending on the retrieved documents.

Describe the solution you'd like
Therefore, we would like to create a type of Prompt Builder that can truncate some of the variables inserted into the prompt (e.g. truncate the documents but none of the instructions). This basically amounts to calculating a dynamic top_k based on the token count of the retrieved documents. To perform this truncation, the Prompt Builder would need to be tokenizer aware.

This would allow users to set a relatively large top_k, confident that the least relevant documents are dropped if they would cause the context window to be exceeded. It would also provide a more consistent search experience, since we would no longer risk cutting off instructions, which often come after the inserted documents in the prompt.
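For illustration, here is a rough sketch of the "dynamic top_k" behaviour described above. This is plain Python rather than a proposed Haystack API; the gpt2 tokenizer, the fit_documents helper, and the reserved answer budget are assumptions for the example:

```python
# Sketch only: greedily keep retrieved documents (sorted by relevance) until the
# prompt would exceed the model's context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the target model's tokenizer


def fit_documents(static_prompt: str, doc_contents: list[str],
                  context_window: int, answer_budget: int = 512) -> list[str]:
    """Return the longest prefix of documents whose tokens still fit the budget."""
    budget = context_window - len(tokenizer.encode(static_prompt)) - answer_budget
    kept, used = [], 0
    for content in doc_contents:
        n_tokens = len(tokenizer.encode(content))
        if used + n_tokens > budget:
            break  # dynamic top_k: stop before the context window overflows
        kept.append(content)
        used += n_tokens
    return kept
```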

@sjrl sjrl added 2.x Related to Haystack v2.0 type:feature New feature or request topic:LLM topic:promptnode and removed topic:promptnode labels Dec 19, 2023
@mathislucka
Member

mathislucka commented Dec 22, 2023

Can we reformulate the issue to something like:

"Provide tokenization options to limit document or text length in pipelines"

I could see multiple places where this applies and multiple strategies too.

For documents in a prompt we could:

  • start truncating from the end
  • truncate every document a little so that all of them fit

But it's not only prompts, the same would apply to a ranker too.
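To illustrate the two strategies from the list above, hypothetical helpers (not an existing Haystack API; the names and the gpt2 tokenizer are assumptions) could look roughly like this:

```python
# Sketch of the two truncation strategies for documents going into a prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")


def truncate_from_end(texts: list[str], max_tokens: int) -> list[str]:
    """Keep documents in order and cut off once the token budget is spent."""
    kept, used = [], 0
    for text in texts:
        ids = tokenizer.encode(text)
        if used + len(ids) > max_tokens:
            if max_tokens - used > 0:
                kept.append(tokenizer.decode(ids[: max_tokens - used]))  # partial last doc
            break
        kept.append(text)
        used += len(ids)
    return kept


def truncate_each(texts: list[str], max_tokens: int) -> list[str]:
    """Trim every document to an equal share of the budget so all of them fit."""
    per_doc = max_tokens // max(len(texts), 1)
    return [tokenizer.decode(tokenizer.encode(text)[:per_doc]) for text in texts]
```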

@mathislucka mathislucka added the P2 Medium priority, add to the next sprint if no P1 available label Dec 22, 2023
@sjrl
Contributor Author

sjrl commented Dec 22, 2023

> But it's not only prompts, the same would apply to a ranker too.

Just out of curiosity, what scenario do you have in mind where it would be relevant to have this in the Ranker?

@mathislucka
Member

I was actually too fast with that :D

Ranker only gets one document at a time.

@masci masci added P3 Low priority, leave it in the backlog and removed P2 Medium priority, add to the next sprint if no P1 available labels Feb 16, 2024
@medsriha
Member

medsriha commented May 2, 2024

I'm going to give this a shot. I'll draft something under PromptTokenAwareBuilder, and report back for feedback.

@CarlosFerLo
Contributor

Wouldn't it be nicer to just add a new component that crops the context to the number of tokens needed, depending on how you want to do it: dropping whole docs or just a piece of each? This way, we don't need to change all the components one by one to adopt this; we just add this component to the pipeline.

@mathislucka
Member

Yes, I came around on that too.

My thoughts were something like:

DocumentsTokenTruncater (probably not a great name) that accepts a list of documents and can truncate them according to different strategies (e.g. truncate left, right, or each).

A TextTokenTruncater could be added for other use cases.

The only problem with that approach would be document metadata that you want to use in the prompt.

Generally, I feel like this is less important now since the context length of most models has increased so much.
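For reference, a sketch of what such a component could look like in Haystack 2.x terms. The class name follows the suggestion above; the strategy names, tokenizer handling, and metadata copying are assumptions rather than an agreed design:

```python
# Sketch of a standalone truncation component (Haystack 2.x @component style).
from typing import List, Literal

from haystack import Document, component
from transformers import AutoTokenizer


@component
class DocumentsTokenTruncater:
    def __init__(self, model: str = "gpt2", max_tokens: int = 3000,
                 strategy: Literal["right", "each"] = "right"):
        self.tokenizer = AutoTokenizer.from_pretrained(model)
        self.max_tokens = max_tokens
        self.strategy = strategy

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]):
        if self.strategy == "each":
            # every document gets an equal share of the budget
            per_doc = self.max_tokens // max(len(documents), 1)
            budgets = [per_doc] * len(documents)
        else:  # "right": spend the budget front to back, drop what doesn't fit
            budgets, remaining = [], self.max_tokens
            for doc in documents:
                n = len(self.tokenizer.encode(doc.content or ""))
                budgets.append(min(n, remaining))
                remaining = max(remaining - n, 0)
        truncated = [
            Document(
                content=self.tokenizer.decode(
                    self.tokenizer.encode(doc.content or "")[:budget]),
                meta=doc.meta,  # metadata is copied so it can still be used in the prompt
            )
            for doc, budget in zip(documents, budgets) if budget > 0
        ]
        return {"documents": truncated}
```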

@sjrl
Contributor Author

sjrl commented Jun 7, 2024

> This way, we don't need to change all the components one by one to adopt this; we just add this component to the pipeline.

One other (small) issue I foresee when using a separate component is that it's not easy to know how much to truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens are used up by the PromptBuilder's prompt itself, and that's not easily possible unless the functionality is added to the PromptBuilder.

> Generally, I feel like this is less important now since the context length of most models has increased so much.

And yeah, I agree; this has become less urgent now that context lengths are so large.

@mathislucka
Member

> One other (small) issue I foresee when using a separate component is that it's not easy to know how much to truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens are used up by the PromptBuilder's prompt itself, and that's not easily possible unless the functionality is added to the PromptBuilder.

People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.

@CarlosFerLo
Contributor

> One other (small) issue I foresee when using a separate component is that it's not easy to know how much to truncate the documents by. Knowing precisely how many tokens the documents should be truncated to requires knowing how many tokens are used up by the PromptBuilder's prompt itself, and that's not easily possible unless the functionality is added to the PromptBuilder.

> People could either estimate or count the tokens in their prompt template and then use that to configure the truncater. Not perfect but it would work.

We could just add a count-tokens method to the prompt template that accepts a tokenizer and returns the number of tokens in the prompt after removing all the Jinja stuff.
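A rough sketch of that idea (count_template_tokens is a hypothetical helper, not an existing Haystack method; it assumes a Jinja template and a Hugging Face tokenizer):

```python
# Render the template with all variables blanked out so only the static prompt
# text remains, then count its tokens.
from jinja2 import Environment, meta
from transformers import AutoTokenizer


def count_template_tokens(template_str: str, tokenizer) -> int:
    env = Environment()
    # find all variables used in the template ({{ documents }}, {{ question }}, ...)
    variables = meta.find_undeclared_variables(env.parse(template_str))
    # render with every variable empty, leaving only the fixed instructions
    rendered = env.from_string(template_str).render({var: "" for var in variables})
    return len(tokenizer.encode(rendered))


tokenizer = AutoTokenizer.from_pretrained("gpt2")
template = "Answer the question using the documents.\n{{ documents }}\nQuestion: {{ question }}"
overhead = count_template_tokens(template, tokenizer)  # tokens taken up by the template itself
```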
