Add support for HF summarization endpoint in the websearch #319

Merged
merged 2 commits into from
Jul 11, 2023

nsarrazin
Collaborator

If the user has set an HF_ACCESS_TOKEN, we use it to call an inference endpoint trained for summarization. If the user didn't set their token, we use their LLM endpoint (which could be self-hosted with no HF token) to generate a summary the old way.

In my local testing this returns more accurate summaries, faster, and it would also help reduce the load on the LLM endpoint (summarization needs a huge context window, much larger than most conversations use).

I'm not sure the model I chose is optimal, as there are multiple models that support summarization. I'm also not sure whether we could get rate-limited by the API, since the calls for all users would come from one server using the prod HF token.
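For readers skimming the thread, the selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `pickSummarizer`, `buildSummaryRequest`, and `parseSummary` are hypothetical names, the model id is left as a parameter because the thread does not name the chosen model, and the URL and `summary_text` response shape are assumptions based on the hosted HF Inference API for summarization models.

```typescript
// Sketch only: these helpers are illustrative and not taken from the chat-ui codebase.
type SummarizerKind = "hf-inference" | "llm";

// Use the HF summarization model only when a non-empty token is configured;
// otherwise summarize with the user's own LLM endpoint (the "old way").
function pickSummarizer(hfAccessToken: string | undefined): SummarizerKind {
  return hfAccessToken && hfAccessToken.trim() !== "" ? "hf-inference" : "llm";
}

interface SummaryRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request for a summarization model.
// `model` is a placeholder: the thread does not say which model was picked.
function buildSummaryRequest(text: string, token: string, model: string): SummaryRequest {
  return {
    url: `https://api-inference.huggingface.co/models/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: text }),
    },
  };
}

// Assumed response shape for summarization: [{ summary_text: "..." }].
// Returns undefined for anything else so the caller can decide what to do.
function parseSummary(payload: unknown): string | undefined {
  if (Array.isArray(payload) && payload.length > 0) {
    const first = payload[0] as { summary_text?: unknown } | null;
    if (first && typeof first.summary_text === "string") {
      return first.summary_text;
    }
  }
  return undefined;
}
```

A caller would pass `url` and `init` to `fetch` and run the decoded JSON through `parseSummary`. Keeping request construction separate from the network call makes the token check and payload handling easy to unit test without hitting the API.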

@nsarrazin nsarrazin added enhancement New feature or request back This issue is related to the Svelte backend or the DB labels Jun 22, 2023
@gary149 gary149 self-requested a review June 26, 2023 08:10
Collaborator

@gary149 gary149 left a comment

Cool, just 2 things:

  • Is this only in English? I'd rather do something that supports multiple languages - using a different model, but I'll let you benchmark.
  • Would it be better to fall back on the current method if there's an error on inference?

@nsarrazin
Collaborator Author

So I tested the multi-language stuff. Even if you ask a question in another language, the LLM often generates a search query in English, and the serpapi settings are configured to fetch results from the US, so it's unlikely we'll fetch results that are not in English atm. Still, I tried the model, and while it works sometimes, most of the time it generates one-sentence summaries that are too short to be useful imo.

But good catch on the fallback, I added the feature!
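The fallback discussed here could look something like the sketch below. It is an assumption about the shape of the fail-safe, not the code that was merged: `summarizeWithFallback`, the `Summarize` type, and the `onError` hook are hypothetical names, and treating an empty summary as a failure is a design choice made for this example.

```typescript
// Sketch only: hypothetical names, not the implementation merged in this PR.
type Summarize = (text: string) => Promise<string>;

// Try the dedicated summarization endpoint first (if one is configured).
// On any error, or on an empty result, fall back to the LLM-based summary.
async function summarizeWithFallback(
  text: string,
  primary: Summarize | undefined,
  fallback: Summarize,
  onError?: (err: unknown) => void
): Promise<string> {
  if (primary) {
    try {
      const summary = await primary(text);
      if (summary.trim() !== "") {
        return summary;
      }
    } catch (err) {
      // Report the failure if the caller cares, then continue to the fallback.
      if (onError) onError(err);
    }
  }
  return fallback(text);
}
```

Passing `undefined` as `primary` covers the case where no HF token is set, so the same function handles both "no token" and "inference failed" by ending up on the LLM path.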

@gary149 gary149 self-requested a review July 11, 2023 10:34
Collaborator

@gary149 gary149 left a comment

Ok feel free to merge then :)

@nsarrazin nsarrazin merged commit 10d1ab5 into main Jul 11, 2023
@nsarrazin nsarrazin deleted the feature/use_hf_endpoint_for_summary branch July 11, 2023 10:36
ice91 pushed a commit to ice91/chat-ui that referenced this pull request Oct 30, 2024
…ce#319)

* Add support for HF endpoint for summary

* add fail-safe for summarization