Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support including file attachments in the chat message #957

Merged
merged 46 commits into from
Nov 11, 2024

Conversation

sabaimran
Copy link
Member

@sabaimran sabaimran commented Nov 5, 2024

Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.

Pipe the relevant attached_files context through all methods calling into models.

This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).

Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.

Pipe the relevant attached_files context through all methods calling into models.

We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
- weave through all subsequent subcalls to models, where relevant, and save to conversation log
- Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend
- We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the moe generic interface down the road.
@sabaimran sabaimran changed the title Include full files per query for conversations with file filters Allow to include file attachments in the chat message Nov 8, 2024
@sabaimran sabaimran changed the title Allow to include file attachments in the chat message Support including file attachments in the chat message Nov 8, 2024
@sabaimran sabaimran marked this pull request as ready for review November 8, 2024 01:18
Copy link
Member

@debanjum debanjum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. But looking forward to being able to easily chat with files without indexing it in my knowledge base!

src/khoj/processor/tools/run_code.py Outdated Show resolved Hide resolved
src/khoj/routers/helpers.py Outdated Show resolved Hide resolved
src/khoj/routers/helpers.py Outdated Show resolved Hide resolved
src/khoj/processor/content/pdf/pdf_to_entries.py Outdated Show resolved Hide resolved
src/khoj/processor/content/docx/docx_to_entries.py Outdated Show resolved Hide resolved
src/interface/web/app/common/chatFunctions.ts Show resolved Hide resolved
src/khoj/routers/api_chat.py Show resolved Hide resolved
src/khoj/routers/api_chat.py Outdated Show resolved Hide resolved
src/khoj/utils/rawconfig.py Show resolved Hide resolved
sabaimran and others added 15 commits November 9, 2024 18:22
Keep function where it original was allows tracking diffs and change
history more easily
Use python standard method tempfile.NamedTemporaryFile to write,
delete temporary files safely.
…b app

Remove unused total size calculations in chat input
Copy link
Member

@debanjum debanjum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left 2 minor comments on PR. But otherwise changes look good to merge

@@ -149,6 +155,7 @@ def converse_anthropic(
query_images: Optional[list[str]] = None,
vision_available: bool = False,
tracer: dict = {},
attached_files: str = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using query_{x} pattern to indicate it's part of the user query (as opposed to files or images retrieved by Khoj). To me attached files doesn't make it clear where it got attached or who attached the file 😅 . Not sure what would be a better name for the query_images, attached_files args.

But it'd be nice if they have similar names and have attached_files and query_images args next to each other as they have similar origins and expected handling, IMO. But not a blocker

src/interface/web/app/common/iconUtils.tsx Outdated Show resolved Hide resolved
@sabaimran sabaimran merged commit b563f46 into master Nov 11, 2024
9 checks passed
@sabaimran sabaimran deleted the features/include-full-file-in-convo-with-filter branch November 11, 2024 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants