Support including file attachments in the chat message #957
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation. Pipe the relevant attached_files context through all methods calling into models. We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.
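As a rough illustration of the approach described above, the sketch below places the full text of attached files in its own user message ahead of the query and skips files above a size cap. The names `ChatMessage`, `build_messages`, and `MAX_ATTACHED_FILE_CHARS`, the dict-of-files input, and the cap value are all assumptions made for this example, not the PR's actual code.

```python
from dataclasses import dataclass

# Hypothetical cap; the PR only says file sizes should be limited.
MAX_ATTACHED_FILE_CHARS = 50_000


@dataclass
class ChatMessage:
    role: str
    content: str


def build_messages(
    query: str,
    attached_files: dict[str, str],
    max_chars: int = MAX_ATTACHED_FILE_CHARS,
) -> list[ChatMessage]:
    """Return chat messages, with attached file text in its own user message."""
    messages: list[ChatMessage] = []
    # Keep only files small enough to include in full (assumed policy).
    included = {
        name: text for name, text in attached_files.items() if len(text) <= max_chars
    }
    if included:
        file_context = "\n\n".join(
            f"File: {name}\n{text}" for name, text in included.items()
        )
        # The file context goes in a separate user message, before the query.
        messages.append(ChatMessage(role="user", content=file_context))
    messages.append(ChatMessage(role="user", content=query))
    return messages
```

Calling `build_messages("Summarize this", {"notes.md": "..."})` would yield two user messages: one carrying the file text and one carrying the query. Keeping them separate means the query itself is passed to the model unmodified.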
…e-full-file-in-convo-with-filter
…raw text, before further processing
- weave through all subsequent subcalls to models, where relevant, and save to conversation log
- The document is first converted in the chat input area, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend - We couldn't directly use an UploadFile type in the backend API because we'd have to convert the API type to a multipart form. This would require other client-side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the more generic interface down the road.
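The second phase of this flow might look roughly like the sketch below: the backend receives an ordinary JSON chat body in which each file has already been converted to text by the client, so no multipart form handling is needed. The field names (`q`, `files`, `name`, `content`) and the `AttachedFile`, `ChatRequestBody`, and `parse_chat_body` names are hypothetical, since the actual request schema is not shown in this thread.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AttachedFile:
    # Hypothetical fields; the real schema is not shown in this thread.
    name: str
    content: str  # text already extracted by the client


@dataclass
class ChatRequestBody:
    q: str
    files: list[AttachedFile] = field(default_factory=list)


def parse_chat_body(raw_body: str) -> ChatRequestBody:
    """Parse a JSON chat body whose files arrive as plain text fields."""
    data = json.loads(raw_body)
    files = [
        AttachedFile(name=f["name"], content=f["content"])
        for f in data.get("files", [])
    ]
    return ChatRequestBody(q=data["q"], files=files)
```

Because the files travel as regular JSON fields, the existing chat endpoint can keep its JSON request type, which appears to be the motivation given above for not using a multipart upload.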
…rsation history - When chatting on a shared page, fork and redirect to a new conversation page
…e-full-file-in-convo-with-filter
Left some comments. But looking forward to being able to easily chat with files without indexing it in my knowledge base!
…e structured messages
…e-full-file-in-convo-with-filter
Keeping the function where it originally was allows tracking diffs and change history more easily
Use the Python standard library's tempfile.NamedTemporaryFile to write and delete temporary files safely.
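A minimal sketch of the `tempfile.NamedTemporaryFile` pattern mentioned above follows. The helper name `process_via_temp_file` and the read-back step are assumptions for illustration; the PR's actual processing of the temporary file is not shown here.

```python
import os
import tempfile


def process_via_temp_file(raw: bytes, suffix: str = ".txt") -> str:
    """Write bytes to a temporary file, read them back as text, and clean up."""
    # delete=False lets the file be reopened by path after the handle closes,
    # which also works on platforms that forbid opening an open temp file twice.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(raw)
        tmp_path = tmp.name
    try:
        # Stand-in for whatever path-based processing the file needs.
        with open(tmp_path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        # Always remove the temporary file, even if processing fails.
        os.remove(tmp_path)
```

The `try`/`finally` ensures the file is removed even when processing raises, which is the main safety benefit over hand-rolled temporary paths.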
…b app Remove unused total size calculations in chat input
Left 2 minor comments on PR. But otherwise changes look good to merge
@@ -149,6 +155,7 @@ def converse_anthropic(
    query_images: Optional[list[str]] = None,
    vision_available: bool = False,
    tracer: dict = {},
    attached_files: str = None,
I was using the query_{x} pattern to indicate it's part of the user query (as opposed to files or images retrieved by Khoj). To me, attached files doesn't make it clear where it got attached or who attached the file 😅. Not sure what would be a better name for the query_images, attached_files args.
But it'd be nice if they have similar names and have the attached_files and query_images args next to each other as they have similar origins and expected handling, IMO. But not a blocker
…hub.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter
Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation.
Pipe the relevant attached_files context through all methods calling into models.
This breaks certain prior behaviors. The backend will no longer automatically process attached documents or generate embeddings for them, so they are not added to the "brain". To add docs to the brain (i.e., have search include them when answering questions), you'll have to go to settings and use the upload documents flow there.