Rag fusion rw 002 vector database #3

richardwhiteii · 2023-10-17T04:21:10Z

Implement vector search using Chroma DB, the first vector database I found that I could quickly understand.
I expect this choice is notional and that the code will later support any vector database.

- Initialize Chroma client and collection
- Add documents to Chroma collection
- Update vector_search() to query Chroma
  - Perform vector query
  - Extract document IDs, text, metadata
  - Build scores with random values
- Update reciprocal_rank_fusion()
  - Take document IDs to map ranks back
  - Use score values instead of full scores dict
- Pass metadatas and documents through pipeline
  - Return metadatas from vector_search()
  - Pass metadatas to generate_output()
- Update generate_output()
  - Lookup metadata by document index
  - Get document text from documents list
- Add logging for debugging and transparency
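
For readers who have not used Chroma, here is a minimal sketch of the query step described above. It is an illustration under assumptions, not the code in this PR: the collection name `demo_docs`, the sample documents, and the helpers `flatten_query_result()` and `build_demo_collection()` are hypothetical. It relies only on the Chroma calls `chromadb.Client()`, `get_or_create_collection()`, `collection.add()`, and `collection.query()`.

```python
import random


def flatten_query_result(result):
    """Unpack the hits for the first query text from a Chroma query() result.

    Chroma returns one inner list per query text, so index 0 selects the
    hits of a single-query call.
    """
    ids = result["ids"][0]
    documents = result["documents"][0]
    metadatas = result["metadatas"][0]
    return ids, documents, metadatas


def vector_search(collection, query, n_results=3):
    """Query a Chroma collection and attach placeholder random scores."""
    result = collection.query(query_texts=[query], n_results=n_results)
    ids, documents, metadatas = flatten_query_result(result)
    # Placeholder scores, mirroring the PR: random values, sorted high to low.
    scores = {doc_id: round(random.uniform(0.7, 0.9), 2) for doc_id in ids}
    ranked = dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
    # documents and metadatas stay in the order Chroma returned them.
    return ranked, documents, metadatas


def build_demo_collection():
    """Create a small in-memory collection (hypothetical sample data)."""
    import chromadb  # lazy import: requires `pip install chromadb`

    client = chromadb.Client()
    collection = client.get_or_create_collection(name="demo_docs")
    collection.add(
        ids=["doc1", "doc2"],
        documents=[
            "Climate change and economic impact.",
            "Renewable energy as a response to climate change.",
        ],
        metadatas=[{"source": "demo"}, {"source": "demo"}],
    )
    return collection
```

With chromadb installed, `vector_search(build_demo_collection(), "impact of climate change", n_results=2)` would return the scored IDs along with the matching document texts and metadata.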

This migrates vector search from random mock data to using the Chroma database. Document text and metadata are retrieved from Chroma and passed through the pipeline. Additional logging provides visibility into the process. Reciprocal rank fusion is updated to work with the Chroma results structure.
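
As a point of reference for the reciprocal rank fusion step, the sketch below shows the general technique operating on lists of document IDs. It is not the function from this PR: the signature, the 1-based ranks, and the smoothing constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_id_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank is its 1-based position in that list.
    """
    fused = defaultdict(float)
    for ranked_ids in ranked_id_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Because only the position in each list matters, the function needs just the ordered IDs per query. The document text and metadata can then be looked up by ID after fusion.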

This update improves the backend search functionality by using a real vector database while preserving the existing pipeline structure.

TODO:
Better understand vector search to remove "random"
Remove logging
Refactor the functions now that they are larger.
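
One possible direction for the first TODO item, sketched under assumptions: Chroma's `query()` result includes a `distances` list alongside `ids`, so the random values could be replaced with scores derived from those distances. The helper name `scores_from_distances()` and the `1 / (1 + distance)` mapping are illustrative choices, and the mapping assumes non-negative distances (as with L2 or cosine distance).

```python
def scores_from_distances(ids, distances):
    """Map each document ID to a similarity-style score in (0, 1].

    Smaller distance means a closer match, so it yields a higher score.
    Assumes distances are non-negative.
    """
    scores = {
        doc_id: 1.0 / (1.0 + dist)
        for doc_id, dist in zip(ids, distances)
    }
    # Sort from best match to worst, like the existing scores dict.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

For a single query, the inputs would be `result["ids"][0]` and `result["distances"][0]`.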

update from main

Description: This commit introduces Chroma, a powerful vector database, to enhance our search functionality. The 'vector_search()' function now performs actual Chroma vector searches, replacing the previous mock database. For each document retrieved from Chroma, random scores are assigned, maintaining our existing scoring mechanism. This integration improves the accuracy and relevance of search results, offering a more robust search experience.

richardwhiteii · 2023-10-17T23:12:31Z

Removed comments and line spacing.

mariozupan · 2023-10-18T10:11:13Z

I would like to see efficiency of rag-fusion on csv(or pdf) financial data tables.
implementation with llama2 or mistral model.

Navanit-git · 2023-10-19T05:13:04Z

> I would like to see efficiency of rag-fusion on csv(or pdf) financial data tables.
>
> implementation with llama2 or mistral model.

Yes, I want to query data from financial balance sheets and P/L sheets.

Raudaschl · 2023-10-20T19:43:12Z

Hey @richardwhiteii
Thank you for submitting this request.
I will be reviewing it over the weekend.

Raudaschl · 2023-11-11T11:29:58Z

Hi @richardwhiteii and @Navanit-git

First off, a huge thanks to both of you for your dedication and hard work on the RAG Fusion project.
It's awesome.

However, I'm a bit concerned about the added complexity, especially considering beginners who might be using this project as a stepping stone in their learning journey.
While the advanced features and modularity are a boon for experienced developers, they could seem daunting for newcomers.
I'd like to highlight a few areas where this complexity could be challenging:

- The extensive logging could potentially overshadow the core functionalities we aim to showcase.
- Integrating external APIs and databases, albeit powerful, introduces a complexity level that assumes considerable prior knowledge.
- The nuanced error handling and environmental variable configurations are indeed best practices but might not be as transparent for those just starting out.

To make this more accessible, I propose:

- Incorporating detailed comments and documentation to thoroughly explain each code segment.
- Possibly creating a distinct branch or version of the code tailored for beginners.
- Offering extra resources or tutorials to aid beginners in grasping the advanced features you've brilliantly put together.

I'd love to hear your thoughts on these suggestions.
My goal is to keep the project approachable for developers of all skill levels, and your insights would be crucial in striking this balance.

Thanks again for your invaluable contribution, and I eagerly await your perspective on making the project more beginner-friendly.

Cheers,
Adrian

Raudaschl · 2023-11-11T11:30:36Z

> I would like to see efficiency of rag-fusion on csv(or pdf) financial data tables.
>
> implementation with llama2 or mistral model.

This is a really interesting idea!

richardwhiteii · 2023-11-14T17:07:41Z

I understand. I can bounce some updates your way; let me know what you think so I can make sure I'm going in the right direction.

richardwhiteii · 2023-11-19T02:45:38Z

I made some updates: specifically, I removed the logging and added docstrings and comments.
I added os.environ["TOKENIZERS_PARALLELISM"] = "false" to address a warning I received.
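
A minimal sketch of where such a setting usually goes, assuming the warning comes from the Hugging Face tokenizers library used by the embedding model: near the top of the script, before the tokenizer is first used. Using `setdefault()` instead of a plain assignment is a variation introduced here, not what the PR does.

```python
import os

# Set early, before any library that relies on Hugging Face tokenizers runs.
# setdefault() keeps a value already exported in the shell, if there is one.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
```

The plain assignment in the PR always forces `"false"`, whereas `setdefault()` lets someone override the value from their environment without editing the code.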

Let me know your thoughts.

What do you envision the beginner-focused branch looking like?

richardwhiteii and others added 4 commits October 16, 2023 00:02

Merge pull request #1 from Raudaschl/master

7fc9184

update from main

Implement vector search using Chroma DB

5e9e7f9

Cleaned up the commented lines and spacing.

4b4d6fe

Raudaschl self-assigned this Nov 11, 2023

richardwhiteii and others added 2 commits November 18, 2023 20:07

Removed logging

b304a57

Removed logging and added comments

e935e6f
