This project demonstrates the use of Pinecone as a vector embedding database and LangChain for orchestrating language operations with OpenAI's chat models. The goal is to build a question-answering chatbot over PDFs: ingest the documents, create a vector index, and query the data efficiently.
1. Clone the repository:

        git clone https://github.com/VirajDeshwal/mdc-RAG
        cd mdc-RAG
2. Install Anaconda:
   - Download and install Anaconda from the official Anaconda website.
3. Create and activate a new conda environment:

        conda create --name rag python=3.10
        conda activate rag
4. Install the required dependencies:

        pip install -r requirements.txt
5. Configure environment variables:
   - Create a `.env` file in the root directory.
   - Add your OpenAI and Pinecone API keys:

            OPENAI_API_KEY=your_openai_api_key
            PINECONE_API_KEY=your_pinecone_api_key
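Projects like this usually read the `.env` file at startup (the `python-dotenv` package is the common choice). As a minimal illustration of the `KEY=value` format the file uses — the parser below is a sketch for this README, not code from the repository:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ.

    Minimal illustration of the file format; real projects typically
    call python-dotenv's load_dotenv() instead.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Demonstrate with a throwaway file using the placeholder values from above.
with open(".env.example", "w") as fh:
    fh.write("OPENAI_API_KEY=your_openai_api_key\n"
             "PINECONE_API_KEY=your_pinecone_api_key\n")
load_env(".env.example")
print(os.environ["PINECONE_API_KEY"])  # your_pinecone_api_key
```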
6. Add your PDF files:
   - Place the PDF files you want to process in the `input_src` folder.
7. Create the vector database and index the data:

        python run.py
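Conceptually, indexing splits each PDF's extracted text into overlapping chunks before embedding and upserting them to Pinecone. The repository's `run.py` internals are not shown here; the sketch below illustrates only the chunking idea in plain Python (function name and parameters are illustrative — LangChain's text splitters are the usual tool):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks suitable for embedding.

    Illustrative sketch of the preprocessing step; parameter values
    here are examples, not the project's actual settings.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # each chunk repeats `overlap` chars of the last
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "".join(str(i % 10) for i in range(1200))  # stand-in for PDF text
pieces = chunk_text(doc)
print(len(pieces))  # 3
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side.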
8. Run a query against the indexed data:

        python pinecone_query.py
- `run.py`: Script to create the vector index and store the data.
- `pinecone_query.py`: Script to query the Pinecone vector database.
- `requirements.txt`: List of dependencies required for the project.
- `input_src/`: Directory for the PDF files to be processed.
- `utils/`: Utility functions and modules used in the project.
- `.env`: Environment file storing the API keys (not included in the repository).
- Indexing Data:
  - Ensure your PDFs are in the `input_src` folder.
  - Run `python run.py` to create the vector index.
- Querying Data:
  - Run `python pinecone_query.py` to perform queries on the indexed data.
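Under the hood, a query embeds the question and asks Pinecone for the stored vectors most similar to it. As a toy illustration of that ranking step (pure Python, tiny 3-dimensional vectors standing in for real embeddings with hundreds of dimensions — not the project's actual query code):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [item_id for item_id, _ in scored[:k]]

# Toy "index" of (id, vector) pairs, as a vector database would hold.
index = [("chunk-a", [1.0, 0.0, 0.0]),
         ("chunk-b", [0.0, 1.0, 0.0]),
         ("chunk-c", [0.9, 0.1, 0.0])]
print(top_k([1.0, 0.0, 0.0], index))  # ['chunk-a', 'chunk-c']
```

The retrieved chunks are then passed to the chat model as context for answering the question.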
- Make sure to replace `your_openai_api_key` and `your_pinecone_api_key` with your actual API keys in the `.env` file.