This code implements a chat app that uses text similarity search to query a document. Think of it as an upgraded Cmd+F search. It is written in pure Python and was created for learning purposes.
This code uses the following libraries:
- streamlit: for building the user interface
- openai: for generating responses to user questions
- tiktoken: for tokenizing text
- scikit-learn: for finding the relevant text chunks based on a user's question
- numpy: for creating arrays
- pandas: for creating dataframes
To run this code, you need an OpenAI API key. You can get one by creating an account on the OpenAI website. Copy the key to your clipboard and paste it into the app once it's running. All the dependencies are handled automatically from the requirements.txt file.
Run the following commands:
pip install --upgrade streamlit
streamlit run https://github.com/BrianLesko/text-similarity-search/blob/main/app.py
This will start the Streamlit server, and you can access the chatbot by opening a web browser and navigating to http://localhost:8501.
The chatbot works as follows:
- The user enters a question in the input field.
- The chatbot retrieves relevant text chunks based on the user's question using scikit-learn cosine similarity search.
- The chatbot adds the user's question to the retrieved text chunks to create an augmented query.
- The chatbot generates a response to the augmented query using OpenAI's GPT-3.5 (ChatGPT) language model.
- The chatbot displays the response to the user, along with the chat history.
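The retrieval and augmentation steps in the list above can be sketched in a few lines. This is a minimal, stdlib-only illustration under stated assumptions, not the app's actual code: it replaces scikit-learn's vectorizer and cosine similarity with a plain bag-of-words count vector (the app may build its vectors differently; the project tags mention text embeddings), splits the document into fixed-size word chunks, and uses hypothetical helper names (`chunk_text`, `vectorize`, `top_k_chunks`, `build_augmented_query`) and a hypothetical prompt template.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, words_per_chunk: int = 50) -> list[str]:
    """Split a document into chunks of at most `words_per_chunk` words (hypothetical chunking rule)."""
    words = text.split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def vectorize(text: str) -> Counter:
    """Lowercased bag-of-words term counts; a stand-in for a scikit-learn vectorizer or embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best match first."""
    q_vec = vectorize(question)
    scored = [(cosine_similarity(q_vec, vectorize(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_augmented_query(question: str, context: list[str]) -> str:
    """Prepend the retrieved chunks to the question (hypothetical prompt template)."""
    joined = "\n\n".join(context)
    return f"Answer using this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    document = (
        "Streamlit renders the chat interface. "
        "Cosine similarity ranks text chunks against the question. "
        "The language model writes the final answer."
    )
    chunks = chunk_text(document, words_per_chunk=6)
    question = "How are text chunks ranked?"
    # The resulting string is what would be sent to the chat model as the user message.
    print(build_augmented_query(question, top_k_chunks(question, chunks, k=1)))
```

Swapping `vectorize` for TF-IDF vectors or real text embeddings changes only how the vectors are built; ranking by cosine similarity and assembling the augmented query stay the same.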
The chat history is saved in st.session_state, a dictionary-like object that persists across reruns of the app within a single Streamlit session (it is not shared between sessions).
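A minimal sketch of keeping the chat history across reruns, assuming the history is stored as a list of OpenAI-style `{"role": ..., "content": ...}` dicts under a hypothetical `"messages"` key. Because `st.session_state` supports dictionary-style access, the hypothetical helpers below accept any `MutableMapping`, so a plain `dict` can stand in for it outside Streamlit; inside the app you would pass `st.session_state` itself.

```python
from collections.abc import MutableMapping
from typing import Any

HISTORY_KEY = "messages"  # hypothetical key name for the stored chat history


def init_history(state: MutableMapping[str, Any]) -> None:
    """Create an empty history only if it is missing, so later reruns keep the existing list."""
    if HISTORY_KEY not in state:
        state[HISTORY_KEY] = []


def add_turn(state: MutableMapping[str, Any], role: str, content: str) -> None:
    """Append one chat message in OpenAI-style {"role", "content"} form."""
    init_history(state)
    state[HISTORY_KEY].append({"role": role, "content": content})


def render_history(state: MutableMapping[str, Any]) -> list[str]:
    """Format the stored messages as 'role: content' lines, oldest first."""
    return [f"{m['role']}: {m['content']}" for m in state.get(HISTORY_KEY, [])]


if __name__ == "__main__":
    session: dict[str, Any] = {}  # stand-in for st.session_state
    init_history(session)
    add_turn(session, "user", "Example question")
    add_turn(session, "assistant", "Example answer")
    init_history(session)  # simulates a rerun; the earlier turns must survive
    print("\n".join(render_history(session)))
```

On every Streamlit rerun the script executes from top to bottom again, so the initialization step has to be idempotent: it creates the list only when the key is missing, which is what keeps earlier turns on screen. In recent Streamlit versions each stored message could then be displayed with `st.chat_message`.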
doc-chat/
├── .streamlit/
│ └── config.toml # theme info for the UI
├── docs/
│ └── content.png
├── app.py # the code and UI integrated together live here
├── about.py # for the UI
├── requirements.txt # the python packages needed to run locally
└── .gitignore # includes the api key file and the local virtual environment
Python | Streamlit | Git | Low Code UI
Template Repository | Chat interface | LLM
Text similarity | Text embeddings | Cosine Similarity
Sklearn | OpenAI
╭━━╮╭━━━┳━━┳━━━┳━╮╱╭╮ ╭╮╱╱╭━━━┳━━━┳╮╭━┳━━━╮
┃╭╮┃┃╭━╮┣┫┣┫╭━╮┃┃╰╮┃┃ ┃┃╱╱┃╭━━┫╭━╮┃┃┃╭┫╭━╮┃
┃╰╯╰┫╰━╯┃┃┃┃┃╱┃┃╭╮╰╯┃ ┃┃╱╱┃╰━━┫╰━━┫╰╯╯┃┃╱┃┃
┃╭━╮┃╭╮╭╯┃┃┃╰━╯┃┃╰╮┃┃ ┃┃╱╭┫╭━━┻━━╮┃╭╮┃┃┃╱┃┃
┃╰━╯┃┃┃╰┳┫┣┫╭━╮┃┃╱┃┃┃ ┃╰━╯┃╰━━┫╰━╯┃┃┃╰┫╰━╯┃
╰━━━┻╯╰━┻━━┻╯╱╰┻╯╱╰━╯ ╰━━━┻━━━┻━━━┻╯╰━┻━━━╯