This repository contains the code for an advanced interactive assistant powered by OpenAI's newest model, GPT-4o. The assistant leverages multiple inputs, including screenshots and audio, to provide contextual and accurate responses to user queries. It integrates with a document database to ensure responses are based on relevant context from provided documents.
- Screenshot Capture and Encoding: Utilizes PyAutoGUI to capture screenshots and encode them in base64 format for input to the model.
- Audio Detection and Transcription: Detects and records audio using `sounddevice`, processes it with Whisper, and transcribes it to text.
- Contextual Responses: Employs document embeddings and similarity search with LangChain and SingleStoreDB to find relevant context for generating responses.
- Text-to-Speech Output: Converts the model's responses into speech using OpenAI's TTS capabilities.
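As a rough illustration of the screenshot feature, the capture-and-encode step might look like the helper below. The function name is illustrative; PyAutoGUI's `screenshot()` returns a Pillow image, which is what the helper accepts:

```python
import base64
import io

from PIL import Image  # Pillow; PyAutoGUI's screenshot() returns a PIL Image


def image_to_b64(image: Image.Image) -> str:
    """Encode a PIL image as a base64 PNG string suitable for GPT-4o image input."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


# With PyAutoGUI: image_to_b64(pyautogui.screenshot())
```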
- OpenAI GPT-4o: For generating responses.
- OpenCV: For image processing.
- Whisper: For audio transcription.
- Sounddevice and Soundfile: For audio handling.
- Playsound: For audio playback.
- PyAutoGUI: For screenshot capture.
- LangChain: For document processing and embedding.
- SingleStoreDB: For document storage and similarity search.
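To illustrate the LangChain + SingleStoreDB retrieval step, here is a minimal document-chunking helper; the chunk size, overlap, and the commented vectorstore calls are assumptions for illustration, not the repository's exact code:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character chunks for embedding."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


# With LangChain + SingleStoreDB (requires SINGLESTOREDB_URL and an OpenAI key):
# from langchain_community.vectorstores import SingleStoreDB
# from langchain_community.embeddings import OpenAIEmbeddings
# store = SingleStoreDB.from_texts(
#     chunk_text(open("pytorch_docs.txt").read()), OpenAIEmbeddings()
# )
# context_docs = store.similarity_search("How do I build a DataLoader?", k=3)
```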
- Capture Screenshot: The assistant captures a screenshot of the current screen.
- Record Audio: It listens for user speech, records the audio, and transcribes it into text.
- Context Retrieval: Searches for relevant context in a document database using similarity search.
- Generate Response: Sends the transcribed text, captured screenshot, and relevant context to the GPT-4o model to generate a response.
- Text-to-Speech: Converts the generated response to speech and plays it back to the user.
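The generate-response step above can be sketched with a helper that assembles the multimodal chat request; the function name and prompt wording are illustrative, not the repository's exact code:

```python
def build_gpt4o_messages(question: str, screenshot_b64: str, context: str) -> list[dict]:
    """Pair the transcribed question, retrieved document context, and the
    base64-encoded screenshot into a GPT-4o chat-completions message list."""
    return [
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"},
                },
            ],
        },
    ]


# Then (requires OPENAI_API_KEY):
# from openai import OpenAI
# reply = OpenAI().chat.completions.create(
#     model="gpt-4o", messages=build_gpt4o_messages(question, img_b64, context)
# )
```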
- Clone the repository.
- Install the required dependencies (note: the Whisper transcription library is published on PyPI as `openai-whisper`):

```shell
pip install openai opencv-python-headless sounddevice numpy soundfile speechrecognition openai-whisper playsound pyautogui langchain_community singlestoredb
```
- Set your OpenAI API key in the environment variable `OPENAI_API_KEY`.
- Set your SingleStoreDB URL in the environment variable `SINGLESTOREDB_URL`.
- Place your documents (e.g., `pytorch_docs.txt`) in the same directory.
- Run the main script: `python computer_assistant.py` or `python assistant.py`
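Since the steps above depend on two environment variables, a small guard at the top of the script can fail fast when either is missing; `require_env` is a hypothetical helper, shown as a sketch:

```python
import os


def require_env(*names: str) -> None:
    """Raise early if any required environment variable is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")


# At startup:
# require_env("OPENAI_API_KEY", "SINGLESTOREDB_URL")
```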
- Integrate additional sensors and input methods.
- Improve audio quality and handling.
- Extend the assistant's capabilities for different use cases and domains.
Feel free to contribute and enhance this interactive assistant!