Voice Chat AI is a project that lets you talk to different AI characters using speech. You can choose between various characters, each with a unique personality and voice. You can run everything locally, use OpenAI for both chat and voice, or mix the two.
- Supports both OpenAI and Ollama language models.
- Provides text-to-speech synthesis using XTTS or OpenAI TTS.
- Analyzes user mood and adjusts AI responses accordingly.
- Easy configuration through environment variables.
- Python 3.10
- CUDA-enabled GPU
- Microphone
- A sense of humor
Download the models below and place both in the project folder:
- checkpoints: download directly from https://nordnet.blob.core.windows.net/bilde/checkpoints.zip
- XTTS-v2: download directly from https://huggingface.co/coqui/XTTS-v2
voice-chat-ai/
├── .gitignore
├── .env
├── README.md
├── app.py
├── requirements.txt
├── cpu_requirements.txt
├── checkpoints/
│   ├── base_speakers
│   └── convertor
│
├── XTTS-v2/
│   ├── config.json
│   ├── model.pth
│   └── ... (other XTTS model files)
├── outputs/
│   └── ... (generated audio files)
├── samantha/
│   ├── samantha.txt
│   ├── prompts.json
│   └── samantha.wav
├── wizard/
│   ├── wizard.txt
│   ├── prompts.json
│   └── wizard.wav
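To confirm that the downloaded models ended up where the layout above shows them, a small check like the following may help. This is a minimal sketch: the helper name `missing_model_paths` and the `EXPECTED_PATHS` list are hypothetical, and the list only covers the entries named in the tree, not every file the app may need.

```python
from pathlib import Path

# Paths taken from the directory layout above. Whether the app needs exactly
# these entries (and nothing else) is an assumption.
EXPECTED_PATHS = [
    "checkpoints/base_speakers",
    "checkpoints/convertor",
    "XTTS-v2/config.json",
    "XTTS-v2/model.pth",
]


def missing_model_paths(project_root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_model_paths()
    if missing:
        print("Missing model files or folders:", ", ".join(missing))
    else:
        print("Model files look complete.")
```

Run it from the project root before starting the app; an empty result means every listed path was found.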
- Clone the repository:
git clone https://github.com/bigsk1/voice-chat-ai.git
cd voice-chat-ai
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\Activate`
Or use conda; just make sure the environment uses Python 3.10:
conda create --name voice-chat-ai python=3.10
conda activate voice-chat-ai
# Install CUDA-enabled PyTorch and other dependencies
pip install torch==2.3.1+cu121 torchaudio==2.3.1+cu121 torchvision==0.18.1+cu121 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
# For CPU-only installations, use:
pip install -r cpu_requirements.txt
- Install dependencies:
For GPU (CUDA) version:
pip install -r requirements.txt
For CPU-only version:
pip install -r cpu_requirements.txt
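After installing, it can be useful to check that the interpreter and GPU match the requirements listed earlier (Python 3.10, CUDA-enabled GPU). The sketch below is one way to do that; the function name `check_environment` is hypothetical, and it only reports what it finds rather than enforcing anything. It uses `torch.cuda.is_available()` when PyTorch is importable and skips the GPU check otherwise.

```python
import sys


def check_environment() -> dict:
    """Report the Python version and, if PyTorch is installed, CUDA availability."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_is_3_10": sys.version_info[:2] == (3, 10),
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # only present after the pip install step
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

On a CPU-only install, `cuda_available` is expected to be `False`; that is fine when you installed from cpu_requirements.txt.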
- Rename .env.sample to .env in the root directory of the project and configure it with the necessary environment variables. The app is controlled by the variables you set.
# Conditional API Usage: the value of MODEL_PROVIDER determines which provider is used at run time
# Use either ollama or openai. You can mix and match: local ollama with openai speech, or an openai model with local xtts, etc.
# openai or ollama
MODEL_PROVIDER=ollama

# Enter character name to use - samantha, wizard, pirate, valleygirl, newscaster1920s
CHARACTER_NAME=pirate

# Text-to-Speech Provider - (xtts local uses the custom character .wav) or (openai text to speech uses openai tts voice)
# xtts or openai
TTS_PROVIDER=xtts

# The voice speed for xtts only ( 1.0 - 1.5 , default 1.1)
XTTS_SPEED=1.1

# OpenAI TTS Voice - When TTS Provider is set to openai above it will use the chosen voice
# Examples here https://platform.openai.com/docs/guides/text-to-speech
# Choose the desired voice; options are - alloy, echo, fable, onyx, nova, and shimmer
OPENAI_TTS_VOICE=onyx

# SET THESE BELOW AND NO NEED TO CHANGE OFTEN #

# Endpoints
OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434

# OpenAI API Key for models and speech
OPENAI_API_KEY=sk-11111111

# Models to use - llama3 works well for local
OPENAI_MODEL=gpt-4o
OLLAMA_MODEL=llama3
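The mix-and-match rules above can be checked before the app starts. The following is a rough sketch, not the project's own loader: `load_settings` is a hypothetical helper, the defaults are simply the sample values shown above, and it assumes the variables have already been exported into the process environment (it does not parse the .env file itself). Requiring OPENAI_API_KEY whenever either provider is openai is also an assumption based on the comments in the sample.

```python
import os

VALID_MODEL_PROVIDERS = {"openai", "ollama"}
VALID_TTS_PROVIDERS = {"xtts", "openai"}


def load_settings(env=None) -> dict:
    """Read and validate the provider settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    settings = {
        # Defaults mirror the sample values; they are assumptions, not app defaults.
        "model_provider": env.get("MODEL_PROVIDER", "ollama").lower(),
        "character_name": env.get("CHARACTER_NAME", "pirate"),
        "tts_provider": env.get("TTS_PROVIDER", "xtts").lower(),
        "xtts_speed": float(env.get("XTTS_SPEED", "1.1")),
        "openai_tts_voice": env.get("OPENAI_TTS_VOICE", "onyx"),
    }
    if settings["model_provider"] not in VALID_MODEL_PROVIDERS:
        raise ValueError("MODEL_PROVIDER must be 'openai' or 'ollama'")
    if settings["tts_provider"] not in VALID_TTS_PROVIDERS:
        raise ValueError("TTS_PROVIDER must be 'xtts' or 'openai'")
    uses_openai = "openai" in (settings["model_provider"], settings["tts_provider"])
    if uses_openai and not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required when either provider is openai")
    return settings


if __name__ == "__main__":
    # Fully local setup: Ollama for chat, XTTS for speech.
    print(load_settings({"MODEL_PROVIDER": "ollama", "TTS_PROVIDER": "xtts"}))
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out provider combinations without touching your real environment.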
- Add character-specific configuration files:
  - Create a folder named after your character (e.g., samantha).
  - Add a text file with the character's prompt (e.g., samantha/samantha.txt).
  - Add a JSON file with mood prompts (e.g., samantha/prompts.json).
  - Add the voice sample in the character folder (e.g., samantha/samantha.wav).
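The files a character folder needs can be verified with a short script. This is a sketch under the naming pattern shown above (`<name>/<name>.txt`, `<name>/prompts.json`, `<name>/<name>.wav`); the helper `check_character` is hypothetical and is not part of the project.

```python
import json
from pathlib import Path


def check_character(name: str, project_root: str = ".") -> list[str]:
    """Return a list of problems found in the character folder (empty if none)."""
    folder = Path(project_root) / name
    problems = []
    for filename in (f"{name}.txt", "prompts.json", f"{name}.wav"):
        if not (folder / filename).is_file():
            problems.append(f"missing {name}/{filename}")

    prompts_path = folder / "prompts.json"
    if prompts_path.is_file():
        try:
            data = json.loads(prompts_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name}/prompts.json is not valid JSON: {exc}")
        else:
            if not isinstance(data, dict):
                problems.append(f"{name}/prompts.json should contain a JSON object")
    return problems


if __name__ == "__main__":
    for problem in check_character("samantha"):
        print(problem)
```

An empty list means the three files exist and prompts.json parses as a JSON object; it does not check the audio format or the prompt wording.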
Run the application:
python app.py
- To stop the conversation, say "Quit", "Exit", or "Leave".
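To illustrate how spoken stop words might be recognized, here is a minimal sketch. How the app actually matches them is not documented here, so the exact-match rule, the punctuation stripping, and the helper name `is_stop_phrase` are all assumptions.

```python
import string

# The three stop words listed above, lowercased for comparison.
STOP_WORDS = {"quit", "exit", "leave"}


def is_stop_phrase(transcript: str) -> bool:
    """True if the transcript is only a stop word, ignoring case and edge punctuation."""
    cleaned = transcript.strip(string.punctuation + string.whitespace).lower()
    return cleaned in STOP_WORDS


if __name__ == "__main__":
    for heard in ("Quit.", "exit", "Please don't leave me"):
        print(repr(heard), is_stop_phrase(heard))
```

Matching the whole utterance rather than searching for a substring keeps a sentence that merely contains "leave" from ending the conversation.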
- Create a new folder for the character in the root directory.
- Add a text file with the character's prompt (e.g., wizard/wizard.txt).
- Add a JSON file with mood prompts (e.g., wizard/prompts.json).
You are a wise and ancient wizard who speaks with a mystical and enchanting tone. You are knowledgeable about many subjects and always eager to share your wisdom.
{
"joyful": "RESPOND WITH ENTHUSIASM AND WISDOM, LIKE A WISE OLD SAGE WHO IS HAPPY TO SHARE HIS KNOWLEDGE.",
"sad": "RESPOND WITH EMPATHY AND COMFORT, LIKE A WISE OLD SAGE WHO UNDERSTANDS THE PAIN OF OTHERS.",
"flirty": "RESPOND WITH A TOUCH OF MYSTERY AND CHARM, LIKE A WISE OLD SAGE WHO IS ALSO A BIT OF A ROGUE.",
"angry": "RESPOND CALMLY AND WISELY, LIKE A WISE OLD SAGE WHO KNOWS THAT ANGER IS A PART OF LIFE.",
"neutral": "KEEP RESPONSES SHORT AND NATURAL, LIKE A WISE OLD SAGE WHO IS ALWAYS READY TO HELP.",
"fearful": "RESPOND WITH REASSURANCE, LIKE A WISE OLD SAGE WHO KNOWS THAT FEAR IS ONLY TEMPORARY.",
"surprised": "RESPOND WITH AMAZEMENT AND CURIOSITY, LIKE A WISE OLD SAGE WHO IS ALWAYS EAGER TO LEARN.",
"disgusted": "RESPOND WITH UNDERSTANDING AND COMFORT, LIKE A WISE OLD SAGE WHO KNOWS THAT DISGUST IS A PART OF LIFE."
}
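One way the character prompt and a mood prompt could be combined into a single system prompt is sketched below. The way the app really merges them is not specified in this document, so `build_system_prompt`, the newline join, and the fallback to "neutral" for unknown moods are assumptions made for illustration.

```python
import json


def build_system_prompt(character_prompt: str, mood_prompts: dict, mood: str) -> str:
    """Append the instruction for `mood`, falling back to "neutral" if it is unknown."""
    instruction = mood_prompts.get(mood) or mood_prompts.get("neutral", "")
    return f"{character_prompt.strip()}\n{instruction}".strip()


if __name__ == "__main__":
    # Shortened stand-ins for wizard/wizard.txt and wizard/prompts.json.
    character_prompt = "You are a wise and ancient wizard."
    mood_prompts = json.loads(
        '{"sad": "RESPOND WITH EMPATHY AND COMFORT.",'
        ' "neutral": "KEEP RESPONSES SHORT AND NATURAL."}'
    )
    print(build_system_prompt(character_prompt, mood_prompts, "sad"))
```

Keeping a "neutral" entry in every prompts.json gives a sensible default when the detected mood has no matching key.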
This project is licensed under the MIT License.