Voice Assistant

Overview

This project implements a real-time voice assistant server built on WebSockets. The server receives audio from a client, detects speech in it, transcribes the speech, generates a response with a language model, and converts that response to speech. It is written in Python and relies on several libraries for audio processing, machine learning, and WebSocket communication.

Technologies Used

Python: The primary programming language for the server.
WebSockets: For real-time communication between the server and clients.
aiortc: A Python implementation of WebRTC for real-time communication.
webrtcvad: Voice Activity Detection (VAD) to determine if audio contains speech.
numpy: For numerical operations on audio data.
soundfile: For reading and writing audio files.
gTTS (Google Text-to-Speech): For converting text responses to speech.
Groq: For audio transcription and generating responses using a language model.
pyaudio: For capturing audio input from the microphone.
playsound: For playing audio files.
asyncio: For asynchronous programming in Python.

Features

Real-time Audio Processing: Receives audio from clients and processes it in real time.
Speech Detection: Uses VAD to detect speech in the audio stream.
Transcription: Converts audio to text using Groq's transcription service.
Response Generation: Generates a response to the transcribed text using the Llama 3 language model, served through Groq.
Text-to-Speech: Converts the generated response back to audio for playback.
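
As a rough illustration of the speech detection step, the sketch below splits raw 16-bit mono PCM into 30 ms frames and counts how many frames a detector marks as speech. The names `split_frames`, `energy_is_speech`, and `speech_ratio` are hypothetical and are not taken from `server.py` or `pipeline.py`. The detector here is a simple amplitude threshold used as a stand-in so the example runs with the standard library alone; it is not the webrtcvad algorithm.

```python
import array
from typing import Callable, Iterator

SAMPLE_RATE = 16000   # Hz; webrtcvad accepts 8000, 16000, 32000 or 48000
FRAME_MS = 30         # webrtcvad accepts 10, 20 or 30 ms frames
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE  # 960 bytes


def split_frames(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield fixed-size frames, dropping any incomplete tail."""
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def energy_is_speech(frame: bytes, sample_rate: int,
                     threshold: float = 500.0) -> bool:
    """Stand-in detector (NOT webrtcvad): mean absolute amplitude check.

    Samples are read in native byte order, which is an assumption
    about how the client encodes its audio.
    """
    samples = array.array("h", frame)
    return sum(abs(s) for s in samples) / len(samples) > threshold


def speech_ratio(pcm: bytes,
                 is_speech: Callable[[bytes, int], bool] = energy_is_speech
                 ) -> float:
    """Return the fraction of complete frames flagged as speech."""
    frames = list(split_frames(pcm))
    if not frames:
        return 0.0
    voiced = sum(1 for frame in frames if is_speech(frame, SAMPLE_RATE))
    return voiced / len(frames)
```

With webrtcvad installed, `webrtcvad.Vad(2).is_speech` takes the same `(frame, sample_rate)` arguments, so it could be passed as the `is_speech` parameter in place of the stand-in.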

Installation

Clone the repository:

```
git clone https://github.com/adc77/voice-assistant-.git
cd voice-assistant-
```

Install the required packages:
```
pip install -r requirements.txt
```
Set up environment variables:
- Set GROQ_API_KEY in your environment. It is required to access the transcription and response generation services.
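
A small startup check can catch a missing key before the first request fails. The helper below is a minimal sketch; `get_groq_api_key` is a hypothetical name and is not part of the repository.

```python
import os
from typing import Mapping


def get_groq_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GROQ_API_KEY, or raise a clear error if it is missing or blank."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; export it before starting server.py"
        )
    return key
```

Calling this once when the server starts would turn a missing key into an immediate, readable error instead of a failure on the first transcription call.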

Usage

Start the WebSocket server:
```
python server.py
```
Connect a client to the server and start sending audio data.
The server will process the audio, transcribe it, generate a response, and send back the audio response.

File Structure

server.py: The main server file that handles WebSocket connections and audio processing.
pipeline.py: Contains functions for audio recording, transcription, response generation, and text-to-speech conversion.
