Starred repositories
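🚀 One-click deployment (offline bundle included)! Based on ChatTTS, with support for streaming output, random voice-timbre sampling, long-audio generation, and multi-role narration. Easy to use, with no complicated installation required.
Officially recommended ChatTTS resource collection project, gathering related resources from across the web along with frequently asked questions.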
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Multilingual Voice Understanding Model
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
Whisper realtime streaming for long speech-to-text transcription and translation
A nearly-live implementation of OpenAI's Whisper.
Faster Whisper transcription with CTranslate2
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances human-computer interaction through real-time spoken dialogue…
Speech To Speech: an effort for an open-sourced and modular GPT-4o
API and WebSocket server for SenseVoice. It inherits some enhanced features, such as VAD, real-time streaming recognition, and speaker verification.
SpeechGPT Series: Speech Large Language Models
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Build real-time multimodal AI applications 🤖🎙️📹
Open Source framework for voice and multimodal conversational AI
Example UI implementing the RTVI web client
A generative speech model for daily dialogue.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal conversational model approaching GPT-4o performance.
Android application for running Windows applications with Wine and Box86/Box64
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…
The first fully developed Java WebTransport server