- GuangZhou
- lovemefan.top
Lists (22)
AI Other
ASR
avatar
dataset (dataset for AI)
Diffusion
✨ Inspiration
kws
Language Model
llm (large language model)
Mindspore
Music
python code style (Python code conventions)
quantization (model quantization)
RUST
Singing Voice Synthesis
Speech Editing
SpeechEnhance
speechllm
super resolution
TTS
Tools
Microservices
Stars
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
Awesome music generation model——MG²
first base model for full-duplex conversational audio
Fast and accurate automatic speech recognition (ASR) for edge devices
The fastest digital human algorithm, now on your desktop.
PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture in…
Human Motion Video Generation: A Survey (https://www.techrxiv.org/users/836049/articles/1228135-human-motion-video-generation-a-survey)
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Pseudo Streaming SenseVoice with Hotwords
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
An Open-Sourced LLM-empowered Foundation TTS System
An open-source SSL certificate management tool that helps you automatically apply for and deploy SSL certificates, as well as automatically renew them w…
[INTERSPEECH'24] Official repository for "MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset"
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice