Stars
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
Implementation of the proposed minGRU in PyTorch
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Real-time Speech-Text Foundation Model Toolkit (wip)
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
Dataset and baseline code for the VocalSound dataset (ICASSP2022).
The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems
An Open-Sourced LLM-empowered Foundation TTS System
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
State-of-the-art zero-shot voice conversion & singing voice conversion with in-context learning
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
Preprocess and segment audio files from ami-corpus
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Python implementation of pre-processing for End-to-End speech recognition
This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
Repository for Quantifying Valence and Arousal in Text with Multilingual Pre-trained Transformers
[ACMMM'2024] Generative Expressive Conversational Speech Synthesis
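《代码随想录》 LeetCode practice guide: a recommended order for working through 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, more than 50 mind maps, and versions in C++, Java, Python, Go, JavaScript, and other languages, so that studying algorithms is no longer confusing! 🔥🔥 Take a look; you will wish you had found it sooner! 🚀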
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Code for ICML2020 paper - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)