Stars
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI
A lightweight library for Frechet Audio Distance calculation.
Inference and training library for high-quality TTS models.
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
Lamomal / s3prl_correlation
Forked from s3prl/s3prl. Self-Supervised Speech Pre-training and Representation Learning Toolkit.
SALMONN: Speech Audio Language Music Open Neural Network
Dataset for lightly supervised training using the LibriVox audiobook recordings (https://librivox.org/).
Helw150 / levanter
Forked from stanford-crfm/levanter. Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
MARS5 speech model (TTS) from CAMB.AI
Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24)
Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
Official PyTorch implementation of AdaFlow
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Implementation of a TTS model based on the NVIDIA P-Flow TTS paper
A summary of related work on flow matching and stochastic interpolants
This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023
[AAAI 2024] Code for CTX-vec2wav in UniCATS