Stars
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Command line utility for forced alignment using Kaldi
Audio Codec Benchmark
Fake speech detection with the CodecFake dataset
SincNet is a neural architecture for efficiently processing raw audio samples.
A list of tools, papers and code related to Fake Audio Detection.
This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, CAM++, etc. More models may be supported in the future. At the …
Implementation of the paper "Improved DeepFake Detection Using Whisper Features"
This repository includes the code to reproduce our paper "End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection" (https://arxiv.o…
Research progress on speech deepfake detection: relevant datasets aggregated from the review literature, and publicly available code
AI-S2-Lab / M2S-ADD
Forked from ttslr/M2S-ADD. [InterSpeech'2023] "Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion"
PyTorch structural similarity (SSIM) loss
Vector Quantized VAEs - PyTorch Implementation
Vector (and Scalar) Quantization, in PyTorch
A multilingual large voice generation model with full-stack support for inference, training, and deployment.
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning)
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
KAN-TTS is a speech-synthesis training framework. Try the demos posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
Speech Recognition using DeepSpeech2.