- University of Texas at Dallas
- https://mu-y.github.io/
- @MuYang55
Stars
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
Official Implementation of EnCLAP (ICASSP 2024)
Speech, Language, Audio, Music Processing with Large Language Model
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture…
TorchCFM: a Conditional Flow Matching library
Inference and training library for high-quality TTS models.
Awesome speech/audio LLMs, representation learning, and codec models
music generation with masked transformers!
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
Audio Codec Speech processing Universal PERformance Benchmark
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
📄 Awesome CV is a LaTeX template for your outstanding job application
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)