University of Texas at Dallas
- https://mu-y.github.io/
- @MuYang55
Stars
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
SpeechGPT Series: Speech Large Language Models
pix2tex: Using a ViT to convert images of equations into LaTeX code.
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
The official GitHub page for the survey paper "Foundation Models for Music: A Survey".
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
Official Implementation of EnCLAP (ICASSP 2024)
Speech, Language, Audio, Music Processing with Large Language Model
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation
Instant voice cloning by MIT and MyShell.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture…
TorchCFM: a Conditional Flow Matching library
Inference and training library for high-quality TTS models.
Awesome speech/audio LLMs, representation learning, and codec models
music generation with masked transformers!
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on