
Sylber: Syllabic Embedding Representation of Speech from Raw Audio

17 Updated Oct 12, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,123 170 Updated Nov 7, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 6,732 785 Updated Nov 7, 2024
HTML 28 1 Updated Nov 4, 2024

Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Python 35 1 Updated Oct 10, 2024

SpeechGPT Series: Speech Large Language Models

Python 1,283 85 Updated Jul 22, 2024
Python 6,670 506 Updated Oct 31, 2024

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 12,604 1,029 Updated Jul 5, 2024
Python 47 2 Updated Nov 7, 2024

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 764 42 Updated Oct 23, 2024

The official GitHub page for the survey paper "Foundation Models for Music: A Survey".

91 3 Updated Sep 4, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,626 746 Updated Jun 24, 2024

Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)

Python 136 11 Updated Sep 14, 2023

Official Implementation of EnCLAP (ICASSP 2024)

Python 89 5 Updated Jun 2, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 570 52 Updated Nov 6, 2024

ESLTTS dataset

16 1 Updated Jun 21, 2024

Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation

347 13 Updated Nov 7, 2024

Instant voice cloning by MIT and MyShell.

Python 29,681 2,915 Updated Aug 21, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Jupyter Notebook 7,385 547 Updated Nov 1, 2024

Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.

Python 150 16 Updated Jul 25, 2024

This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture…

Python 35 3 Updated Jul 31, 2024

TorchCFM: a Conditional Flow Matching library

Python 1,196 97 Updated Oct 9, 2024

Inference and training library for high-quality TTS models.

Python 4,585 463 Updated Oct 30, 2024

Awesome speech/audio LLMs, representation learning, and codec models

678 33 Updated Nov 7, 2024

UP-TO-DATE LLM Watermark paper. 🔥🔥🔥

285 18 Updated Jun 14, 2024

🎛 🔊 A Python library for audio.

C++ 5,223 261 Updated Nov 7, 2024

music generation with masked transformers!

Python 296 35 Updated Oct 8, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 432 26 Updated Oct 22, 2024

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 820 95 Updated Aug 7, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 470 40 Updated Jun 9, 2024