- NCSOFT AI
- Seongnam, Republic of Korea
- https://www.linkedin.com/in/minsu-kang-54a43b212/
Stars
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Updates Every 12 Hours)
State-of-the-Art zero-shot voice conversion & singing voice conversion with in-context learning
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
A collection of neural vocoders suitable for singing voice synthesis tasks.
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
A summary of related works on flow matching and stochastic interpolants
Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
Codebase for benchmarking several open-sourced SpeechLLM models
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
PyTorch implementation of normalizing flow models
High-Fidelity Neural Phonetic Posteriorgrams
An unofficial PyTorch implementation of StreamVC (Real-Time Low-Latency Voice Conversion)
A comprehensive collection of KAN (Kolmogorov-Arnold Network)-related resources, including libraries, projects, tutorials, papers, and more, for researchers and developers in the Kolmogorov-Arnold N…
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (ICASSP 2024)
The original sources of MS-DOS 1.25, 2.0, and 4.0 for reference purposes
The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation