- National Institute of Informatics
- Tokyo
- https://zengchang233.github.io/
- https://scholar.google.com/citations?user=gfGyn49j-MkC&hl=en
- in/chang-zeng-5451a4191
- https://nii-yamagishilab.github.io/
Starred repositories
MUSHRA-compliant experiment software based on the Web Audio API
A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder
State-of-the-Art zero-shot voice conversion & singing voice conversion with in-context learning
Unsupervised Rhythm Modeling for Voice Conversion
A sequence-to-sequence voice conversion toolkit.
A toolkit for any-to-any encoder-decoder voice conversion systems
S3PRL-VC: A Voice Conversion Toolkit based on S3PRL
Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion (Interspeech 2022)
Code for ICML2020 paper - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information
Lightweight Speech Representation Learning for One-Shot Voice Conversion
[WIP] VoiceSmith makes training text to speech models easy.
A curated list of Large Language Model resources, covering model training, serving, fine-tuning, and building LLM applications.
A curated list of awesome voice conversion projects and communities.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
The official Python library for the OpenAI API
✨✨ Latest Advances on Multimodal Large Language Models
Audio Codec Speech processing Universal PERformance Benchmark
A minimal yet resourceful implementation of diffusion models (along with pretrained models + synthetic images for nine datasets)
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
PyTorch Implementation of StyleSinger (AAAI 2024): Style Transfer for Out-of-Domain Singing Voice Synthesis
Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment