Python 387 23 Updated Jul 10, 2024

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,356 97 Updated Jul 5, 2024

Foundational model for human-like, expressive TTS

Python 3,643 637 Updated Jul 30, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 35,974 3,777 Updated Jul 28, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 4,389 373 Updated Aug 4, 2024

An unofficial implementation of "UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding".

Python 21 1 Updated Nov 4, 2023

PyTorch implementation of BigVSAN

Python 195 16 Updated Mar 23, 2024

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

Python 335 28 Updated Jan 25, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,077 98 Updated Jul 11, 2024

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

Python 7,508 749 Updated Feb 11, 2024

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 1,971 321 Updated Nov 14, 2023

Implementation of Voicebox, the new SOTA text-to-speech network from Meta AI, in PyTorch

Python 579 47 Updated Feb 16, 2024

A differentiable version of SPTK

Python 158 13 Updated Aug 20, 2024

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 10,674 1,032 Updated Aug 15, 2024
Python 864 282 Updated Aug 17, 2024

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

548 31 Updated Jun 19, 2023

An ODE-based generative neural vocoder using Rectified Flow

Python 57 6 Updated Apr 29, 2023

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 34,382 4,020 Updated Aug 20, 2024

Making large AI models cheaper, faster and more accessible

Python 38,502 4,320 Updated Aug 20, 2024

Keep track of big models in the audio domain, including speech, singing, music, etc.

423 24 Updated Jan 17, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 11,276 2,341 Updated Aug 20, 2024

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 824 96 Updated Aug 13, 2024

Synthesis of MIDI with DDSP (https://midi-ddsp.github.io/)

Python 298 18 Updated Nov 30, 2022

DDSP: Differentiable Digital Signal Processing

Python 2,839 331 Updated Jun 17, 2024

Speech Parameter Estimation Using Differentiable Speech Synthesizer

Python 44 5 Updated May 9, 2023

Official implementation of the source-filter HiFiGAN vocoder

Python 233 34 Updated Jul 29, 2023

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 631 109 Updated Aug 20, 2024

Neural network-based singing voice synthesis library for research

Python 676 80 Updated Oct 9, 2023

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Jupyter Notebook 547 114 Updated Sep 18, 2023