Shengqiang-Li
  • Northwestern Polytechnical University
  • Suzhou

LlamaVoice is a llama-based large voice generation model, providing inference and training ability.

Python 90 4 Updated Aug 1, 2024

A Python wrapper for the high-quality vocoder "World"

Cython 717 117 Updated Oct 23, 2023

Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset

Python 22 1 Updated Jul 25, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Python 158 4 Updated Aug 2, 2024

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)

Python 110 4 Updated Jun 3, 2024

Pitch Estimating Neural Networks (PENN)

Python 215 17 Updated Jul 31, 2024

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Jupyter Notebook 546 112 Updated Sep 18, 2023

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 330 21 Updated Jul 26, 2024

Evaluation metrics for TTS models.

Python 2 Updated Jul 8, 2024

Multilingual Voice Understanding Model

Python 1,842 175 Updated Aug 2, 2024

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 3,258 302 Updated Aug 1, 2024

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 723 85 Updated Jul 6, 2024

Low-complexity neural image & video codec.

Python 92 5 Updated Jul 19, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 384 34 Updated Jun 9, 2024

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 982 83 Updated Aug 1, 2024

Keras implementation of Finite Scalar Quantization

Python 56 4 Updated Oct 31, 2023

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Python 1,021 104 Updated May 10, 2024

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 93 3 Updated Jul 15, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 11,149 2,323 Updated Aug 3, 2024

g2p: English Grapheme To Phoneme Conversion

Python 777 125 Updated Jan 5, 2023

Python 1,719 92 Updated Jul 31, 2024

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Python 134 8 Updated Apr 20, 2024

Easy-to-Use Speech MOS predictors

Python 189 12 Updated Oct 24, 2023

Perceptual Quality Estimator for speech and audio

C++ 656 118 Updated Aug 2, 2024

Python 39 2 Updated Dec 19, 2023

Vector (and Scalar) Quantization, in PyTorch

Python 2,257 183 Updated Jul 28, 2024

speech self-supervised representations

Python 439 34 Updated Apr 27, 2023

Simple text to phones converter for multiple languages

Python 1,164 163 Updated Aug 1, 2024

Soft speech units for voice conversion

Jupyter Notebook 390 32 Updated Mar 14, 2024