- Northwestern Polytechnical University
- Suzhou
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
A Python wrapper for the high-quality vocoder "World"
Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
Pitch Estimating Neural Networks (PENN)
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
Multilingual Voice Understanding Model
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Low-complexity neural image & video codec.
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Versatile audio super resolution (any -> 48kHz) with AudioSR.
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Vector (and Scalar) Quantization, in PyTorch
Speech self-supervised representations
Simple text to phones converter for multiple languages
Soft speech units for voice conversion