Ruimin Wang ruimina

3 followers · 2 following

Tsinghua University
China Beijing

Stars

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python · 267 stars · 16 forks · Updated Sep 12, 2024

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.

Python · 2,136 stars · 217 forks · Updated Sep 9, 2024

hubertsiuzdak / snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python · 349 stars · 21 forks · Updated Sep 11, 2024

jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python · 591 stars · 32 forks · Updated Sep 9, 2024

Ledzy / BAdam

Python · 188 stars · 12 forks · Updated Jul 17, 2024

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust · 8,879 stars · 764 forks · Updated Sep 3, 2024

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,040 stars · 55 forks · Updated Aug 13, 2024

hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

Python · 30,635 stars · 3,775 forks · Updated Sep 11, 2024

axolotl-ai-cloud / axolotl

Go ahead and axolotl questions

Python · 7,516 stars · 813 forks · Updated Sep 11, 2024

QwenLM / Qwen2

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell · 7,391 stars · 447 forks · Updated Sep 12, 2024

QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python · 13,264 stars · 1,075 forks · Updated Sep 2, 2024

nii-yamagishilab / mos-finetune-ssl

Python · 71 stars · 18 forks · Updated Jun 14, 2023

0nutation / SpeechGPT

SpeechGPT Series: Speech Large Language Models

Python · 1,215 stars · 80 forks · Updated Jul 22, 2024

NVIDIA / radtts

Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained …

Roff · 280 stars · 40 forks · Updated Apr 6, 2023

redmist328 / APNet2

Source code of APNet2, a vocoder

Python · 49 stars · 11 forks · Updated Nov 23, 2023

BakerBunker / FreeV

[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

Python · 70 stars · 6 forks · Updated Jul 4, 2024

audeering / w2v2-how-to

How to use our public wav2vec2 dimensional emotion model

Jupyter Notebook · 433 stars · 47 forks · Updated May 22, 2023

jrgillick / laughter-detection

Python · 205 stars · 47 forks · Updated Jul 25, 2024

yangdongchao / AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Python · 560 stars · 79 forks · Updated Dec 27, 2023

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python · 4,670 stars · 471 forks · Updated Sep 6, 2024

lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Python · 2,384 stars · 196 forks · Updated Sep 4, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python · 30,536 stars · 3,319 forks · Updated Sep 4, 2024

voidful / Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Python · 201 stars · 22 forks · Updated Sep 11, 2024

myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.

Python · 4,376 stars · 549 forks · Updated Aug 9, 2024

myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell.

Python · 28,338 stars · 2,773 forks · Updated Aug 21, 2024

descriptinc / audiotools

Object-oriented handling of audio data, with GPU-powered augmentations, and more.

Python · 216 stars · 37 forks · Updated Jul 22, 2024

descriptinc / descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python · 1,119 stars · 101 forks · Updated Jul 11, 2024

microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 34,678 stars · 4,045 forks · Updated Sep 12, 2024

All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More

Python · 31,228 stars · 3,601 forks · Updated Sep 12, 2024

X-LANCE / VoiceFlow-TTS

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Python · 287 stars · 20 forks · Updated Sep 3, 2024