  • Tsinghua University
  • China Beijing


LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 267 16 Updated Sep 12, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,136 217 Updated Sep 9, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 349 21 Updated Sep 11, 2024

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 591 32 Updated Sep 9, 2024
Python 188 12 Updated Jul 17, 2024

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 8,879 764 Updated Sep 3, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,040 55 Updated Aug 13, 2024

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

Python 30,635 3,775 Updated Sep 11, 2024

Go ahead and axolotl questions

Python 7,516 813 Updated Sep 11, 2024

Qwen2 is the large language model series developed by the Qwen team, Alibaba Cloud.

Shell 7,391 447 Updated Sep 12, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 13,264 1,075 Updated Sep 2, 2024

SpeechGPT Series: Speech Large Language Models

Python 1,215 80 Updated Jul 22, 2024

Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained …

Roff 280 40 Updated Apr 6, 2023

Source code of APNet2, a vocoder

Python 49 11 Updated Nov 23, 2023

[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

Python 70 6 Updated Jul 4, 2024

How to use our public wav2vec2 dimensional emotion model

Jupyter Notebook 433 47 Updated May 22, 2023

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Python 560 79 Updated Dec 27, 2023

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 4,670 471 Updated Sep 6, 2024

Vector (and Scalar) Quantization, in Pytorch

Python 2,384 196 Updated Sep 4, 2024

A generative speech model for daily dialogue.

Python 30,536 3,319 Updated Sep 4, 2024

Audio Codec Speech processing Universal PERformance Benchmark

Python 201 22 Updated Sep 11, 2024

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Python 4,376 549 Updated Aug 9, 2024

Instant voice cloning by MIT and MyShell.

Python 28,338 2,773 Updated Aug 21, 2024

Object-oriented handling of audio data, with GPU-powered augmentations, and more.

Python 216 37 Updated Jul 22, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,119 101 Updated Jul 11, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 34,678 4,045 Updated Sep 12, 2024

🙌 OpenHands: Code Less, Make More

Python 31,228 3,601 Updated Sep 12, 2024

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Python 287 20 Updated Sep 3, 2024