Starred repositories

Python 21 2 Updated Aug 26, 2024
Python 39 6 Updated Sep 3, 2024

Speech To Speech: an effort for an open-sourced and modular GPT-4o

Python 2,924 311 Updated Sep 5, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,013 53 Updated Aug 13, 2024

VITS with phoneme-level prosody modeling based on MaskGIT

Python 71 7 Updated Aug 31, 2024

Whisper with Medusa heads

Python 762 47 Updated Aug 4, 2024

LlamaVoice is a llama-based large voice generation model, providing inference and training ability.

Python 153 9 Updated Aug 26, 2024

A C++ inference library for multiple SVC/TTS models

C 980 118 Updated Aug 10, 2024

A project that optimizes Whisper for low latency inference using NVIDIA TensorRT

Python 44 8 Updated Jul 3, 2024
Jupyter Notebook 45 3 Updated Jul 16, 2024

Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector

Python 403 47 Updated Aug 28, 2024

Alignment examples for Interspeech 2024

10 Updated Jul 5, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 4,576 461 Updated Sep 6, 2024

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 3,928 389 Updated Aug 22, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 5,863 751 Updated Aug 19, 2024

This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.

Python 342 40 Updated Sep 5, 2024

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Jupyter Notebook 3,283 272 Updated Sep 5, 2024

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Python 674 39 Updated Sep 2, 2024

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,423 195 Updated Aug 1, 2024

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

Jupyter Notebook 4,341 366 Updated Apr 3, 2024

An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.

Python 53 18 Updated Aug 9, 2024

A generative speech model for daily dialogue.

Python 30,428 3,305 Updated Sep 4, 2024

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 61 3 Updated Sep 2, 2024

LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform

Jupyter Notebook 16 2 Updated May 17, 2024

Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Python 204 18 Updated Sep 8, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis

61 1 Updated Jul 30, 2024

High fidelity, lightweight, end-to-end, streaming, convolution-based neural audio codec

Jupyter Notebook 62 6 Updated May 23, 2024

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

Python 3,222 336 Updated Aug 22, 2024

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 1,473 167 Updated Aug 28, 2024

A fast, local neural text to speech system

C++ 5,714 408 Updated Aug 7, 2024