OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Python 1,290 77 Updated Jul 6, 2024

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Python 127 8 Updated Apr 20, 2024

This repository contains the SpeechBrain Benchmarks

Python 70 31 Updated Jul 5, 2024

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook 12,373 1,734 Updated Jun 27, 2024

Localized watermarking for AI-generated speech audio, with SOTA robustness and a very fast detector

Python 356 41 Updated Jun 25, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,430 88 Updated Jun 21, 2024

[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 43 1 Updated Jun 30, 2024

Taming Transformers for High-Resolution Image Synthesis

Jupyter Notebook 5,547 1,107 Updated Apr 25, 2024

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 695 82 Updated Jul 6, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 242 16 Updated Apr 9, 2024

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 62 2 Updated Jun 21, 2024

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Python 85 Updated Jun 26, 2024

Official implementation for our paper "Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations"

Python 13 Updated Jun 6, 2024

Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024).

Python 8 1 Updated Jun 14, 2024

Unofficial PyTorch Lightning implementation of "A New Framework for CNN-Based Speech Enhancement in the Time Domain"

Python 13 6 Updated May 9, 2023

TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain

Python 40 6 Updated Apr 5, 2022

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 3,457 251 Updated Jul 6, 2024

A generative speech model for daily dialogue.

Python 27,161 2,959 Updated Jul 5, 2024

Official release of StyleTalk dataset.

51 2 Updated Jul 1, 2024

Training code for FACodec presented in NaturalSpeech 3

Python 113 12 Updated Jun 26, 2024

Speech, Language, Audio, Music Processing with Large Language Model

Python 397 31 Updated Jul 3, 2024
HTML 37 Updated Jun 11, 2024

Instant voice cloning by MyShell.

Python 27,139 2,634 Updated Jul 6, 2024

Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024).

Python 17 Updated Apr 3, 2024

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supportin…

Jupyter Notebook 10,381 1,472 Updated Jul 5, 2024

The open source code for LLM-Codec

Python 91 2 Updated Jun 17, 2024

The official Meta Llama 3 GitHub site

Python 22,896 2,411 Updated Jul 3, 2024