- National Taiwan University
- Seattle, WA, US
- https://hbwu-ntu.github.io/
- in/haibin-wu-479a39252
- https://scholar.google.com/citations?user=-bB-WHEAAAAJ&hl=zh-TW
Highlights
- Pro
Stars
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
This repository contains the SpeechBrain Benchmarks
A multi-voice TTS system trained with an emphasis on quality
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Taming Transformers for High-Resolution Image Synthesis
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Ultra-low-bitrate neural audio codec (0.31–1.40 kbps) with better semantics in the latent space.
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Official implementation for our paper "Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations"
Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024).
Unofficial PyTorch Lightning Implementation of "A New Framework for CNN-Based Speech Enhancement in the Time Domain"
TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain
GLM-4 series: Open Multilingual Multimodal Chat LMs (open-source multilingual multimodal chat models)
A generative speech model for daily dialogue.
Training code for FAcodec presented in NaturalSpeech3
Speech, Language, Audio, Music Processing with Large Language Model
Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024).
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supportin…