- National Taiwan University
- Seattle, WA, US
- https://hbwu-ntu.github.io/
- in/haibin-wu-479a39252
- https://scholar.google.com/citations?user=-bB-WHEAAAAJ&hl=zh-TW
Stars
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
Implementation of Autoregressive Diffusion in Pytorch
Fake speech detection with the CodecFake dataset
Provides a practical interaction interface for LLMs such as GPT/GLM, with special optimization for paper reading/polishing/writing. Modular design with support for custom shortcut buttons & function plugins; project analysis & self-translation for Python, C++, and other codebases; PDF/LaTeX paper translation & summarization; parallel queries to multiple LLM models; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…
Audio processing using PyTorch 1D convolution networks
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
A lightweight package for some common metrics used in speech
Utilities intended for use with Llama models.
AudioBench: A Universal Benchmark for Audio Large Language Models
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens.
Implementation of rectified flow and some of its followup research / improvements in Pytorch
1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
MARS5 speech model (TTS) from CAMB.AI
This is the official implementation of the SEMamba paper.
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
Multilingual Voice Understanding Model
Multilingual large voice-generation model providing full-stack inference, training, and deployment capabilities.
Enjoy the magic of Diffusion models!
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
This repository contains the SpeechBrain Benchmarks