- National Taiwan University
- Seattle, WA, US
- https://hbwu-ntu.github.io/
- in/haibin-wu-479a39252
- https://scholar.google.com/citations?user=-bB-WHEAAAAJ&hl=zh-TW
Stars
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Robust Speech Recognition via Large-Scale Weak Supervision
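A practical interactive interface for GPT, GLM, and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other codebases, PDF/LaTeX paper translation and summarization, parallel querying of multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…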
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Official Code for DragGAN (SIGGRAPH 2023)
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
A generative speech model for daily dialogue.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Instant voice cloning by MIT and MyShell.
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Open-Sora: Democratizing Efficient Video Production for All
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
ChatGLM3 series: Open Bilingual Chat LLMs
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference,…
An open-source tool-augmented conversational language model from Fudan University
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Faster Whisper transcription with CTranslate2