- San Francisco, California
- https://henryzhou7.github.io
Stars
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A feature-rich command-line audio/video downloader
Inference and training library for high-quality TTS models.
HellaSwag: Can a Machine _Really_ Finish Your Sentence?
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Simple and efficient Python implementation of a series of adaptive filters, including time-domain adaptive filters (LMS, NLMS, RLS, AP, Kalman), nonlinear adaptive filters (Volterra filter, functional link a…
Python high-level interface and ctypes-based bindings for PulseAudio (libpulse)
macOS system extension that allows applications to pass audio to other applications. Soundflower works on macOS Catalina.
Whisper realtime streaming for long speech-to-text transcription and translation
Faster Whisper transcription with CTranslate2
MARS5 speech model (TTS) from CAMB.AI
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
SpeechGPT Series: Speech Large Language Models
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…