- San Francisco, CA
Stars
NotebookMLX - An Open Source version of NotebookLM (Ported NotebookLlama)
VAE modified from Descript Audio Codec, which replaces the RVQ with VAE
Explorations into improving ViTArc with Slot Attention
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
A concise but complete full-attention transformer with a set of promising experimental features from various papers
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
Explorations into some recent techniques surrounding speculative decoding