Stars
Image forgery recognition algorithm
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Python version of the Playwright testing and automation library.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
A plotting tool that outputs Line Rider maps, so you can watch a man on a sled scoot down your loss curves. 🎿
[Official Implementation] Acoustic Autoregressive Modeling 🔥
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
A simple way to keep track of an Exponential Moving Average (EMA) version of your PyTorch model
Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
VAE modified from Descript Audio Codec, which replaces the RVQ with VAE
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly …
A family of diffusion models for text-to-audio generation.
PyTorch implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.
A library for efficient similarity search and clustering of dense vectors.
Multi-level network clustering based on the Map Equation
SimPO: Simple Preference Optimization with a Reference-Free Reward
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
A generative speech model for daily dialogue.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Instant voice cloning by MIT and MyShell.
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"
PyTorch implementation of normalizing flow models