Stars
SGLang is a fast serving framework for large language models and vision language models.
[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA models, now implemented with the base model AudioLDM2.
A More Fair and Comprehensive Comparison between KAN and MLP
Vico: Compositional Video Generation as Flow Equalization
Reference implementation of Megalodon 7B model
Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
Explore the Limits of Omni-modal Pretraining at Scale
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
Accelerating the development of large multimodal models (LMMs) with lmms-eval
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
FinRobot: An Open-Source AI Agent Platform for Financial Applications using LLMs
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its Applications
Code for paper "Unsegment Anything by Simulating Deformation" (CVPR 2024)
Experiencing lightning fast (~1s) and accurate drag-based image editing
Multilingual Medicine: Model, Dataset, Benchmark, Code
Adapting LLaMA Decoder to Vision Transformer
[SIGGRAPH'24] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
[CVPR 2024] Code release for TransNeXt model
Sailor: Open Language Models for South-East Asia