Stars
A comprehensive survey on Internal Consistency and Self-Feedback in Large Language Models.
Dynamic 2D Gaussians: Geometrically accurate radiance fields for dynamic objects
A system for agentic LLM-powered data processing and ETL
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Implementation of paper 'Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models'
High accuracy RAG for answering questions from scientific documents with citations
GeoCalib: Learning Single-image Calibration with Geometric Optimization (ECCV 2024)
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low Rank Adapters (LoRA).
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
A tool to modify ONNX models visually, based on Netron and Flask.
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
Code for *Eventfulness for Interactive Video Alignment*
Multilingual large voice-generation model, providing full-stack inference, training, and deployment capabilities.
Accepted as a [NeurIPS 2024] Spotlight Presentation paper
A tool to project equirectangular panorama into perspective images
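The projection that tool performs can be sketched in a few lines: each pixel of the perspective view defines a ray, which is converted to longitude/latitude and looked up in the equirectangular panorama. The function below is a minimal stdlib-only sketch of that mapping (not the repository's actual code); the rotation conventions and the `fov_deg`/`yaw`/`pitch` parameter names are my own assumptions.

```python
import math

def perspective_to_equirect_uv(x, y, out_w, out_h,
                               fov_deg=90.0, yaw=0.0, pitch=0.0):
    """Map perspective pixel (x, y) to normalized equirectangular
    coordinates (u, v) in [0, 1). yaw/pitch are in radians.
    Illustrative sketch only; conventions are assumptions."""
    # Focal length derived from the horizontal field of view.
    f = (out_w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel center in camera space (x right, y down, z forward).
    cx = x - out_w / 2.0 + 0.5
    cy = y - out_h / 2.0 + 0.5
    vx, vy, vz = cx, cy, f
    # Rotate the ray: pitch about the x-axis, then yaw about the y-axis.
    vy, vz = (vy * math.cos(pitch) - vz * math.sin(pitch),
              vy * math.sin(pitch) + vz * math.cos(pitch))
    vx, vz = (vx * math.cos(yaw) + vz * math.sin(yaw),
              -vx * math.sin(yaw) + vz * math.cos(yaw))
    # Ray direction -> longitude/latitude -> normalized texture coords.
    lon = math.atan2(vx, vz)                                   # [-pi, pi]
    lat = math.asin(vy / math.sqrt(vx * vx + vy * vy + vz * vz))  # [-pi/2, pi/2]
    return lon / (2 * math.pi) + 0.5, lat / math.pi + 0.5
```

A full tool would evaluate this per output pixel and bilinearly sample the panorama at `(u * pano_w, v * pano_h)`; with no rotation, the view center lands at the panorama center `(0.5, 0.5)`.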
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
A tool for compiling trained SKLearn models into other representations (such as SQL, Sympy or Excel formulas)
Local-model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3) — LLM & embedding extraction
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Official implementation of the CVPR24 highlight paper: Matching Anything by Segmenting Anything
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"
[BMVC2021] The first image composition assessment dataset. Used in the paper "Image Composition Assessment with Saliency-augmented Multi-pattern Pooling". Useful for image composition assessment, i…