Stars
A generative speech model for daily dialogue.
Long Context Transfer from Language to Vision
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
The official repository of "Video assistant towards large language model makes everything easy"
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Attempt at reproducing NVIDIA's paper using Claude 3 and Grounding DINO.
SimpleNvim: Unleash the Power of Neovim with Effortless Elegance and Boundless Customization ..
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Dataset pruning for ImageNet and LAION-2B.
Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mOdel to significantly improve zero-shot vision language performances (ACL 2024 Fi…
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
A high-throughput and memory-efficient inference and serving engine for LLMs
[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
RUCAIBox / ComVint
Forked from Richar-Du/ComVint
The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning"
LLaVA-HR: High-Resolution Large Language-Vision Assistant
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization