🎯
Focusing
  • Xiamen University
  • Xiamen

A generative speech model for daily dialogue.

Python 28,578 3,123 Updated Aug 1, 2024

Long Context Transfer from Language to Vision

Python 256 13 Updated Jul 28, 2024
14 Updated Jul 29, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 595 37 Updated Jul 29, 2024

The official repository of "Video assistant towards large language model makes everything easy"

Python 195 13 Updated Feb 22, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

328 11 Updated Jun 18, 2024
Python 28 2 Updated Jul 9, 2024

FreeVA: Offline MLLM as Training-Free Video Assistant

Python 38 Updated Jun 9, 2024
HTML 60 6 Updated May 10, 2024

Attempt @ reproducing NVIDIA's paper using Claude 3 and Grounding Dino.

Jupyter Notebook 5 Updated May 27, 2024

Coding in Neovim elegantly

Lua 58 6 Updated Jun 3, 2024

SimpleNvim: Unleash the Power of Neovim with Effortless Elegance and Boundless Customization ..

Lua 63 5 Updated Aug 1, 2024

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Python 484 31 Updated Jan 7, 2024

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Python 506 25 Updated Jul 25, 2024

Dataset pruning for ImageNet and LAION-2B.

Python 58 4 Updated Jul 5, 2024

Official PyTorch Implementation code for realizing the technical part of CoLLaVO: Crayon Large Language and Vision mOdel to significantly improve zero-shot vision language performances (ACL 2024 Fi…

Python 87 10 Updated Jun 28, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 79 5 Updated Jul 6, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 24,032 3,461 Updated Aug 1, 2024

[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 177 8 Updated Jul 4, 2024

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,204 2,905 Updated Apr 22, 2024

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)

Python 36 Updated Jul 16, 2024

The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning''

Python 18 Updated Nov 10, 2023

【ECCV2024】The official repo of Griffon series

Python 87 5 Updated Jul 4, 2024

LLaVA-HR: High-Resolution Large Language-Vision Assistant

Python 193 9 Updated May 29, 2024

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Python 733 42 Updated Apr 15, 2024

A family of lightweight multimodal models.

Python 830 64 Updated Jul 31, 2024

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,601 111 Updated Jul 14, 2024

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python 371 136 Updated Jul 4, 2024

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Python 50 3 Updated Jan 30, 2024