Starred repositories

When do we not need larger vision models?

Python 265 8 Updated Jul 12, 2024

Official repository for the paper PLLaVA

Python 486 30 Updated Jul 15, 2024
Python 62 4 Updated Jul 8, 2024

Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).

Python 113 2 Updated Jul 16, 2024

i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgement

Python 9 1 Updated Jul 4, 2024

This repo contains the code for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) designed for vision LLMs.

Python 19 1 Updated Jul 10, 2024

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

Python 56 3 Updated Jul 8, 2024

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

268 8 Updated Jul 3, 2024

[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

Python 55 1 Updated Apr 30, 2024

Enhancing Large Vision Language Models with Self-Training on Image Comprehension.

Python 44 2 Updated May 31, 2024

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 91 Updated Jul 2, 2024

This repository contains demos I made with the Transformers library by HuggingFace.

Jupyter Notebook 8,575 1,346 Updated Jul 8, 2024

LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft

Python 19 1 Updated Jul 1, 2024
Python 84 3 Updated Jul 6, 2024

An open-source implementation of LLaVA-NeXT.

Python 133 4 Updated Jun 12, 2024

Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

Python 45 2 Updated Jul 2, 2024

(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Python 186 23 Updated Jun 14, 2024

A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.

Python 19 1 Updated Jul 16, 2024

Scenic: A Jax Library for Computer Vision Research and Beyond

Python 3,150 419 Updated Jul 15, 2024

Long Context Transfer from Language to Vision

Python 221 11 Updated Jul 12, 2024

Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch

Python 45 3 Updated Jul 1, 2024

Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"

Python 731 33 Updated Jul 11, 2024

This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"

106 1 Updated Jun 13, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 71 5 Updated Jul 6, 2024

Official Dataloader and Evaluation Scripts for LongVideoBench.

Python 10 Updated Jun 16, 2024

Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"

Python 52 2 Updated May 22, 2024

ACL'24 Main track

Python 21 3 Updated Jul 11, 2024

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Python 1,157 33 Updated Jul 8, 2024

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,009 71 Updated Jul 7, 2024

[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"

Python 61 9 Updated Jun 6, 2024