Starred repositories
When do we not need larger vision models?
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgement
This repo contains the code for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) designed for vision LLMs.
IVGSZ / Flash-VStream
Forked from IVG-SZ/Flash-VStream. This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
This repository contains demos I made with the Transformers library by HuggingFace.
LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft
An open-source implementation of LLaVA-NeXT.
Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"
(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
A lightweight, flexible Video-MLLM developed by the Tencent QQ Multimedia Research Team.
Scenic: A JAX Library for Computer Vision Research and Beyond
Long Context Transfer from Language to Vision
Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch.
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Official Dataloader and Evaluation Scripts for LongVideoBench.
Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"