linzhenyuyuchen
Starred repositories

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,113 62 Updated Jul 20, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.

Python 4,256 324 Updated Jul 19, 2024

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,102 277 Updated May 4, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 628 48 Updated Mar 25, 2024
Python 19 Updated Oct 10, 2023

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Python 270 14 Updated Apr 14, 2024

Recent LLM-based CV and related works. Welcome to comment/contribute!

799 33 Updated Jun 5, 2024

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 897 55 Updated Jun 27, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

988 55 Updated Jul 20, 2024

Mora: More like Sora for Generalist Video Generation

Python 1,441 91 Updated Jun 21, 2024

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Python 8,626 1,314 Updated Jul 18, 2024

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 1,160 68 Updated Jul 16, 2024

OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.

Python 2,858 525 Updated Jul 14, 2024

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

Python 9,192 2,167 Updated Jul 19, 2024

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 4,001 388 Updated Jul 17, 2024

Referring Expression Datasets API

Jupyter Notebook 429 79 Updated Apr 13, 2021

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,578 110 Updated Jul 14, 2024

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Python 120 6 Updated Mar 25, 2024

Grok open release

Python 49,186 8,311 Updated May 29, 2024

This project aims to share the technical principles behind large models and hands-on experience with them.

HTML 7,987 777 Updated Jul 17, 2024

[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 173 8 Updated Jul 4, 2024

Official GitHub repository for the paper "LingoQA: Video Question Answering for Autonomous Driving"

Python 92 3 Updated Mar 26, 2024

[ECCV 2024] Embodied Understanding of Driving Scenarios

Python 115 7 Updated May 9, 2024

CLIP+MLP Aesthetic Score Predictor

Python 801 86 Updated Jul 1, 2024
Jupyter Notebook 140 9 Updated Jul 5, 2024

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Python 450 15 Updated Jun 26, 2024

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Python 238 19 Updated Jul 17, 2024

Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning

Python 349 18 Updated May 17, 2024

Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.

Python 14 2 Updated Dec 19, 2023

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Python 727 42 Updated Apr 15, 2024