linzhenyuyuchen's starred repositories
Accelerating the development of large multimodal models (LMMs) with lmms-eval
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[ICLR 2024] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Recent LLM-based CV and related works. Welcome to comment/contribute!
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Mora: More like Sora for Generalist Video Generation
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
YOLOX is a high-performance anchor-free YOLO, exceeding YOLOv3~v5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection
[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Official GitHub repository for the paper "LingoQA: Video Question Answering for Autonomous Driving"
[ECCV 2024] Embodied Understanding of Driving Scenarios
CLIP+MLP Aesthetic Score Predictor
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
[CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning
Gradio demo used in our "Osprey: Pixel Understanding with Visual Instruction Tuning"
[CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"