Stars
TraDiffusion: Trajectory-Based Training-Free Image Generation
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
PyTorch implementation of the paper "Toward Open-set Human Object Interaction Detection" (AAAI 2024)
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
📚 A collection of papers about Referring Image Segmentation.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
[ICML 2024] The official implementation of SemiRES in PyTorch.
Papers related to remote sensing in CVPR 2024
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
🔥🔥🔥 Latest Papers, Code and Datasets on Vid-LLMs.
Official repository for "X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation"
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
Awesome-LLM-3D: a curated list of resources on Multimodal Large Language Models in the 3D world
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
[CVPR 24] MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models