Stars
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
🤠 Agent-as-a-Judge and DevAI dataset
This is the public repository for the poster abstract "Realistic Multiuser, Multimodal (IMU, Acoustic) HAR Data Generation through Single User Data Augmentation", accepted at ACM/IEEE IPSN 2022.
3-layer CNN and ResNet on the OPPORTUNITY, PAMAP2, UCI-HAR, UniMiB-SHAR, USC-HAD, and WISDM datasets.
Meta-Transformer for Unified Multimodal Learning
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Official repository for NuScenes-MQA. This paper was accepted at the LLVA-AD Workshop at WACV 2024.
[ICCV 2023] GeoMIM: towards better 3d knowledge transfer via masked image modeling for multi-view 3d understanding
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
Official PyTorch implementation of FocalFormer3D [ICCV 2023]
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
[ICCV 2023] Code for NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
[ICCV 2023] Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Official code base of the BEVDet series.
The official implementation of the ICLR 2021 paper "Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors".
Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!