Stars
(ICML 2024) PyTorch implementation of "Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
[ICCV 2023] SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection (ECCV 2022 Oral)
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
[ICCV 2019] Monocular depth estimation from a single image
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyperparameter tuning.
PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction
Open-source simulator for autonomous driving research.
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.