Stars
The official implementation for the paper [ODTrack: Online Dense Temporal Token Learning for Visual Tracking].
Video Grounding and Captioning
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
This repository contains the necessary tools for RGBT tracking, including datasets (GTOT, RGBT234, LasHeR), evaluation tools, visualization tools, and results of existing works.
[NeurIPS 2024] VastTrack: Vast Category Visual Object Tracking
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
[IEEE TCYB 2023] The first large-scale tracking dataset by fusing RGB and Event cameras.
A Large-scale High-diversity Benchmark for RGBT Tracking
Dataset and code for the papers "DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV 2021) and "Depth-only Object Tracking" (BMVC 2021)
The official implementation for the CVPR 2023 paper Joint Visual Grounding and Tracking with Natural Language Specification.
Paint by Example: Exemplar-based Image Editing with Diffusion Models
Diffusion Model-Based Image Editing: A Survey (arXiv)
A paper list documenting articles related to visual object tracking.
Diffusion-TTA improves pre-trained discriminative models such as image classifiers or segmentors using pre-trained generative models.
Code implementation of our NeurIPS 2023 paper: Vocabulary-free Image Classification
An open source implementation of CLIP.
This repository contains the official implementation of the CVPR 2024 paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training".
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
[ICCV 2023] VPD is a framework that applies the high-level and low-level knowledge of a pre-trained text-to-image diffusion model to downstream visual perception tasks.
[NeurIPS'23] Emergent Correspondence from Image Diffusion
Refine high-quality datasets and visual AI models
TypeChat is a library that makes it easy to build natural language interfaces using types.
A personal investigative project to track the latest progress in the field of multi-modal object tracking.