- Guildford, UK
- https://orcid.org/0000-0002-2604-439X
- @anindmondal
- https://mondalanindya.github.io/
- in/anindyamondal2001
Stars
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
A curated publication list on open-vocabulary semantic segmentation and related areas (e.g. zero-shot semantic segmentation).
Project page for "OmniCount: Multi-label Object Counting with Semantic-Geometric Priors"
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Synthesize, Rank, Count: a method for unsupervised crowd counting using latent diffusion models.
A collection of diffusion model papers categorized by subarea.
🧙🏻‍♂️ A list of papers curated for you to dive into awesome radiance field-based 3D editing.
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
[CVPR 2024] An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Reading list for Multimodal Large Language Models
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
"Automatically Discovering and Learning New Visual Categories with Ranking Statistics" by Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman (ICLR 2020)
A list of papers that studies Novel Class Discovery
Includes FSC-147-D and the code for training and testing the CounTX model from the paper Open-world Text-specified Object Counting.
[ACM MM23] CLIP-Count: Towards Text-Guided Zero-Shot Object Counting
[CVPR 2023] CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
Actor-agnostic Multi-label Action Recognition with Multi-modal Query [ICCVW '23]
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
This repository contains the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers".
A curated list of awesome temporal action segmentation resources.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.