The Grounding-anything Dataset (GranD) offers densely annotated data, produced by an automated annotation pipeline that leverages state-of-the-art (SOTA) vision and vision-language (V-L) models. This documentation covers how to download the GranD dataset and provides a guide to the automated annotation pipeline used to create it.
- Annotations: MBZUAI/GranD
- Images: GranD utilizes images from the SAM dataset; see Download.
Note: Annotations are being uploaded incrementally; more parts will be available soon.
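For example, the annotations can be fetched with the `huggingface-cli` tool (this assumes the `huggingface_hub` package is installed; the local directory name is arbitrary):

```shell
# Fetch the GranD annotations from the Hugging Face Hub.
# Requires: pip install huggingface_hub
huggingface-cli download MBZUAI/GranD --repo-type dataset --local-dir GranD_annotations
```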
After downloading the GranD annotations, utilize the scripts below to transform them into GLaMM pretraining data, or to prepare them for your specific tasks.
- For object-level tasks like object detection, semantic segmentation: prepare_object_lvl_data.py
- For image-level captioning and caption grounding: prepare_grand_caption_grounding.py
- For referring expression generation and referring expression segmentation: prepare_grand_referring_expression
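As a sketch of the kind of transformation these prep scripts perform, the snippet below flattens one image's object-level annotation into per-object records. The field names (`objects`, `bbox`, `label`, `attributes`) are illustrative assumptions, not the official GranD schema; adapt them to the actual annotation files.

```python
# Illustrative only: the annotation schema below is an assumption,
# not the official GranD format.
def to_detection_records(grand_annotation: dict) -> list:
    """Flatten one image's object-level annotation into per-object records."""
    records = []
    for obj in grand_annotation.get("objects", []):
        records.append({
            "image_id": grand_annotation["image_id"],
            "label": obj["label"],
            "bbox": obj["bbox"],  # assumed [x, y, w, h]
            "attributes": obj.get("attributes", []),
        })
    return records

# Toy example: one image with a single annotated object.
ann = {"image_id": "sa_1.jpg",
       "objects": [{"label": "dog", "bbox": [10, 20, 50, 40]}]}
records = to_detection_records(ann)  # one record per object
```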
The above scripts generate annotations in JSON format. To convert these for use in pretraining datasets requiring LMDB format, use the following scripts:
- To convert to lmdb: get_txt_for_lmdb.py
- To extract file names in txt format: get_txt_for_lmdb.py
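A minimal sketch of the file-name extraction step, assuming the annotations sit as one JSON file per image in a flat directory (the real script may organize things differently):

```python
from pathlib import Path

# Sketch: list every JSON annotation file under a directory and write
# one file name per line. The flat-directory layout is an assumption.
def write_file_list(ann_dir: str, out_txt: str) -> int:
    names = sorted(p.name for p in Path(ann_dir).glob("*.json"))
    Path(out_txt).write_text("\n".join(names) + "\n")
    return len(names)
```

The resulting txt file can then serve as the key list when building the LMDB.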
GranD is a comprehensive, multi-purpose image-text dataset offering a range of contextual information, from fine-grained object details to high-level scene context. The annotation pipeline comprises four distinct levels; the code for all four is provided in: GranD
More detailed information:
- To run the entire pipeline: run_pipeline.sh
- To set up the environments detailed in run_pipeline.sh, refer to: environments
- Level-1: Object Localization and Attributes
- Landmark Categorization: landmark
- Depth Map Estimation: Midas Depth Estimation
- Image Tagging: RAM Tag2Text Tagging
- Standard Object Detection: CO-DETR OD, EVA OD
- Open Vocabulary Object Detection: OWL-ViT OVD, POMP OVD
  - Attribute Detection and Grounding: Attribute & Grounding GRiT
- Open Vocabulary Classification: OV Classification OV-SAM
- Combine the predictions: Merging
- Generate Level-1 Scene Graph: Level-1 Scene Graph
- Level-2: Relationships
- Captioning: BLIP-2 Captioning, LLaVA Captioning
- Grounding Short Captions: MDETR Grounding
- Combine the predictions: Merging
- Generate Level-2 Scene Graph and Update Level-1: Level-2 Scene Graph
- Enrich Attributes: GPT4-RoI Attributes
- Label Assignment: EVA-CLIP Label Assignment
- Level-3: Scene Graph and Dense Captioning
- Generate Dense Captions: Scene graph dense captioning LLaVA
- Level-4: Extra Contextual Insight
- Generate Level-4 Additional Context: Extra Context
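Levels 1 and 2 above each include a "Combine the predictions" merging step. As a simplified illustration only (not the actual GranD merging logic, which is considerably more involved), a greedy IoU-based deduplication across detectors looks like:

```python
# Simplified sketch of combining boxes from several detectors by
# suppressing overlapping duplicates (greedy NMS). Not the actual
# GranD Merging implementation.
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_detections(dets, iou_thr=0.5):
    """Keep the highest-scoring box per overlapping cluster."""
    kept = []
    for d in sorted(dets, key=lambda d: -d["score"]):
        if all(iou(d["box"], k["box"]) < iou_thr for k in kept):
            kept.append(d)
    return kept
```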