update dataset readme and tools
shenyunhang committed Dec 8, 2023
1 parent 4f3381e commit 0bca8f4
Showing 43 changed files with 81 additions and 34 deletions.
115 changes: 81 additions & 34 deletions datasets/README.md
python3 datasets/prepare_coco_semantic_annos_from_panoptic_annos.py

`lvis_v1_{train,val}+coco_mask.json` are generated by running
```
python3 datasets/tools/lvis/merge_lvis_coco.py
```


`lvis_v1_{train,val}+coco_mask_cat_info.json` are generated by running
```
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/lvis/lvis_v1_train+coco_mask.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/lvis/lvis_v1_val+coco_mask.json
```
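The frequency-tagging step above can be sketched roughly as follows. This is an illustrative stand-in for `add_category_info_frequence.py`, not the script itself: the field name `image_count` and the exact statistics are assumptions.

```python
import json
from collections import defaultdict

def add_category_image_counts(json_path: str, out_path: str) -> None:
    """Attach per-category image counts to a COCO/LVIS-style annotation file.

    Illustrative sketch only: the repository's actual tool may use
    different field names and compute additional frequency statistics.
    """
    with open(json_path) as f:
        data = json.load(f)

    # Count the distinct images in which each category appears.
    images_per_cat = defaultdict(set)
    for ann in data["annotations"]:
        images_per_cat[ann["category_id"]].add(ann["image_id"])

    for cat in data["categories"]:
        cat["image_count"] = len(images_per_cat.get(cat["id"], set()))

    with open(out_path, "w") as f:
        json.dump(data, f)
```

LVIS-style training losses (e.g. federated loss) typically key off such per-category counts, which is why the `_cat_info` variants exist alongside the plain annotation files.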



`objects365_train_fixname.json` and `objects365_val_fixname.json` are generated by running
```bash
python3 datasets/tools/objects3652coco/get_image_info.py --image_dir datasets/objects365/train/ --json_path datasets/objects365/annotations/zhiyuan_objv2_train.json --output_path datasets/objects365/annotations/image_info_train.txt
python3 datasets/tools/objects3652coco/get_image_info.py --image_dir datasets/objects365/val/ --json_path datasets/objects365/annotations/zhiyuan_objv2_val.json --output_path datasets/objects365/annotations/image_info_val.txt

python3 datasets/tools/objects3652coco/convert_annotations.py --root_dir datasets/objects365/ --image_info_path datasets/objects365/annotations/image_info_train.txt --subsets train --apply_exif
python3 datasets/tools/objects3652coco/convert_annotations.py --root_dir datasets/objects365/ --image_info_path datasets/objects365/annotations/image_info_val.txt --subsets val --apply_exif
python3 datasets/tools/objects3652coco/convert_annotations.py --root_dir datasets/objects365/ --image_info_path datasets/objects365/annotations/image_info_val.txt --subsets minival --apply_exif

python3 datasets/tools/objects3652coco/fix_o365_names.py --ann datasets/objects365/annotations/objects365_train.json
python3 datasets/tools/objects3652coco/fix_o365_names.py --ann datasets/objects365/annotations/objects365_val.json
python3 datasets/tools/objects3652coco/fix_o365_names.py --ann datasets/objects365/annotations/objects365_minival.json
```

As Objects365 is large, we generate an annotation file for each image separately
```
python3 datasets/tools/generate_img_ann_pair.py --json_path datasets/objects365/annotations/objects365_train_fixname.json --image_root datasets/objects365/train/
```
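The per-image split can be sketched as below. Treat it as an assumption-laden sketch: the real `generate_img_ann_pair.py` may choose a different file naming scheme and record layout.

```python
import json
import os
from collections import defaultdict

def split_annotations_per_image(json_path: str, image_root: str) -> None:
    """Write one small annotation file next to each image.

    Illustrative sketch of a per-image split; the repository's tool
    may differ in file names, layout, and which fields it carries over.
    """
    with open(json_path) as f:
        data = json.load(f)

    # Group annotations by the image they belong to.
    anns_by_image = defaultdict(list)
    for ann in data["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    # One json per image, named after the image file's stem.
    for img in data["images"]:
        record = {"image": img, "annotations": anns_by_image.get(img["id"], [])}
        stem = os.path.splitext(img["file_name"])[0]
        with open(os.path.join(image_root, stem + ".json"), "w") as f:
            json.dump(record, f)
```

The benefit is that a dataloader can read a single small file per sample instead of holding the multi-gigabyte training json in memory on every worker.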

## Expected dataset structure for [OpenImages](https://storage.googleapis.com/openimages/web/download.html#download_manually):

`openimages_v6_{train,val}_bbox.json` are generated by running
```
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset train --task bbox --apply-exif
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset val --task bbox --apply-exif
```

`openimages_v6_{train,val}_bbox_nogroup.json` are generated by running
```
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset train --task bbox --apply-exif --exclude-group
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset val --task bbox --apply-exif --exclude-group
```
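The `--exclude-group` flag refers to OpenImages group annotations: boxes whose CSV row is flagged `IsGroupOf` cover a cluster of objects rather than a single instance. A minimal sketch of that filtering is below, assuming the raw OpenImages box CSV layout; the actual converter performs this while building the COCO json rather than as a separate pass.

```python
import csv

def drop_group_boxes(in_csv: str, out_csv: str) -> int:
    """Copy an OpenImages-style box CSV, dropping rows with IsGroupOf == "1".

    Sketch only; returns the number of boxes kept.
    """
    kept = 0
    with open(in_csv, newline="") as fin, open(out_csv, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["IsGroupOf"] != "1":  # keep only single-instance boxes
                writer.writerow(row)
                kept += 1
    return kept
```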

`*_cat_info.json` are generated by running
```
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_train_bbox.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_val_bbox.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_train_bbox_nogroup.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_val_bbox_nogroup.json
```

Finally, run
```
python3 datasets/tools/generate_img_ann_pair.py --json_path datasets/openimages/annotations/openimages_v6_train_bbox.json --image_root datasets/openimages/train/
```



`visualgenome_*.json` are generated by running
```
python3 datasets/tools/visualgenome2coco/convert_annotations_object.py -p datasets/visualgenome/ --apply-exif --object_list "" --num_objects 99999999 --min_box_area_frac 0.0
python3 datasets/tools/visualgenome2coco/convert_annotations_region.py -p datasets/visualgenome/ --apply-exif --object_list "" --num_objects 99999999 --min_box_area_frac 0.0
```



`gqa_region*.json` are generated by running
```
python3 datasets/tools/gqa2coco/convert.py --data_path datasets/gqa/ --img_path datasets/gqa/images --sg_path datasets/gqa/ --vg_img_data_path datasets/visualgenome/annotations/ --out_path datasets/gqa/
```

## Expected dataset structure for [PhraseCut](https://github.com/ChenyunWu/PhraseCutDataset):

`phrasecut_*.json` are generated by running
```
python3 datasets/tools/phrasecut2coco/convert.py --data_path datasets/phrasecut/ --img_path datasets/phrasecut/images --out_path datasets/phrasecut/
```



`flickr30k_separateGT_*.json` are generated by running
```
python3 datasets/tools/flickr2coco/convert.py --flickr_path datasets/flickr30k/flickr30k_entities/ --out_path datasets/flickr30k/
```
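Flickr30k Entities stores boxes in per-image XML files keyed by entity id. A minimal parsing sketch is below, assuming the standard `object`/`name`/`bndbox` layout; the real converter also joins the sentence files that map entity ids to grounded phrases.

```python
import xml.etree.ElementTree as ET

def parse_entity_boxes(xml_text: str) -> dict:
    """Collect boxes per entity id from a Flickr30k Entities annotation.

    Sketch only; returns {entity_id: [[xmin, ymin, xmax, ymax], ...]}.
    """
    boxes = {}
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:  # some objects carry no box (e.g. scene entities)
            continue
        coords = [int(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")]
        for name in obj.findall("name"):  # one object may list several entity ids
            boxes.setdefault(name.text, []).append(coords)
    return boxes
```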



After downloading, update the json files by running
```
python3 datasets/tools/odinw/convert.py
```

This is because
```
$DETECTRON2_DATASETS/
bdd100k/
images/
labels/
pan_seg/
coco_pano/
meta/
...
...
seg/
```

`coco_pano` and `meta` are generated by running
```
wget https://github.com/shenyunhang/APE/releases/download/0/bdd_generated.tar.gz
tar xvzf bdd_generated.tar.gz
```




## Expected dataset structure for [PC459 and PC59](https://cs.stanford.edu/~roozbeh/pascal-context/):
```
$DETECTRON2_DATASETS/
VOCdevkit/
VOC2010/
Annotations/
ImageSets/
JPEGImages/
SegmentationClass/
SegmentationObject/
# below are from https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
trainval/
labels.txt
59_labels.txt # https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
pascalcontext_val.txt # https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing
# below are generated
annotations_detectron2/
pc459_val/
pc59_val/
```

It starts with a tar file `VOCtrainval_03-May-2010.tar`. You may want to download the 5K validation set [here](https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing).

The directory `annotations_detectron2` is generated by running
```
python datasets/prepare_pascal_context.py
```



## Expected dataset structure for [VOC](http:https://host.robots.ox.ac.uk/pascal/VOC/voc2012/):
```
$DETECTRON2_DATASETS/
VOCdevkit/
VOC2012/
Annotations/
ImageSets/
JPEGImages/
SegmentationClass/
SegmentationObject/
SegmentationClassAug/ # https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md
# below are generated
images_detectron2/
annotations_detectron2/
val/
```

It starts with a tar file `VOCtrainval_11-May-2012.tar`.

We use the SBD augmented training data as `SegmentationClassAug`, following [Deeplab](https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md).

The directories `images_detectron2` and `annotations_detectron2` are generated by running
```
python datasets/prepare_voc_sem_seg.py
```





## Expected dataset structure for [D3](https://github.com/shikras/d-cube#download):
```
$DETECTRON2_DATASETS/
```