update dataset readme and tools
shenyunhang committed Dec 8, 2023
1 parent 4f3381e commit 0bca8f4
Showing 43 changed files with 81 additions and 34 deletions.
115 changes: 81 additions & 34 deletions datasets/README.md
python3 datasets/prepare_coco_semantic_annos_from_panoptic_annos.py

`lvis_v1_{train,val}+coco_mask.json` are generated by running
```
python3 datasets/tools/lvis/merge_lvis_coco.py
```


`lvis_v1_{train,val}+coco_mask_cat_info.json` are generated by running
```
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/lvis/lvis_v1_train+coco_mask.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/lvis/lvis_v1_val+coco_mask.json
```
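The frequency-tagging step above can be sketched roughly as follows. This is an illustrative stand-in for `add_category_info_frequence.py`, not the script itself: the field name `image_count` and the exact statistics are assumptions.

```python
import json
from collections import defaultdict

def add_category_image_counts(json_path: str, out_path: str) -> None:
    """Attach per-category image counts to a COCO/LVIS-style annotation file.

    Illustrative sketch only: the repository's actual tool may use
    different field names and compute additional frequency statistics.
    """
    with open(json_path) as f:
        data = json.load(f)

    # Count the distinct images in which each category appears.
    images_per_cat = defaultdict(set)
    for ann in data["annotations"]:
        images_per_cat[ann["category_id"]].add(ann["image_id"])

    for cat in data["categories"]:
        cat["image_count"] = len(images_per_cat.get(cat["id"], set()))

    with open(out_path, "w") as f:
        json.dump(data, f)
```

LVIS-style training losses (e.g. federated loss) typically key off such per-category counts, which is why the `_cat_info` variants exist alongside the plain annotation files.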



`objects365_train_fixname.json` and `objects365_val_fixname.json` are generated by running
```bash
python3 datasets/tools/objects3652coco/get_image_info.py --image_dir datasets/objects365/train/ --json_path datasets/objects365/annotations/zhiyuan_objv2_train.json --output_path datasets/objects365/annotations/image_info_train.txt
python3 datasets/tools/objects3652coco/get_image_info.py --image_dir datasets/objects365/val/ --json_path datasets/objects365/annotations/zhiyuan_objv2_val.json --output_path datasets/objects365/annotations/image_info_val.txt

python3 datasets/tools/objects3652coco/convert_annotations.py --root_dir datasets/objects365/ --image_info_path datasets/objects365/annotations/image_info_train.txt --subsets train --apply_exif
python3 datasets/tools/objects3652coco/convert_annotations.py --root_dir datasets/objects365/ --image_info_path datasets/objects365/annotations/image_info_val.txt --subsets val --apply_exif
python3 datasets/tools/objects3652coco/convert_annotations.py --root_dir datasets/objects365/ --image_info_path datasets/objects365/annotations/image_info_val.txt --subsets minival --apply_exif

python3 datasets/tools/objects3652coco/fix_o365_names.py --ann datasets/objects365/annotations/objects365_train.json
python3 datasets/tools/objects3652coco/fix_o365_names.py --ann datasets/objects365/annotations/objects365_val.json
python3 datasets/tools/objects3652coco/fix_o365_names.py --ann datasets/objects365/annotations/objects365_minival.json
```

As Objects365 is large, we generate an annotation file for each image separately
```
python3 datasets/tools/generate_img_ann_pair.py --json_path datasets/objects365/annotations/objects365_train_fixname.json --image_root datasets/objects365/train/
```
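The per-image split can be sketched as below. Treat it as an assumption-laden sketch: the real `generate_img_ann_pair.py` may choose a different file naming scheme and record layout.

```python
import json
import os
from collections import defaultdict

def split_annotations_per_image(json_path: str, image_root: str) -> None:
    """Write one small annotation file next to each image.

    Illustrative sketch of a per-image split; the repository's tool
    may differ in file names, layout, and which fields it carries over.
    """
    with open(json_path) as f:
        data = json.load(f)

    # Group annotations by the image they belong to.
    anns_by_image = defaultdict(list)
    for ann in data["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    # One json per image, named after the image file's stem.
    for img in data["images"]:
        record = {"image": img, "annotations": anns_by_image.get(img["id"], [])}
        stem = os.path.splitext(img["file_name"])[0]
        with open(os.path.join(image_root, stem + ".json"), "w") as f:
            json.dump(record, f)
```

The benefit is that a dataloader can read a single small file per sample instead of holding the multi-gigabyte training json in memory on every worker.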

## Expected dataset structure for [OpenImages](https://storage.googleapis.com/openimages/web/download.html#download_manually):

`openimages_v6_{train,val}_bbox.json` are generated by running
```
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset train --task bbox --apply-exif
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset val --task bbox --apply-exif
```

`openimages_v6_{train,val}_bbox_nogroup.json` are generated by running
```
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset train --task bbox --apply-exif --exclude-group
python3 datasets/tools/openimages2coco/convert_annotations.py --path datasets/openimages/ --version v6 --subset val --task bbox --apply-exif --exclude-group
```
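The `--exclude-group` flag refers to OpenImages group annotations: boxes whose CSV row is flagged `IsGroupOf` cover a cluster of objects rather than a single instance. A minimal sketch of that filtering is below, assuming the raw OpenImages box CSV layout; the actual converter performs this while building the COCO json rather than as a separate pass.

```python
import csv

def drop_group_boxes(in_csv: str, out_csv: str) -> int:
    """Copy an OpenImages-style box CSV, dropping rows with IsGroupOf == "1".

    Sketch only; returns the number of boxes kept.
    """
    kept = 0
    with open(in_csv, newline="") as fin, open(out_csv, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["IsGroupOf"] != "1":  # keep only single-instance boxes
                writer.writerow(row)
                kept += 1
    return kept
```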

`*_cat_info.json` are generated by running
```
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_train_bbox.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_val_bbox.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_train_bbox_nogroup.json
python3 datasets/tools/lvis/add_category_info_frequence.py --json_path datasets/openimages/annotations/openimages_v6_val_bbox_nogroup.json
```

Finally, run
```
python3 datasets/tools/generate_img_ann_pair.py --json_path datasets/openimages/annotations/openimages_v6_train_bbox.json --image_root datasets/openimages/train/
```



`visualgenome_*.json` are generated by running
```
python3 datasets/tools/visualgenome2coco/convert_annotations_object.py -p datasets/visualgenome/ --apply-exif --object_list "" --num_objects 99999999 --min_box_area_frac 0.0
python3 datasets/tools/visualgenome2coco/convert_annotations_region.py -p datasets/visualgenome/ --apply-exif --object_list "" --num_objects 99999999 --min_box_area_frac 0.0
```



`gqa_region*.json` are generated by running
```
python3 datasets/tools/gqa2coco/convert.py --data_path datasets/gqa/ --img_path datasets/gqa/images --sg_path datasets/gqa/ --vg_img_data_path datasets/visualgenome/annotations/ --out_path datasets/gqa/
```

## Expected dataset structure for [PhraseCut](https://github.com/ChenyunWu/PhraseCutDataset):

`phrasecut_*.json` are generated by running
```
python3 datasets/tools/phrasecut2coco/convert.py --data_path datasets/phrasecut/ --img_path datasets/phrasecut/images --out_path datasets/phrasecut/
```



`flickr30k_separateGT_*.json` are generated by running
```
python3 datasets/tools/flickr2coco/convert.py --flickr_path datasets/flickr30k/flickr30k_entities/ --out_path datasets/flickr30k/
```
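Flickr30k Entities stores boxes in per-image XML files keyed by entity id. A minimal parsing sketch is below, assuming the standard `object`/`name`/`bndbox` layout; the real converter also joins the sentence files that map entity ids to grounded phrases.

```python
import xml.etree.ElementTree as ET

def parse_entity_boxes(xml_text: str) -> dict:
    """Collect boxes per entity id from a Flickr30k Entities annotation.

    Sketch only; returns {entity_id: [[xmin, ymin, xmax, ymax], ...]}.
    """
    boxes = {}
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:  # some objects carry no box (e.g. scene entities)
            continue
        coords = [int(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")]
        for name in obj.findall("name"):  # one object may list several entity ids
            boxes.setdefault(name.text, []).append(coords)
    return boxes
```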



After downloading, update the json files by running
```
python3 datasets/tools/odinw/convert.py
```

This is because
```
$DETECTRON2_DATASETS/
bdd100k/
images/
labels/
pan_seg/
coco_pano/
meta/
...
...
seg/
```

`coco_pano` and `meta` are generated by running
```
wget https://github.com/shenyunhang/APE/releases/download/0/bdd_generated.tar.gz
tar xvzf bdd_generated.tar.gz
```




## Expected dataset structure for [PC459 and PC59](https://cs.stanford.edu/~roozbeh/pascal-context/):
```
$DETECTRON2_DATASETS/
VOCdevkit/
VOC2010/
Annotations/
ImageSets/
JPEGImages/
SegmentationClass/
SegmentationObject/
# below are from https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
trainval/
labels.txt
59_labels.txt # https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
pascalcontext_val.txt # https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing
# below are generated
annotations_detectron2/
pc459_val/
pc59_val/
```

It starts with a tar file `VOCtrainval_03-May-2010.tar`. You may want to download the 5K validation set [here](https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing).

The directory `annotations_detectron2` is generated by running
```
python datasets/prepare_pascal_context.py
```



## Expected dataset structure for [VOC](http:https://host.robots.ox.ac.uk/pascal/VOC/voc2012/):
```
$DETECTRON2_DATASETS/
VOCdevkit/
VOC2012/
Annotations/
ImageSets/
JPEGImages/
SegmentationClass/
SegmentationObject/
SegmentationClassAug/ # https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md
# below are generated
images_detectron2/
annotations_detectron2/
val/
```

It starts with a tar file `VOCtrainval_11-May-2012.tar`.

We use the SBD augmented training data as `SegmentationClassAug`, following [Deeplab](https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md).

The directories `images_detectron2` and `annotations_detectron2` are generated by running
```
python datasets/prepare_voc_sem_seg.py
```





## Expected dataset structure for [D3](https://github.com/shikras/d-cube#download):
```
$DETECTRON2_DATASETS/
```