diff --git a/README.md b/README.md index 852b6d7..b8d2543 100644 --- a/README.md +++ b/README.md @@ -17,6 +17,7 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor semask ## Contents + 1. [Results](#1-results) 2. [Setup Instructions](#2-setup-instructions) 3. [Citing SeMask](#3-citing-semask) @@ -27,18 +28,16 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor ### ADE20K - - | Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint | | :---:| :---: | :---: | :---:| :---: | :---: | :---: | :---: | -| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.11 | 43.16 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | TBD | +| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.06 | 43.36 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1L0daUHWQGNGCXHF-cKWEauPSyBV0GLOR/view?usp=sharing) | | SeMask-S FPN | SeMask Swin-S | 512x512 | 45.92 | 47.63 | 56M | [config](SeMask-FPN/configs/semask_swin/ade20k/semfpn_semask_swin_small_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1QhDG4SyGFtWL5kP9BbBoyPqTuFu7fH_y/view?usp=sharing) | | SeMask-B FPN | SeMask Swin-B | 512x512 | 49.35 | 50.98 | 96M | [config](SeMask-FPN/configs/semask_swin/ade20k/semfpn_semask_swin_base_patch4_window12_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1PXCEhrrUy5TJC4dUp7YDQvaapnMzGT6C/view?usp=sharing) | | SeMask-L FPN | SeMask Swin-L | 640x640 | 51.89 | 53.52 | 211M| [config](SeMask-FPN/configs/semask_swin/ade20k/semfpn_semask_swin_large_patch4_window12_640x640_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1u5flfAQCiQJbMZbZPIlGUGTYBz9Ca7rE/view?usp=sharing) | | SeMask-L MaskFormer | SeMask Swin-L | 640x640 | 54.75 | 56.15 | 219M | [config](SeMask-MaskFormer/configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1KgKQLGv9CcBqeEvOEDdxQ-O6lpMfHBLw/view?usp=sharing) | | SeMask-L Mask2Former | SeMask Swin-L | 640x640 | 56.41 | 57.52 | 222M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1hN1I4Wv7_1FCPOsfA-5PELn6Xn3b_R8a/view?usp=sharing) | -| SeMask-L Mask2Former FAPN | SeMask Swin-L | 640x640 | **56.68** | 58.00 | 227M | [config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | TBD | -| SeMask-L Mask2Former MSFAPN | SeMask Swin-L | 640x640 | 56.54 | **58.22** | 224M | [config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) | +| SeMask-L Mask2Former MSFaPN | SeMask Swin-L | 640x640 | 56.54 | 58.22 | 224M | [config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) | +| SeMask-L Mask2Former FaPN | SeMask Swin-L | 640x640 | **56.97** | **58.22** | 227M | 
[config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1DQ9KltSLDj47H2jYnCtVwyBf7KPR9SM_/view?usp=sharing) | ### Cityscapes @@ -68,7 +67,7 @@ We provide the codebase with SeMask incorporated into various models. Please che - SeMask-FPN: [Setup Instructions](SeMask-FPN/README.md#2-setup-instructions) - SeMask-MaskFormer: [Setup Instructions](SeMask-MaskFormer/README.md#2-setup-instructions) - SeMask-Mask2Former: [Setup Instructions](SeMask-Mask2Former/README.md#2-setup-instructions) -- SeMask-FAPN: [Setup Instructions](SeMask-FAPN/README.md#2-setup-instructions) +- SeMask-FaPN: [Setup Instructions](SeMask-FAPN/README.md#2-setup-instructions) ## 3. Citing SeMask diff --git a/SeMask-FAPN/README.md b/SeMask-FAPN/README.md index 587344e..2918d91 100644 --- a/SeMask-FAPN/README.md +++ b/SeMask-FAPN/README.md @@ -16,15 +16,15 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor | Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint | | :---:| :---: | :---: | :---:| :---: | :---: | :---: | :---: | -| SeMask-L Mask2Former FAPN | SeMask Swin-L | 640x640 | **56.68** | 58.00 | 227M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | TBD | -| SeMask-L Mask2Former MSFAPN | SeMask Swin-L | 640x640 | 56.54 | **58.22** | 224M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) | +| SeMask-L Mask2Former MSFaPN | SeMask Swin-L | 640x640 | 56.54 | 58.22 | 224M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) | +| SeMask-L Mask2Former FaPN | SeMask Swin-L | 640x640 | **56.97** | **58.22** | 227M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1DQ9KltSLDj47H2jYnCtVwyBf7KPR9SM_/view?usp=sharing) | ## 2. Setup Instructions ### Installation -- [DCNv2](DCNv2) code is compatible with [Pytorch v1.7.1](https://pytorch.org/get-started/locally/). +- Build the [DCNv2](DCNv2) module, which is compatible with [PyTorch v1.7.1](https://pytorch.org/get-started/locally/). - Follow the installation instructions for [Mask2Former](SeMask-Mask2Former/INSTALL.md). diff --git a/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md b/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md index 6bb096c..9c25405 100644 --- a/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md +++ b/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md @@ -9,11 +9,11 @@ Please see [Getting Started with Detectron2](https://github.com/facebookresearch 1. Pick a model and its config file from [model zoo](MODEL_ZOO.md), - for example, `configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml`. + for example, `configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml`. 2. We provide `demo.py` that is able to demo builtin configs. 
Run it with: ``` cd demo/ -python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \ +python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \ --input input1.jpg input2.jpg \ [--other-options] --opts MODEL.WEIGHTS /path/to/checkpoint_file @@ -39,7 +39,7 @@ setup the corresponding datasets following then run: ``` python train_net.py --num-gpus 8 \ - --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml + --config-file configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml ``` The configs are made for 8-GPU training. @@ -47,14 +47,14 @@ Since we use ADAMW optimizer, it is not clear how to scale learning rate with ba To train on 1 GPU, you need to figure out learning rate and batch size by yourself: ``` python train_net.py \ - --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \ + --config-file configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \ --num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE ``` To evaluate a model's performance, use ``` python train_net.py \ - --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \ + --config-file configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \ --eval-only MODEL.WEIGHTS /path/to/checkpoint_file ``` For more options, see `python train_net.py -h`. diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml deleted file mode 100644 index 50a1c13..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("ade20k_instance_train",) - TEST: ("ade20k_instance_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 160000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 640 - MAX_SIZE_TRAIN: 2560 - MAX_SIZE_TEST: 2560 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (640, 640) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: 640 # used in dataset mapper - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_instance" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [320, 480, 640, 800, 960, 1120] - MAX_SIZE: 4480 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff 
--git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml deleted file mode 100644 index e37bcfb..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-ADE20K-InstanceSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 100 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml deleted file mode 100644 index af03d4d..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_160k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml deleted file mode 100644 index 559be07..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("ade20k_panoptic_train",) - TEST: ("ade20k_panoptic_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 160000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: 
"WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 640 - MAX_SIZE_TRAIN: 2560 - MAX_SIZE_TEST: 2560 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (640, 640) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: 640 # used in dataset mapper - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_panoptic" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [320, 480, 640, 800, 960, 1120] - MAX_SIZE: 4480 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml deleted file mode 100644 index 82c0828..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-ADE20K-PanopticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 150 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml deleted file mode 100644 index af03d4d..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_160k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml deleted file mode 100644 index 28833e7..0000000 
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" # use syncbn for cityscapes dataset - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("cityscapes_fine_instance_seg_train",) - TEST: ("cityscapes_fine_instance_seg_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 90000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 1024 - MAX_SIZE_TRAIN: 4096 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (512, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: -1 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_instance" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792] - MAX_SIZE: 4096 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml deleted file mode 100644 index 1eb38da..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_90k.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml deleted file mode 100644 index 16b215b..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-Cityscapes-InstanceSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 8 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - 
ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: False - INSTANCE_ON: True - PANOPTIC_ON: False - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 2956571..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 72860d9..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml deleted file mode 100644 index 156ef9e..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml deleted file mode 100644 index 0c56e2c..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - 
WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml deleted file mode 100644 index 022567c..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" # use syncbn for cityscapes dataset - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("cityscapes_fine_panoptic_train",) - TEST: ("cityscapes_fine_panoptic_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 90000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 1024 - MAX_SIZE_TRAIN: 4096 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (512, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: -1 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_panoptic" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792] - MAX_SIZE: 4096 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml deleted file mode 100644 index 1eb38da..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_90k.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml deleted file mode 100644 index 3c2d679..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-Cityscapes-PanopticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 19 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", 
"res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 2956571..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 72860d9..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml deleted file mode 100644 index 156ef9e..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml 
b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml deleted file mode 100644 index 0c56e2c..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/Base-Cityscapes-SemanticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/Base-Cityscapes-SemanticSegmentation.yaml deleted file mode 100644 index ca42fab..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/Base-Cityscapes-SemanticSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" # use syncbn for cityscapes dataset - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("cityscapes_fine_sem_seg_train",) - TEST: ("cityscapes_fine_sem_seg_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 90000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 1024 - MAX_SIZE_TRAIN: 4096 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (512, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: -1 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_semantic" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792] - MAX_SIZE: 4096 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R101_bs16_90k.yaml deleted file mode 100644 index 1eb38da..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R101_bs16_90k.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_90k.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml deleted file mode 100644 index 
d872fcd..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-Cityscapes-SemanticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 19 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: False - PANOPTIC_ON: False - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 9dca3c8..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: semask_maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SeMaskSwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22kto1k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 100 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/semask_maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/semask_maskformer2_R50_bs16_90k.yaml deleted file mode 100644 index 88a53c6..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/semask_maskformer2_R50_bs16_90k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: ../Base-Cityscapes-SemanticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "SeMaskMaskFormer" - SEM_SEG_HEAD: - NAME: "BranchMaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 19 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 
0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: False - PANOPTIC_ON: False - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 2956571..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 2509717..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 100 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml deleted file mode 100644 index 156ef9e..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml deleted file mode 100644 index 0c56e2c..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - 
NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml deleted file mode 100644 index 98943d9..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml +++ /dev/null @@ -1,47 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("coco_2017_train",) - TEST: ("coco_2017_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - STEPS: (327778, 355092) - MAX_ITER: 368750 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 10 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - IMAGE_SIZE: 1024 - MIN_SCALE: 0.1 - MAX_SCALE: 2.0 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "coco_instance_lsj" -TEST: - EVAL_PERIOD: 5000 -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml deleted file mode 100644 index 77defd0..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_50ep.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml deleted file mode 100644 index 4b9e76e..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-COCO-InstanceSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 80 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - 
PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: False - INSTANCE_ON: True - PANOPTIC_ON: False - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml deleted file mode 100644 index 4732999..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml deleted file mode 100644 index 5dde960..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml deleted file mode 100644 index b685cdb..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml +++ /dev/null @@ -1,21 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 -SOLVER: - STEPS: (655556, 710184) - MAX_ITER: 737500 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml deleted file mode 100644 index f9b1c56..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - 
NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml deleted file mode 100644 index 7f27bc5..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml deleted file mode 100644 index 7560a73..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml +++ /dev/null @@ -1,47 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("coco_2017_train_panoptic",) - TEST: ("coco_2017_val_panoptic_with_sem_seg",) # to evaluate instance and semantic performance as well -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - STEPS: (327778, 355092) - MAX_ITER: 368750 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 10 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - IMAGE_SIZE: 1024 - MIN_SCALE: 0.1 - MAX_SCALE: 2.0 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "coco_panoptic_lsj" -TEST: - EVAL_PERIOD: 5000 -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml deleted file mode 100644 index 77defd0..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_50ep.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml deleted file mode 100644 index 9ebf4f1..0000000 --- 
a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml +++ /dev/null @@ -1,45 +0,0 @@ -_BASE_: Base-COCO-PanopticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - IGNORE_VALUE: 255 - NUM_CLASSES: 133 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml deleted file mode 100644 index 4732999..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml deleted file mode 100644 index 5dde960..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml deleted file mode 100644 index b685cdb..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml +++ /dev/null @@ -1,21 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: 
"D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 -SOLVER: - STEPS: (655556, 710184) - MAX_ITER: 737500 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml deleted file mode 100644 index f9b1c56..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml deleted file mode 100644 index 7f27bc5..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml deleted file mode 100644 index 86629a3..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml +++ /dev/null @@ -1,56 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("mapillary_vistas_panoptic_train",) - TEST: ("mapillary_vistas_panoptic_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 300000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 2048 - MAX_SIZE_TRAIN: 8192 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: 
True - TYPE: "absolute" - SIZE: (1024, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: 1024 # used in dataset mapper - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_panoptic" -TEST: - EVAL_PERIOD: 0 -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 10 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml deleted file mode 100644 index d6a0eaa..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 65 - NUM_CLASSES: 65 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: False - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.0 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml deleted file mode 100644 index e7a8c4c..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer_R50_bs16_300k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml deleted file mode 100644 index f05fb28..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml +++ /dev/null @@ -1,56 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 
50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("mapillary_vistas_sem_seg_train",) - TEST: ("mapillary_vistas_sem_seg_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 300000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 2048 - MAX_SIZE_TRAIN: 8192 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (1024, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: 1024 # used in dataset mapper - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_semantic" -TEST: - EVAL_PERIOD: 0 -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 10 -VERSION: 2 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml deleted file mode 100644 index e9977a1..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 65 - NUM_CLASSES: 65 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: False - PANOPTIC_ON: False - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.0 diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml deleted file mode 100644 index e336a1b..0000000 --- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_300k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: 
"swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 100 diff --git a/SeMask-FPN/README.md b/SeMask-FPN/README.md index ef590e4..a7ee060 100644 --- a/SeMask-FPN/README.md +++ b/SeMask-FPN/README.md @@ -19,11 +19,9 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor ### ADE20K - - | Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint | | :---:| :---: | :---: | :---:| :---: | :---: | :---: | :---: | -| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.11 | 43.16 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | TBD | +| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.06 | 43.36 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1L0daUHWQGNGCXHF-cKWEauPSyBV0GLOR/view?usp=sharing) | | SeMask-S FPN | SeMask Swin-S | 512x512 | 45.92 | 47.63 | 56M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_small_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1QhDG4SyGFtWL5kP9BbBoyPqTuFu7fH_y/view?usp=sharing) | | SeMask-B FPN | SeMask Swin-B | 512x512 | 49.35 | 50.98 | 96M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_base_patch4_window12_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1PXCEhrrUy5TJC4dUp7YDQvaapnMzGT6C/view?usp=sharing) | | SeMask-L FPN | SeMask Swin-L | 640x640 | 51.89 | 53.52 | 211M| [config](configs/semask_swin/ade20k/semfpn_semask_swin_large_patch4_window12_640x640_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1u5flfAQCiQJbMZbZPIlGUGTYBz9Ca7rE/view?usp=sharing) | diff --git a/SeMask-FPN/demo/demo.py b/SeMask-FPN/demo/demo.py index 8414302..e4fb23b 100644 --- a/SeMask-FPN/demo/demo.py +++ b/SeMask-FPN/demo/demo.py @@ -7,9 +7,9 @@ def main(): parser = ArgumentParser() - parser.add_argument('img', help='Image file') - parser.add_argument('config', help='Config file') - parser.add_argument('checkpoint', help='Checkpoint file') + parser.add_argument('--img', help='Image file') + parser.add_argument('--config', help='Config file') + parser.add_argument('--checkpoint', help='Checkpoint file') parser.add_argument( '--device', default='cuda:0', help='Device used for inference') parser.add_argument( diff --git a/SeMask-FPN/mmseg/apis/inference.py b/SeMask-FPN/mmseg/apis/inference.py index 20c20dc..075a919 100644 --- a/SeMask-FPN/mmseg/apis/inference.py +++ b/SeMask-FPN/mmseg/apis/inference.py @@ -10,7 +10,6 @@ def init_segmentor(config, checkpoint=None, device='cuda:0'): """Initialize a segmentor from config file. - Args: config (str or :obj:`mmcv.Config`): Config file path or the config object. @@ -44,11 +43,9 @@ class LoadImage: def __call__(self, results): """Call function to load images into results. - Args: results (dict): A result dict contains the file name of the image to be read. - Returns: dict: ``results`` will be returned containing loaded image. """ @@ -68,12 +65,10 @@ def __call__(self, results): def inference_segmentor(model, img): """Inference image(s) with the segmentor. - Args: model (nn.Module): The loaded segmentor. imgs (str/ndarray or list[str/ndarray]): Either image files or loaded images. - Returns: (list[Tensor]): The segmentation result. 
""" @@ -98,9 +93,15 @@ def inference_segmentor(model, img): return result -def show_result_pyplot(model, img, result, palette=None, fig_size=(15, 10)): +def show_result_pyplot(model, + img, + result, + palette=None, + fig_size=(15, 10), + opacity=0.5, + title='', + block=True): """Visualize the segmentation results on the image. - Args: model (nn.Module): The loaded segmentor. img (str or np.ndarray): Image filename or loaded image. @@ -109,10 +110,20 @@ def show_result_pyplot(model, img, result, palette=None, fig_size=(15, 10)): map. If None is given, random palette will be generated. Default: None fig_size (tuple): Figure size of the pyplot figure. + opacity(float): Opacity of painted segmentation map. + Default 0.5. + Must be in (0, 1] range. + title (str): The title of pyplot figure. + Default is ''. + block (bool): Whether to block the pyplot figure. + Default is True. """ if hasattr(model, 'module'): model = model.module - img = model.show_result(img, result, palette=palette, show=False) + img = model.show_result( + img, result, palette=palette, show=False, opacity=opacity) plt.figure(figsize=fig_size) plt.imshow(mmcv.bgr2rgb(img)) - plt.show() + plt.title(title) + plt.tight_layout() + plt.show(block=block) diff --git a/SeMask-FPN/mmseg/models/segmentors/base.py b/SeMask-FPN/mmseg/models/segmentors/base.py index 1c69406..b80d956 100644 --- a/SeMask-FPN/mmseg/models/segmentors/base.py +++ b/SeMask-FPN/mmseg/models/segmentors/base.py @@ -226,6 +226,81 @@ def _parse_losses(losses): return loss, log_vars + def show_inference_result(self, + img, + result, + palette=None, + win_name='', + show=False, + wait_time=0, + out_file=None, + opacity=0.5): + """Draw `result` over `img`. + Args: + img (str or Tensor): The image to be displayed. + result (Tensor): The semantic segmentation results to draw over + `img`. + palette (list[list[int]]] | np.ndarray | None): The palette of + segmentation map. If None is given, random palette will be + generated. Default: None + win_name (str): The window name. + wait_time (int): Value of waitKey param. + Default: 0. + show (bool): Whether to show the image. + Default: False. + out_file (str or None): The filename to write the image. + Default: None. + opacity(float): Opacity of painted segmentation map. + Default 0.5. + Must be in (0, 1] range. + Returns: + img (Tensor): Only if not `show` or `out_file` + """ + img = mmcv.imread(img) + img = img.copy() + seg = result[0] + if palette is None: + if self.PALETTE is None: + # Get random state before set seed, + # and restore random state later. + # It will prevent loss of randomness, as the palette + # may be different in each iteration if not specified. 
+ # See: https://github.com/open-mmlab/mmdetection/issues/5844 + state = np.random.get_state() + np.random.seed(42) + # random palette + palette = np.random.randint( + 0, 255, size=(len(self.CLASSES), 3)) + np.random.set_state(state) + else: + palette = self.PALETTE + palette = np.array(palette) + assert palette.shape[0] == len(self.CLASSES) + assert palette.shape[1] == 3 + assert len(palette.shape) == 2 + assert 0 < opacity <= 1.0 + color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8) + for label, color in enumerate(palette): + color_seg[seg == label, :] = color + # convert to BGR + color_seg = color_seg[..., ::-1] + + img = img * (1 - opacity) + color_seg * opacity + img = img.astype(np.uint8) + # if out_file specified, do not show image in window + if out_file is not None: + show = False + + if show: + mmcv.imshow(img, win_name, wait_time) + if out_file is not None: + mmcv.imwrite(img, out_file) + + if not (show or out_file): + warnings.warn('show==False and out_file is not specified, only ' + 'result image will be returned') + return img + def show_result(self, i, img, diff --git a/SeMask-Mask2Former/GETTING_STARTED.md b/SeMask-Mask2Former/GETTING_STARTED.md index 6bb096c..c2eb6f7 100644 --- a/SeMask-Mask2Former/GETTING_STARTED.md +++ b/SeMask-Mask2Former/GETTING_STARTED.md @@ -9,11 +9,11 @@ Please see [Getting Started with Detectron2](https://github.com/facebookresearch 1. Pick a model and its config file from [model zoo](MODEL_ZOO.md), - for example, `configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml`. + for example, `configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml`. 2. We provide `demo.py` that is able to demo builtin configs. Run it with: ``` cd demo/ -python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \ +python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \ --input input1.jpg input2.jpg \ [--other-options] --opts MODEL.WEIGHTS /path/to/checkpoint_file @@ -39,7 +39,7 @@ setup the corresponding datasets following then run: ``` python train_net.py --num-gpus 8 \ - --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml + --config-file configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml ``` The configs are made for 8-GPU training. @@ -47,14 +47,14 @@ Since we use ADAMW optimizer, it is not clear how to scale learning rate with ba To train on 1 GPU, you need to figure out learning rate and batch size by yourself: ``` python train_net.py \ - --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \ + --config-file configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \ --num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE ``` To evaluate a model's performance, use ``` python train_net.py \ - --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \ + --config-file configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \ --eval-only MODEL.WEIGHTS /path/to/checkpoint_file ``` For more options, see `python train_net.py -h`. 
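The SeMask-FPN hunks earlier in this diff (`demo.py`, `mmseg/apis/inference.py`, `segmentors/base.py`) revise the inference API: `demo.py` switches to flag-style arguments (`--img`, `--config`, `--checkpoint`), `show_result_pyplot` gains `opacity`, `title`, and `block` parameters, and the segmentor base class gains `show_inference_result`. Below is a minimal usage sketch, not part of the diff, showing how those pieces chain together; it assumes the upstream mmsegmentation import path `mmseg.apis`, and the checkpoint filename is a hypothetical placeholder.

```python
# Minimal sketch of the revised SeMask-FPN inference flow (paths/filenames are placeholders).
from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot

config_file = 'configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py'
checkpoint_file = 'semask_tiny_fpn_ade20k.pth'  # hypothetical local filename for the released checkpoint

# Build the segmentor and run single-image inference.
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, 'demo.png')  # list with one H x W label map

# Overlay the prediction using the new opacity/title/block arguments added in this diff.
show_result_pyplot(model, 'demo.png', result, opacity=0.5,
                   title='SeMask-T FPN', block=True)
```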
diff --git a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml b/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml deleted file mode 100644 index 50a1c13..0000000 --- a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("ade20k_instance_train",) - TEST: ("ade20k_instance_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 160000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 640 - MAX_SIZE_TRAIN: 2560 - MAX_SIZE_TEST: 2560 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (640, 640) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: 640 # used in dataset mapper - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_instance" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [320, 480, 640, 800, 960, 1120] - MAX_SIZE: 4480 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml deleted file mode 100644 index e37bcfb..0000000 --- a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-ADE20K-InstanceSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 100 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml 
b/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml deleted file mode 100644 index af03d4d..0000000 --- a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_160k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml deleted file mode 100644 index 559be07..0000000 --- a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("ade20k_panoptic_train",) - TEST: ("ade20k_panoptic_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 160000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 640 - MAX_SIZE_TRAIN: 2560 - MAX_SIZE_TEST: 2560 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (640, 640) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: 640 # used in dataset mapper - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_panoptic" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [320, 480, 640, 800, 960, 1120] - MAX_SIZE: 4480 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml deleted file mode 100644 index 82c0828..0000000 --- a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-ADE20K-PanopticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 150 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - 
DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml b/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml deleted file mode 100644 index af03d4d..0000000 --- a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_160k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml deleted file mode 100644 index 28833e7..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" # use syncbn for cityscapes dataset - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("cityscapes_fine_instance_seg_train",) - TEST: ("cityscapes_fine_instance_seg_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 90000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 1024 - MAX_SIZE_TRAIN: 4096 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (512, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: -1 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_instance" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792] - MAX_SIZE: 4096 - FLIP: True -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml deleted file 
mode 100644 index 1eb38da..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_90k.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml deleted file mode 100644 index 16b215b..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-Cityscapes-InstanceSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 8 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: False - INSTANCE_ON: True - PANOPTIC_ON: False - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 2956571..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 72860d9..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: 
"swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml deleted file mode 100644 index 156ef9e..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml deleted file mode 100644 index 0c56e2c..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml deleted file mode 100644 index 022567c..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml +++ /dev/null @@ -1,61 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" # use syncbn for cityscapes dataset - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("cityscapes_fine_panoptic_train",) - TEST: ("cityscapes_fine_panoptic_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 90000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 1024 - MAX_SIZE_TRAIN: 4096 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (512, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: -1 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_panoptic" -TEST: - EVAL_PERIOD: 5000 - AUG: - ENABLED: False - MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792] - MAX_SIZE: 4096 - FLIP: True -DATALOADER: - 
FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml deleted file mode 100644 index 1eb38da..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_90k.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml deleted file mode 100644 index 3c2d679..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-Cityscapes-PanopticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 19 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 2956571..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml deleted file mode 100644 index 72860d9..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: 
../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml deleted file mode 100644 index 156ef9e..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml deleted file mode 100644 index 0c56e2c..0000000 --- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_90k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml deleted file mode 100644 index 98943d9..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml +++ /dev/null @@ -1,47 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("coco_2017_train",) - TEST: ("coco_2017_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - STEPS: (327778, 355092) - MAX_ITER: 368750 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 10 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - IMAGE_SIZE: 1024 - MIN_SCALE: 0.1 - MAX_SCALE: 2.0 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "coco_instance_lsj" -TEST: - EVAL_PERIOD: 5000 -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml 
deleted file mode 100644 index 77defd0..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_50ep.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml deleted file mode 100644 index 4b9e76e..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-COCO-InstanceSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 255 - NUM_CLASSES: 80 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: False - INSTANCE_ON: True - PANOPTIC_ON: False - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml deleted file mode 100644 index 4732999..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml deleted file mode 100644 index 5dde960..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - 
PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml deleted file mode 100644 index b685cdb..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml +++ /dev/null @@ -1,21 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 -SOLVER: - STEPS: (655556, 710184) - MAX_ITER: 737500 diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml deleted file mode 100644 index f9b1c56..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml deleted file mode 100644 index 7f27bc5..0000000 --- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml deleted file mode 100644 index 7560a73..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml +++ /dev/null @@ -1,47 +0,0 @@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("coco_2017_train_panoptic",) - TEST: ("coco_2017_val_panoptic_with_sem_seg",) # to evaluate instance and semantic performance as well -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - STEPS: (327778, 355092) - MAX_ITER: 368750 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 10 - 
WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - IMAGE_SIZE: 1024 - MIN_SCALE: 0.1 - MAX_SCALE: 2.0 - FORMAT: "RGB" - DATASET_MAPPER_NAME: "coco_panoptic_lsj" -TEST: - EVAL_PERIOD: 5000 -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 4 -VERSION: 2 diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml deleted file mode 100644 index 77defd0..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml +++ /dev/null @@ -1,11 +0,0 @@ -_BASE_: maskformer2_R50_bs16_50ep.yaml -MODEL: - WEIGHTS: "R-101.pkl" - RESNETS: - DEPTH: 101 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml deleted file mode 100644 index 9ebf4f1..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml +++ /dev/null @@ -1,45 +0,0 @@ -_BASE_: Base-COCO-PanopticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - IGNORE_VALUE: 255 - NUM_CLASSES: 133 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: True - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.8 diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml deleted file mode 100644 index 4732999..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml 
b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml deleted file mode 100644 index 5dde960..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml +++ /dev/null @@ -1,16 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 128 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [4, 8, 16, 32] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_base_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml deleted file mode 100644 index b685cdb..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml +++ /dev/null @@ -1,21 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [6, 12, 24, 48] - WINDOW_SIZE: 12 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - PRETRAIN_IMG_SIZE: 384 - WEIGHTS: "swin_large_patch4_window12_384_22k.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - MASK_FORMER: - NUM_OBJECT_QUERIES: 200 -SOLVER: - STEPS: (655556, 710184) - MAX_ITER: 737500 diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml deleted file mode 100644 index f9b1c56..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 18, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_small_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml deleted file mode 100644 index 7f27bc5..0000000 --- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml +++ /dev/null @@ -1,15 +0,0 @@ -_BASE_: ../maskformer2_R50_bs16_50ep.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 96 - DEPTHS: [2, 2, 6, 2] - NUM_HEADS: [3, 6, 12, 24] - WINDOW_SIZE: 7 - APE: False - DROP_PATH_RATE: 0.3 - PATCH_NORM: True - WEIGHTS: "swin_tiny_patch4_window7_224.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml deleted file mode 100644 index 86629a3..0000000 --- a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml +++ /dev/null @@ -1,56 +0,0 
@@ -MODEL: - BACKBONE: - FREEZE_AT: 0 - NAME: "build_resnet_backbone" - WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl" - PIXEL_MEAN: [123.675, 116.280, 103.530] - PIXEL_STD: [58.395, 57.120, 57.375] - RESNETS: - DEPTH: 50 - STEM_TYPE: "basic" # not used - STEM_OUT_CHANNELS: 64 - STRIDE_IN_1X1: False - OUT_FEATURES: ["res2", "res3", "res4", "res5"] - # NORM: "SyncBN" - RES5_MULTI_GRID: [1, 1, 1] # not used -DATASETS: - TRAIN: ("mapillary_vistas_panoptic_train",) - TEST: ("mapillary_vistas_panoptic_val",) -SOLVER: - IMS_PER_BATCH: 16 - BASE_LR: 0.0001 - MAX_ITER: 300000 - WARMUP_FACTOR: 1.0 - WARMUP_ITERS: 0 - WEIGHT_DECAY: 0.05 - OPTIMIZER: "ADAMW" - LR_SCHEDULER_NAME: "WarmupPolyLR" - BACKBONE_MULTIPLIER: 0.1 - CLIP_GRADIENTS: - ENABLED: True - CLIP_TYPE: "full_model" - CLIP_VALUE: 0.01 - NORM_TYPE: 2.0 - AMP: - ENABLED: True -INPUT: - MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"] - MIN_SIZE_TRAIN_SAMPLING: "choice" - MIN_SIZE_TEST: 2048 - MAX_SIZE_TRAIN: 8192 - MAX_SIZE_TEST: 2048 - CROP: - ENABLED: True - TYPE: "absolute" - SIZE: (1024, 1024) - SINGLE_CATEGORY_MAX_AREA: 1.0 - COLOR_AUG_SSD: True - SIZE_DIVISIBILITY: 1024 # used in dataset mapper - FORMAT: "RGB" - DATASET_MAPPER_NAME: "mask_former_panoptic" -TEST: - EVAL_PERIOD: 0 -DATALOADER: - FILTER_EMPTY_ANNOTATIONS: True - NUM_WORKERS: 10 -VERSION: 2 diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml deleted file mode 100644 index d6a0eaa..0000000 --- a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml +++ /dev/null @@ -1,44 +0,0 @@ -_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml -MODEL: - META_ARCHITECTURE: "MaskFormer" - SEM_SEG_HEAD: - NAME: "MaskFormerHead" - IGNORE_VALUE: 65 - NUM_CLASSES: 65 - LOSS_WEIGHT: 1.0 - CONVS_DIM: 256 - MASK_DIM: 256 - NORM: "GN" - # pixel decoder - PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder" - IN_FEATURES: ["res2", "res3", "res4", "res5"] - DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"] - COMMON_STRIDE: 4 - TRANSFORMER_ENC_LAYERS: 6 - MASK_FORMER: - TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder" - TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder" - DEEP_SUPERVISION: True - NO_OBJECT_WEIGHT: 0.1 - CLASS_WEIGHT: 2.0 - MASK_WEIGHT: 5.0 - DICE_WEIGHT: 5.0 - HIDDEN_DIM: 256 - NUM_OBJECT_QUERIES: 100 - NHEADS: 8 - DROPOUT: 0.0 - DIM_FEEDFORWARD: 2048 - ENC_LAYERS: 0 - PRE_NORM: False - ENFORCE_INPUT_PROJ: False - SIZE_DIVISIBILITY: 32 - DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query - TRAIN_NUM_POINTS: 12544 - OVERSAMPLE_RATIO: 3.0 - IMPORTANCE_SAMPLE_RATIO: 0.75 - TEST: - SEMANTIC_ON: True - INSTANCE_ON: False - PANOPTIC_ON: True - OVERLAP_THRESHOLD: 0.8 - OBJECT_MASK_THRESHOLD: 0.0 diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml deleted file mode 100644 index e7a8c4c..0000000 --- a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml +++ /dev/null @@ -1,18 +0,0 @@ -_BASE_: ../maskformer_R50_bs16_300k.yaml -MODEL: - BACKBONE: - NAME: "D2SwinTransformer" - SWIN: - EMBED_DIM: 192 - DEPTHS: [2, 2, 18, 2] - 
-    NUM_HEADS: [6, 12, 24, 48]
-    WINDOW_SIZE: 12
-    APE: False
-    DROP_PATH_RATE: 0.3
-    PATCH_NORM: True
-    PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  MASK_FORMER:
-    NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml
deleted file mode 100644
index f05fb28..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml
+++ /dev/null
@@ -1,56 +0,0 @@
-MODEL:
-  BACKBONE:
-    FREEZE_AT: 0
-    NAME: "build_resnet_backbone"
-  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  RESNETS:
-    DEPTH: 50
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
-  TRAIN: ("mapillary_vistas_sem_seg_train",)
-  TEST: ("mapillary_vistas_sem_seg_val",)
-SOLVER:
-  IMS_PER_BATCH: 16
-  BASE_LR: 0.0001
-  MAX_ITER: 300000
-  WARMUP_FACTOR: 1.0
-  WARMUP_ITERS: 0
-  WEIGHT_DECAY: 0.05
-  OPTIMIZER: "ADAMW"
-  LR_SCHEDULER_NAME: "WarmupPolyLR"
-  BACKBONE_MULTIPLIER: 0.1
-  CLIP_GRADIENTS:
-    ENABLED: True
-    CLIP_TYPE: "full_model"
-    CLIP_VALUE: 0.01
-    NORM_TYPE: 2.0
-  AMP:
-    ENABLED: True
-INPUT:
-  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"]
-  MIN_SIZE_TRAIN_SAMPLING: "choice"
-  MIN_SIZE_TEST: 2048
-  MAX_SIZE_TRAIN: 8192
-  MAX_SIZE_TEST: 2048
-  CROP:
-    ENABLED: True
-    TYPE: "absolute"
-    SIZE: (1024, 1024)
-    SINGLE_CATEGORY_MAX_AREA: 1.0
-  COLOR_AUG_SSD: True
-  SIZE_DIVISIBILITY: 1024 # used in dataset mapper
-  FORMAT: "RGB"
-  DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
-  EVAL_PERIOD: 0
-DATALOADER:
-  FILTER_EMPTY_ANNOTATIONS: True
-  NUM_WORKERS: 10
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml
deleted file mode 100644
index e9977a1..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml
-MODEL:
-  META_ARCHITECTURE: "MaskFormer"
-  SEM_SEG_HEAD:
-    NAME: "MaskFormerHead"
-    IGNORE_VALUE: 65
-    NUM_CLASSES: 65
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-    # pixel decoder
-    PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
-    COMMON_STRIDE: 4
-    TRANSFORMER_ENC_LAYERS: 6
-  MASK_FORMER:
-    TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
-    TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
-    DEEP_SUPERVISION: True
-    NO_OBJECT_WEIGHT: 0.1
-    CLASS_WEIGHT: 2.0
-    MASK_WEIGHT: 5.0
-    DICE_WEIGHT: 5.0
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 100
-    NHEADS: 8
-    DROPOUT: 0.0
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    PRE_NORM: False
-    ENFORCE_INPUT_PROJ: False
-    SIZE_DIVISIBILITY: 32
-    DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
-    TRAIN_NUM_POINTS: 12544
-    OVERSAMPLE_RATIO: 3.0
-    IMPORTANCE_SAMPLE_RATIO: 0.75
-    TEST:
-      SEMANTIC_ON: True
-      INSTANCE_ON: False
-      PANOPTIC_ON: False
-      OVERLAP_THRESHOLD: 0.8
-      OBJECT_MASK_THRESHOLD: 0.0
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
deleted file mode 100644
index e336a1b..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_300k.yaml
-MODEL:
-  BACKBONE:
-    NAME: "D2SwinTransformer"
-  SWIN:
-    EMBED_DIM: 192
-    DEPTHS: [2, 2, 18, 2]
-    NUM_HEADS: [6, 12, 24, 48]
-    WINDOW_SIZE: 12
-    APE: False
-    DROP_PATH_RATE: 0.3
-    PATCH_NORM: True
-    PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  MASK_FORMER:
-    NUM_OBJECT_QUERIES: 100
diff --git a/SeMask-MaskFormer/GETTING_STARTED.md b/SeMask-MaskFormer/GETTING_STARTED.md
index 828689c..2fc3962 100644
--- a/SeMask-MaskFormer/GETTING_STARTED.md
+++ b/SeMask-MaskFormer/GETTING_STARTED.md
@@ -9,11 +9,11 @@ Please see [Getting Started with Detectron2](https://github.com/facebookresearch
 
 1. Pick a model and its config file from
   [model zoo](MODEL_ZOO.md),
-  for example, `ade20k-150/maskformer_R50_bs16_160k.yaml`.
+  for example, `ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml`.
 2. We provide `demo.py` that is able to demo builtin configs. Run it with:
 ```
 cd demo/
-python demo.py --config-file ../configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
+python demo.py --config-file ../configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
   --input input1.jpg input2.jpg \
   [--other-options]
   --opts MODEL.WEIGHTS /path/to/checkpoint_file
@@ -39,7 +39,7 @@ setup the corresponding datasets following
 then run:
 ```
 ./train_net.py --num-gpus 8 \
-  --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml
+  --config-file configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml
 ```
 
 The configs are made for 8-GPU training.
@@ -47,14 +47,14 @@ Since we use ADAMW optimizer, it is not clear how to scale learning rate with ba
 To train on 1 GPU, you need to figure out learning rate and batch size by yourself:
 ```
 ./train_net.py \
-  --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
+  --config-file configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
   --num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE
 ```
 
 To evaluate a model's performance, use
 ```
 ./train_net.py \
-  --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
+  --config-file configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
   --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
 ```
 For more options, see `./train_net.py -h`.
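The single-GPU hunk above deliberately leaves `SOLVER.IMS_PER_BATCH` and `SOLVER.BASE_LR` as placeholders. A minimal sketch of one such invocation, assuming a 2-image batch and a linearly scaled learning rate (both values are illustrative assumptions, not settings shipped with the repo; with AdamW, linear scaling is only a rough starting point):
```
# Illustrative single-GPU run; batch size and learning rate are assumed values,
# scaled down from the 8-GPU default of IMS_PER_BATCH=16 and BASE_LR=0.0001.
./train_net.py \
  --config-file configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
  --num-gpus 1 \
  SOLVER.IMS_PER_BATCH 2 \
  SOLVER.BASE_LR 0.0000125
```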
diff --git a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R101_bs16_720k.yaml b/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R101_bs16_720k.yaml
deleted file mode 100644
index c280b8f..0000000
--- a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R101_bs16_720k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_panoptic_R50_bs16_720k.yaml
-MODEL:
-  WEIGHTS: "R-101.pkl"
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R50_bs16_720k.yaml b/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R50_bs16_720k.yaml
deleted file mode 100644
index 0be2839..0000000
--- a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R50_bs16_720k.yaml
+++ /dev/null
@@ -1,33 +0,0 @@
-_BASE_: ../ade20k-150/maskformer_R50_bs16_160k.yaml
-MODEL:
-  SEM_SEG_HEAD:
-    PIXEL_DECODER_NAME: "TransformerEncoderPixelDecoder"
-    TRANSFORMER_ENC_LAYERS: 6
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "transformer_encoder"
-    TEST:
-      PANOPTIC_ON: True
-      OVERLAP_THRESHOLD: 0.8
-      OBJECT_MASK_THRESHOLD: 0.7
-DATASETS:
-  TRAIN: ("ade20k_panoptic_train",)
-  TEST: ("ade20k_panoptic_val",)
-SOLVER:
-  MAX_ITER: 720000
-INPUT:
-  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
-  MIN_SIZE_TRAIN_SAMPLING: "choice"
-  MIN_SIZE_TEST: 640
-  MAX_SIZE_TRAIN: 2560
-  MAX_SIZE_TEST: 2560
-  CROP:
-    ENABLED: True
-    TYPE: "absolute"
-    SIZE: (640, 640)
-    SINGLE_CATEGORY_MAX_AREA: 1.0
-  COLOR_AUG_SSD: True
-  SIZE_DIVISIBILITY: 640 # used in dataset mapper
-  FORMAT: "RGB"
-  DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
-  EVAL_PERIOD: 0
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/Base-ADE20KFull-847.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/Base-ADE20KFull-847.yaml
deleted file mode 100644
index e3cd338..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/Base-ADE20KFull-847.yaml
+++ /dev/null
@@ -1,54 +0,0 @@
-MODEL:
-  BACKBONE:
-    FREEZE_AT: 0
-    NAME: "build_resnet_backbone"
-  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  RESNETS:
-    DEPTH: 50
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
-  TRAIN: ("ade20k_full_sem_seg_train",)
-  TEST: ("ade20k_full_sem_seg_val",)
-SOLVER:
-  IMS_PER_BATCH: 16
-  BASE_LR: 0.0001
-  MAX_ITER: 200000
-  WARMUP_FACTOR: 1.0
-  WARMUP_ITERS: 0
-  WEIGHT_DECAY: 0.0001
-  OPTIMIZER: "ADAMW"
-  LR_SCHEDULER_NAME: "WarmupPolyLR"
-  BACKBONE_MULTIPLIER: 0.1
-  CLIP_GRADIENTS:
-    ENABLED: True
-    CLIP_TYPE: "full_model"
-    CLIP_VALUE: 0.01
-    NORM_TYPE: 2.0
-INPUT:
-  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 512) for x in range(5, 21)]"]
-  MIN_SIZE_TRAIN_SAMPLING: "choice"
-  MIN_SIZE_TEST: 512
-  MAX_SIZE_TRAIN: 2048
-  MAX_SIZE_TEST: 2048
-  CROP:
-    ENABLED: True
-    TYPE: "absolute"
-    SIZE: (512, 512)
-    SINGLE_CATEGORY_MAX_AREA: 1.0
-  COLOR_AUG_SSD: True
-  SIZE_DIVISIBILITY: 512 # used in dataset mapper
-  FORMAT: "RGB"
-  DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
-  EVAL_PERIOD: 5000
-DATALOADER:
-  FILTER_EMPTY_ANNOTATIONS: True
-  NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101_bs16_200k.yaml
deleted file mode 100644
index 484c437..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101_bs16_200k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_R50_bs16_200k.yaml
-MODEL:
-  WEIGHTS: "R-101.pkl"
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101c_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101c_bs16_200k.yaml
deleted file mode 100644
index 3a802c5..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101c_bs16_200k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: maskformer_R50_bs16_200k.yaml
-MODEL:
-  BACKBONE:
-    NAME: "build_resnet_deeplab_backbone"
-  WEIGHTS: "detectron2://DeepLab/R-103.pkl"
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "deeplab"
-    STEM_OUT_CHANNELS: 128
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 2, 4]
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml
deleted file mode 100644
index 430adaa..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-_BASE_: Base-ADE20KFull-847.yaml
-MODEL:
-  META_ARCHITECTURE: "MaskFormer"
-  SEM_SEG_HEAD:
-    NAME: "MaskFormerHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 65535
-    NUM_CLASSES: 847
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    DEEP_SUPERVISION: True
-    NO_OBJECT_WEIGHT: 0.1
-    DICE_WEIGHT: 1.0
-    MASK_WEIGHT: 20.0
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 100
-    NHEADS: 8
-    DROPOUT: 0.1
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    DEC_LAYERS: 6
-    PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_R50_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_R50_bs16_200k.yaml
deleted file mode 100644
index 8323067..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_R50_bs16_200k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: Base-ADE20KFull-847.yaml
-MODEL:
-  META_ARCHITECTURE: "SemanticSegmentor"
-  SEM_SEG_HEAD:
-    NAME: "PerPixelBaselineHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 65535
-    NUM_CLASSES: 847
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_plus_R50_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_plus_R50_bs16_200k.yaml
deleted file mode 100644
index b30d455..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_plus_R50_bs16_200k.yaml
+++ /dev/null
@@ -1,24 +0,0 @@
-_BASE_: Base-ADE20KFull-847.yaml
-MODEL:
-  META_ARCHITECTURE: "SemanticSegmentor"
-  SEM_SEG_HEAD:
-    NAME: "PerPixelBaselinePlusHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 65535
-    NUM_CLASSES: 847
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    DEEP_SUPERVISION: True
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 847 # remember to set this to NUM_CLASSES
-    NHEADS: 8
-    DROPOUT: 0.1
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    DEC_LAYERS: 6
-    PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/cityscapes-19/Base-Cityscapes-19.yaml b/SeMask-MaskFormer/configs/cityscapes-19/Base-Cityscapes-19.yaml
deleted file mode 100644
index 6b52542..0000000
--- a/SeMask-MaskFormer/configs/cityscapes-19/Base-Cityscapes-19.yaml
+++ /dev/null
@@ -1,59 +0,0 @@
-MODEL:
-  BACKBONE:
-    FREEZE_AT: 0
-    NAME: "build_resnet_backbone"
-  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  RESNETS:
-    DEPTH: 50
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
-  TRAIN: ("cityscapes_fine_sem_seg_train",)
-  TEST: ("cityscapes_fine_sem_seg_val",)
-SOLVER:
-  IMS_PER_BATCH: 16
-  BASE_LR: 0.0001
-  MAX_ITER: 90000
-  WARMUP_FACTOR: 1.0
-  WARMUP_ITERS: 0
-  WEIGHT_DECAY: 0.0001
-  OPTIMIZER: "ADAMW"
-  LR_SCHEDULER_NAME: "WarmupPolyLR"
-  BACKBONE_MULTIPLIER: 0.1
-  CLIP_GRADIENTS:
-    ENABLED: True
-    CLIP_TYPE: "full_model"
-    CLIP_VALUE: 0.01
-    NORM_TYPE: 2.0
-INPUT:
-  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"]
-  MIN_SIZE_TRAIN_SAMPLING: "choice"
-  MIN_SIZE_TEST: 1024
-  MAX_SIZE_TRAIN: 4096
-  MAX_SIZE_TEST: 2048
-  CROP:
-    ENABLED: True
-    TYPE: "absolute"
-    SIZE: (512, 1024)
-    SINGLE_CATEGORY_MAX_AREA: 1.0
-  COLOR_AUG_SSD: True
-  SIZE_DIVISIBILITY: -1
-  FORMAT: "RGB"
-  DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
-  EVAL_PERIOD: 5000
-  AUG:
-    ENABLED: False
-    MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792]
-    MAX_SIZE: 4096
-    FLIP: True
-DATALOADER:
-  FILTER_EMPTY_ANNOTATIONS: True
-  NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101_bs16_90k.yaml b/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101_bs16_90k.yaml
deleted file mode 100644
index e1017a7..0000000
--- a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101_bs16_90k.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-_BASE_: Base-Cityscapes-19.yaml
-MODEL:
-  WEIGHTS: "R-101.pkl"
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
-  META_ARCHITECTURE: "MaskFormer"
-  SEM_SEG_HEAD:
-    NAME: "MaskFormerHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 255
-    NUM_CLASSES: 19
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    DEEP_SUPERVISION: True
-    NO_OBJECT_WEIGHT: 0.1
-    DICE_WEIGHT: 1.0
-    MASK_WEIGHT: 20.0
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 100
-    NHEADS: 8
-    DROPOUT: 0.1
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    DEC_LAYERS: 6
-    PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml b/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml
deleted file mode 100644
index e07bbee..0000000
--- a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: maskformer_R101_bs16_90k.yaml
-MODEL:
-  BACKBONE:
-    FREEZE_AT: 0
-    NAME: "build_resnet_deeplab_backbone"
-  WEIGHTS: "detectron2://DeepLab/R-103.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "deeplab"
-    STEM_OUT_CHANNELS: 128
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 2, 4]
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/Base-COCO-PanopticSegmentation.yaml b/SeMask-MaskFormer/configs/coco-panoptic/Base-COCO-PanopticSegmentation.yaml
deleted file mode 100644
index 53f3772..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/Base-COCO-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,47 +0,0 @@
-MODEL:
-  BACKBONE:
-    FREEZE_AT: 0
-    NAME: "build_resnet_backbone"
-  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  RESNETS:
-    DEPTH: 50
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
-  TRAIN: ("coco_2017_train_panoptic",)
-  TEST: ("coco_2017_val_panoptic",)
-SOLVER:
-  IMS_PER_BATCH: 64
-  BASE_LR: 0.0001
-  STEPS: (369600,)
-  MAX_ITER: 554400
-  WARMUP_FACTOR: 1.0
-  WARMUP_ITERS: 10
-  WEIGHT_DECAY: 0.0001
-  OPTIMIZER: "ADAMW"
-  BACKBONE_MULTIPLIER: 0.1
-  CLIP_GRADIENTS:
-    ENABLED: True
-    CLIP_TYPE: "full_model"
-    CLIP_VALUE: 0.01
-    NORM_TYPE: 2.0
-INPUT:
-  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
-  CROP:
-    ENABLED: True
-    TYPE: "absolute_range"
-    SIZE: (384, 600)
-  FORMAT: "RGB"
-  DATASET_MAPPER_NAME: "detr_panoptic"
-TEST:
-  EVAL_PERIOD: 0
-DATALOADER:
-  FILTER_EMPTY_ANNOTATIONS: True
-  NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R101_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R101_bs64_554k.yaml
deleted file mode 100644
index b2bf3b6..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R101_bs64_554k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
-  WEIGHTS: "R-101.pkl"
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml
deleted file mode 100644
index 2375c17..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-_BASE_: Base-COCO-PanopticSegmentation.yaml
-MODEL:
-  META_ARCHITECTURE: "MaskFormer"
-  SEM_SEG_HEAD:
-    NAME: "MaskFormerHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 255
-    NUM_CLASSES: 133
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-    # add additional 6 encoder layers
-    PIXEL_DECODER_NAME: "TransformerEncoderPixelDecoder"
-    TRANSFORMER_ENC_LAYERS: 6
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "transformer_encoder"
-    DEEP_SUPERVISION: True
-    NO_OBJECT_WEIGHT: 0.1
-    DICE_WEIGHT: 1.0
-    MASK_WEIGHT: 20.0
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 100
-    NHEADS: 8
-    DROPOUT: 0.1
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    DEC_LAYERS: 6
-    PRE_NORM: False
-    # COCO model should not pad image
-    SIZE_DIVISIBILITY: 0
-    TEST:
-      PANOPTIC_ON: True
-      OVERLAP_THRESHOLD: 0.8
-      OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_base_IN21k_384_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_base_IN21k_384_bs64_554k.yaml
deleted file mode 100644
index 526c74b..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_base_IN21k_384_bs64_554k.yaml
+++ /dev/null
@@ -1,33 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
-  BACKBONE:
-    NAME: "D2SwinTransformer"
-  SWIN:
-    EMBED_DIM: 128
-    DEPTHS: [2, 2, 18, 2]
-    NUM_HEADS: [4, 8, 16, 32]
-    WINDOW_SIZE: 12
-    APE: False
-    DROP_PATH_RATE: 0.3
-    PATCH_NORM: True
-    PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  SEM_SEG_HEAD:
-    PIXEL_DECODER_NAME: "BasePixelDecoder"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    ENFORCE_INPUT_PROJ: True
-    TEST:
-      PANOPTIC_ON: True
-      OVERLAP_THRESHOLD: 0.8
-      OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
-  BASE_LR: 0.00006
-  WARMUP_FACTOR: 1e-6
-  WARMUP_ITERS: 1500
-  WEIGHT_DECAY: 0.01
-  WEIGHT_DECAY_NORM: 0.0
-  WEIGHT_DECAY_EMBED: 0.0
-  BACKBONE_MULTIPLIER: 1.0
\ No newline at end of file
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_large_IN21k_384_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_large_IN21k_384_bs64_554k.yaml
deleted file mode 100644
index a8c8833..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_large_IN21k_384_bs64_554k.yaml
+++ /dev/null
@@ -1,41 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
-  BACKBONE:
-    NAME: "D2SwinTransformer"
-  SWIN:
-    EMBED_DIM: 192
-    DEPTHS: [2, 2, 18, 2]
-    NUM_HEADS: [6, 12, 24, 48]
-    WINDOW_SIZE: 12
-    APE: False
-    DROP_PATH_RATE: 0.3
-    PATCH_NORM: True
-    PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  SEM_SEG_HEAD:
-    PIXEL_DECODER_NAME: "BasePixelDecoder"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    ENFORCE_INPUT_PROJ: True
-    TEST:
-      PANOPTIC_ON: True
-      OVERLAP_THRESHOLD: 0.8
-      OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
-  BASE_LR: 0.00006
-  WARMUP_FACTOR: 1e-6
-  WARMUP_ITERS: 1500
-  WEIGHT_DECAY: 0.01
-  WEIGHT_DECAY_NORM: 0.0
-  WEIGHT_DECAY_EMBED: 0.0
-  BACKBONE_MULTIPLIER: 1.0
-INPUT:
-  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
-  MAX_SIZE_TRAIN: 1000
-  CROP:
-    ENABLED: True
-    TYPE: "absolute_range"
-    SIZE: (384, 600)
-  FORMAT: "RGB"
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_small_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_small_bs64_554k.yaml
deleted file mode 100644
index 3ed3c7d..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_small_bs64_554k.yaml
+++ /dev/null
@@ -1,32 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
-  BACKBONE:
-    NAME: "D2SwinTransformer"
-  SWIN:
-    EMBED_DIM: 96
-    DEPTHS: [2, 2, 18, 2]
-    NUM_HEADS: [3, 6, 12, 24]
-    WINDOW_SIZE: 7
-    APE: False
-    DROP_PATH_RATE: 0.3
-    PATCH_NORM: True
-  WEIGHTS: "swin_small_patch4_window7_224.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  SEM_SEG_HEAD:
-    PIXEL_DECODER_NAME: "BasePixelDecoder"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    ENFORCE_INPUT_PROJ: True
-    TEST:
-      PANOPTIC_ON: True
-      OVERLAP_THRESHOLD: 0.8
-      OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
-  BASE_LR: 0.00006
-  WARMUP_FACTOR: 1e-6
-  WARMUP_ITERS: 1500
-  WEIGHT_DECAY: 0.01
-  WEIGHT_DECAY_NORM: 0.0
-  WEIGHT_DECAY_EMBED: 0.0
-  BACKBONE_MULTIPLIER: 1.0
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_tiny_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_tiny_bs64_554k.yaml
deleted file mode 100644
index 4572f15..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_tiny_bs64_554k.yaml
+++ /dev/null
@@ -1,32 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
-  BACKBONE:
-    NAME: "D2SwinTransformer"
-  SWIN:
-    EMBED_DIM: 96
-    DEPTHS: [2, 2, 6, 2]
-    NUM_HEADS: [3, 6, 12, 24]
-    WINDOW_SIZE: 7
-    APE: False
-    DROP_PATH_RATE: 0.3
-    PATCH_NORM: True
-  WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  SEM_SEG_HEAD:
-    PIXEL_DECODER_NAME: "BasePixelDecoder"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    ENFORCE_INPUT_PROJ: True
-    TEST:
-      PANOPTIC_ON: True
-      OVERLAP_THRESHOLD: 0.8
-      OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
-  BASE_LR: 0.00006
-  WARMUP_FACTOR: 1e-6
-  WARMUP_ITERS: 1500
-  WEIGHT_DECAY: 0.01
-  WEIGHT_DECAY_NORM: 0.0
-  WEIGHT_DECAY_EMBED: 0.0
-  BACKBONE_MULTIPLIER: 1.0
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/Base-COCOStuff10K-171.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/Base-COCOStuff10K-171.yaml
deleted file mode 100644
index 3d5a2cb..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/Base-COCOStuff10K-171.yaml
+++ /dev/null
@@ -1,59 +0,0 @@
-MODEL:
-  BACKBONE:
-    FREEZE_AT: 0
-    NAME: "build_resnet_backbone"
-  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  RESNETS:
-    DEPTH: 50
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
-  TRAIN: ("coco_2017_train_stuff_10k_sem_seg",)
-  TEST: ("coco_2017_test_stuff_10k_sem_seg",)
-SOLVER:
-  IMS_PER_BATCH: 32
-  BASE_LR: 0.0001
-  MAX_ITER: 60000
-  WARMUP_FACTOR: 1.0
-  WARMUP_ITERS: 0
-  WEIGHT_DECAY: 0.0001
-  OPTIMIZER: "ADAMW"
-  LR_SCHEDULER_NAME: "WarmupPolyLR"
-  BACKBONE_MULTIPLIER: 0.1
-  CLIP_GRADIENTS:
-    ENABLED: True
-    CLIP_TYPE: "full_model"
-    CLIP_VALUE: 0.01
-    NORM_TYPE: 2.0
-INPUT:
-  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 16)]"]
-  MIN_SIZE_TRAIN_SAMPLING: "choice"
-  MIN_SIZE_TEST: 640
-  MAX_SIZE_TRAIN: 2560
-  MAX_SIZE_TEST: 2560
-  CROP:
-    ENABLED: True
-    TYPE: "absolute"
-    SIZE: (640, 640)
-    SINGLE_CATEGORY_MAX_AREA: 1.0
-  COLOR_AUG_SSD: True
-  SIZE_DIVISIBILITY: 640 # used in dataset mapper
-  FORMAT: "RGB"
-  DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
-  EVAL_PERIOD: 5000
-  AUG:
-    ENABLED: False
-    MIN_SIZES: [320, 480, 640, 800, 960, 1120]
-    MAX_SIZE: 4480
-    FLIP: True
-DATALOADER:
-  FILTER_EMPTY_ANNOTATIONS: True
-  NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101_bs32_60k.yaml
deleted file mode 100644
index 7864b6a..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101_bs32_60k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_R50_bs32_60k.yaml
-MODEL:
-  WEIGHTS: "R-101.pkl"
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101c_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101c_bs32_60k.yaml
deleted file mode 100644
index 4df030f..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101c_bs32_60k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: maskformer_R50_bs32_60k.yaml
-MODEL:
-  BACKBONE:
-    NAME: "build_resnet_deeplab_backbone"
-  WEIGHTS: "detectron2://DeepLab/R-103.pkl"
-  RESNETS:
-    DEPTH: 101
-    STEM_TYPE: "deeplab"
-    STEM_OUT_CHANNELS: 128
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 2, 4]
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R50_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R50_bs32_60k.yaml
deleted file mode 100644
index 5b737d4..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R50_bs32_60k.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-_BASE_: Base-COCOStuff10K-171.yaml
-MODEL:
-  META_ARCHITECTURE: "MaskFormer"
-  SEM_SEG_HEAD:
-    NAME: "MaskFormerHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 255
-    NUM_CLASSES: 171
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    DEEP_SUPERVISION: True
-    NO_OBJECT_WEIGHT: 0.1
-    DICE_WEIGHT: 1.0
-    MASK_WEIGHT: 20.0
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 100
-    NHEADS: 8
-    DROPOUT: 0.1
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    DEC_LAYERS: 6
-    PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_R50_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_R50_bs32_60k.yaml
deleted file mode 100644
index 4442c16..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_R50_bs32_60k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: Base-COCOStuff10K-171.yaml
-MODEL:
-  META_ARCHITECTURE: "SemanticSegmentor"
-  SEM_SEG_HEAD:
-    NAME: "PerPixelBaselineHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 255
-    NUM_CLASSES: 171
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_plus_R50_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_plus_R50_bs32_60k.yaml
deleted file mode 100644
index 72f2021..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_plus_R50_bs32_60k.yaml
+++ /dev/null
@@ -1,24 +0,0 @@
-_BASE_: Base-COCOStuff10K-171.yaml
-MODEL:
-  META_ARCHITECTURE: "SemanticSegmentor"
-  SEM_SEG_HEAD:
-    NAME: "PerPixelBaselinePlusHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 255
-    NUM_CLASSES: 171
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    DEEP_SUPERVISION: True
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 171 # remember to set this to NUM_CLASSES
-    NHEADS: 8
-    DROPOUT: 0.1
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    DEC_LAYERS: 6
-    PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/mapillary-vistas-65/Base-MapillaryVistas-65.yaml b/SeMask-MaskFormer/configs/mapillary-vistas-65/Base-MapillaryVistas-65.yaml
deleted file mode 100644
index d3dacc3..0000000
--- a/SeMask-MaskFormer/configs/mapillary-vistas-65/Base-MapillaryVistas-65.yaml
+++ /dev/null
@@ -1,54 +0,0 @@
-MODEL:
-  BACKBONE:
-    FREEZE_AT: 0
-    NAME: "build_resnet_backbone"
-  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
-  PIXEL_MEAN: [123.675, 116.280, 103.530]
-  PIXEL_STD: [58.395, 57.120, 57.375]
-  RESNETS:
-    DEPTH: 50
-    STEM_TYPE: "basic" # not used
-    STEM_OUT_CHANNELS: 64
-    STRIDE_IN_1X1: False
-    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
-    # NORM: "SyncBN"
-    RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
-  TRAIN: ("mapillary_vistas_sem_seg_train",)
-  TEST: ("mapillary_vistas_sem_seg_val",)
-SOLVER:
-  IMS_PER_BATCH: 16
-  BASE_LR: 0.0001
-  MAX_ITER: 300000
-  WARMUP_FACTOR: 1.0
-  WARMUP_ITERS: 0
-  WEIGHT_DECAY: 0.0001
-  OPTIMIZER: "ADAMW"
-  LR_SCHEDULER_NAME: "WarmupPolyLR"
-  BACKBONE_MULTIPLIER: 0.1
-  CLIP_GRADIENTS:
-    ENABLED: True
-    CLIP_TYPE: "full_model"
-    CLIP_VALUE: 0.01
-    NORM_TYPE: 2.0
-INPUT:
-  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"]
-  MIN_SIZE_TRAIN_SAMPLING: "choice"
-  MIN_SIZE_TEST: 2048
-  MAX_SIZE_TRAIN: 8192
-  MAX_SIZE_TEST: 2048
-  CROP:
-    ENABLED: True
-    TYPE: "absolute"
-    SIZE: (1280, 1280)
-    SINGLE_CATEGORY_MAX_AREA: 1.0
-  COLOR_AUG_SSD: True
-  SIZE_DIVISIBILITY: 1280 # used in dataset mapper
-  FORMAT: "RGB"
-  DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
-  EVAL_PERIOD: 5000
-DATALOADER:
-  FILTER_EMPTY_ANNOTATIONS: True
-  NUM_WORKERS: 10
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/mapillary-vistas-65/maskformer_R50_bs16_300k.yaml b/SeMask-MaskFormer/configs/mapillary-vistas-65/maskformer_R50_bs16_300k.yaml
deleted file mode 100644
index 1935082..0000000
--- a/SeMask-MaskFormer/configs/mapillary-vistas-65/maskformer_R50_bs16_300k.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-_BASE_: Base-MapillaryVistas-65.yaml
-MODEL:
-  META_ARCHITECTURE: "MaskFormer"
-  SEM_SEG_HEAD:
-    NAME: "MaskFormerHead"
-    IN_FEATURES: ["res2", "res3", "res4", "res5"]
-    IGNORE_VALUE: 65
-    NUM_CLASSES: 65
-    COMMON_STRIDE: 4 # not used, hard-coded
-    LOSS_WEIGHT: 1.0
-    CONVS_DIM: 256
-    MASK_DIM: 256
-    NORM: "GN"
-  MASK_FORMER:
-    TRANSFORMER_IN_FEATURE: "res5"
-    DEEP_SUPERVISION: True
-    NO_OBJECT_WEIGHT: 0.1
-    DICE_WEIGHT: 1.0
-    MASK_WEIGHT: 20.0
-    HIDDEN_DIM: 256
-    NUM_OBJECT_QUERIES: 100
-    NHEADS: 8
-    DROPOUT: 0.1
-    DIM_FEEDFORWARD: 2048
-    ENC_LAYERS: 0
-    DEC_LAYERS: 6
-    PRE_NORM: False