diff --git a/README.md b/README.md
index 852b6d7..b8d2543 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor
## Contents
+
1. [Results](#1-results)
2. [Setup Instructions](#2-setup-instructions)
3. [Citing SeMask](#3-citing-semask)
@@ -27,18 +28,16 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor
### ADE20K
-
-
| Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
| :---:| :---: | :---: | :---:| :---: | :---: | :---: | :---: |
-| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.11 | 43.16 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | TBD |
+| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.06 | 43.36 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1L0daUHWQGNGCXHF-cKWEauPSyBV0GLOR/view?usp=sharing) |
| SeMask-S FPN | SeMask Swin-S | 512x512 | 45.92 | 47.63 | 56M | [config](SeMask-FPN/configs/semask_swin/ade20k/semfpn_semask_swin_small_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1QhDG4SyGFtWL5kP9BbBoyPqTuFu7fH_y/view?usp=sharing) |
| SeMask-B FPN | SeMask Swin-B† | 512x512 | 49.35 | 50.98 | 96M | [config](SeMask-FPN/configs/semask_swin/ade20k/semfpn_semask_swin_base_patch4_window12_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1PXCEhrrUy5TJC4dUp7YDQvaapnMzGT6C/view?usp=sharing) |
| SeMask-L FPN | SeMask Swin-L† | 640x640 | 51.89 | 53.52 | 211M| [config](SeMask-FPN/configs/semask_swin/ade20k/semfpn_semask_swin_large_patch4_window12_640x640_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1u5flfAQCiQJbMZbZPIlGUGTYBz9Ca7rE/view?usp=sharing) |
| SeMask-L MaskFormer | SeMask Swin-L† | 640x640 | 54.75 | 56.15 | 219M | [config](SeMask-MaskFormer/configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1KgKQLGv9CcBqeEvOEDdxQ-O6lpMfHBLw/view?usp=sharing) |
| SeMask-L Mask2Former | SeMask Swin-L† | 640x640 | 56.41 | 57.52 | 222M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1hN1I4Wv7_1FCPOsfA-5PELn6Xn3b_R8a/view?usp=sharing) |
-| SeMask-L Mask2Former FAPN | SeMask Swin-L† | 640x640 | **56.68** | 58.00 | 227M | [config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | TBD |
-| SeMask-L Mask2Former MSFAPN | SeMask Swin-L† | 640x640 | 56.54 | **58.22** | 224M | [config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) |
+| SeMask-L Mask2Former MSFaPN | SeMask Swin-L† | 640x640 | 56.54 | 58.22 | 224M | [config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) |
+| SeMask-L Mask2Former FaPN | SeMask Swin-L† | 640x640 | **56.97** | **58.22** | 227M | [config](SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1DQ9KltSLDj47H2jYnCtVwyBf7KPR9SM_/view?usp=sharing) |
### Cityscapes
@@ -68,7 +67,7 @@ We provide the codebase with SeMask incorporated into various models. Please che
- SeMask-FPN: [Setup Instructions](SeMask-FPN/README.md#2-setup-instructions)
- SeMask-MaskFormer: [Setup Instructions](SeMask-MaskFormer/README.md#2-setup-instructions)
- SeMask-Mask2Former: [Setup Instructions](SeMask-Mask2Former/README.md#2-setup-instructions)
-- SeMask-FAPN: [Setup Instructions](SeMask-FAPN/README.md#2-setup-instructions)
+- SeMask-FaPN: [Setup Instructions](SeMask-FAPN/README.md#2-setup-instructions)
## 3. Citing SeMask
diff --git a/SeMask-FAPN/README.md b/SeMask-FAPN/README.md
index 587344e..2918d91 100644
--- a/SeMask-FAPN/README.md
+++ b/SeMask-FAPN/README.md
@@ -16,15 +16,15 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor
| Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
| :---:| :---: | :---: | :---:| :---: | :---: | :---: | :---: |
-| SeMask-L Mask2Former FAPN | SeMask Swin-L† | 640x640 | **56.68** | 58.00 | 227M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | TBD |
-| SeMask-L Mask2Former MSFAPN | SeMask Swin-L† | 640x640 | 56.54 | **58.22** | 224M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) |
+| SeMask-L Mask2Former MSFaPN | SeMask Swin-L† | 640x640 | 56.54 | 58.22 | 224M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/msfapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1w-DRGufIv3zpDO7rJFv2z5WeLx0pDTJe/view?usp=sharing) |
+| SeMask-L Mask2Former FaPN | SeMask Swin-L† | 640x640 | **56.97** | **58.22** | 227M | [config](SeMask-Mask2Former/configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml) | [checkpoint](https://drive.google.com/file/d/1DQ9KltSLDj47H2jYnCtVwyBf7KPR9SM_/view?usp=sharing) |
## 2. Setup Instructions
### Installation
-- [DCNv2](DCNv2) code is compatible with [Pytorch v1.7.1](https://pytorch.org/get-started/locally/).
+- Build the [DCNv2](DCNv2) module, which is compatible with [PyTorch v1.7.1](https://pytorch.org/get-started/locally/).
- Follow the installation instructions for [Mask2Former](SeMask-Mask2Former/INSTALL.md).
diff --git a/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md b/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md
index 6bb096c..9c25405 100644
--- a/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md
+++ b/SeMask-FAPN/SeMask-Mask2Former/GETTING_STARTED.md
@@ -9,11 +9,11 @@ Please see [Getting Started with Detectron2](https://github.com/facebookresearch
1. Pick a model and its config file from
[model zoo](MODEL_ZOO.md),
- for example, `configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml`.
+ for example, `configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml`.
2. We provide `demo.py`, which can run inference with the built-in configs. Run it with:
```
cd demo/
-python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
+python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--input input1.jpg input2.jpg \
[--other-options]
--opts MODEL.WEIGHTS /path/to/checkpoint_file
@@ -39,7 +39,7 @@ setup the corresponding datasets following
then run:
```
python train_net.py --num-gpus 8 \
- --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
+ --config-file configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml
```
The configs are made for 8-GPU training.
@@ -47,14 +47,14 @@ Since we use ADAMW optimizer, it is not clear how to scale learning rate with ba
To train on 1 GPU, you need to choose a suitable learning rate and batch size yourself:
```
python train_net.py \
- --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
+ --config-file configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE
```
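Since the provided configs assume 8 GPUs with a total batch size of 16, one common heuristic (an assumption on our part — the repo does not prescribe a scaling rule, especially for AdamW) is to scale the base learning rate linearly with the batch size. A minimal sketch:

```shell
# Linear LR scaling sketch (heuristic assumption, not prescribed by this repo):
#   scaled_lr = base_lr * (your_batch / reference_batch)
BASE_LR=0.0001   # SOLVER.BASE_LR in the provided configs
BASE_BS=16       # SOLVER.IMS_PER_BATCH for the 8-GPU setup
MY_BS=2          # e.g. whatever fits on a single GPU
SCALED_LR=$(awk -v lr="$BASE_LR" -v b="$BASE_BS" -v m="$MY_BS" \
  'BEGIN { printf "%.8f", lr * m / b }')
echo "$SCALED_LR"
# Then pass both values as command-line overrides, e.g.:
#   python train_net.py --config-file <config> --num-gpus 1 \
#     SOLVER.IMS_PER_BATCH "$MY_BS" SOLVER.BASE_LR "$SCALED_LR"
```

Treat the result as a starting point only; validate on a held-out set before committing to a long run.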
To evaluate a model's performance, use
```
python train_net.py \
- --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
+ --config-file configs/ade20k/semantic-segmentation/semask_swin/fapn_maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
For more options, see `python train_net.py -h`.
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml
deleted file mode 100644
index 50a1c13..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("ade20k_instance_train",)
- TEST: ("ade20k_instance_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 160000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 640
- MAX_SIZE_TRAIN: 2560
- MAX_SIZE_TEST: 2560
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (640, 640)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 640 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_instance"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [320, 480, 640, 800, 960, 1120]
- MAX_SIZE: 4480
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml
deleted file mode 100644
index e37bcfb..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-ADE20K-InstanceSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 100
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
deleted file mode 100644
index af03d4d..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_160k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml
deleted file mode 100644
index 559be07..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("ade20k_panoptic_train",)
- TEST: ("ade20k_panoptic_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 160000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 640
- MAX_SIZE_TRAIN: 2560
- MAX_SIZE_TEST: 2560
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (640, 640)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 640 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [320, 480, 640, 800, 960, 1120]
- MAX_SIZE: 4480
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml
deleted file mode 100644
index 82c0828..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-ADE20K-PanopticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 150
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
deleted file mode 100644
index af03d4d..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_160k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml
deleted file mode 100644
index 28833e7..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN" # use syncbn for cityscapes dataset
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("cityscapes_fine_instance_seg_train",)
- TEST: ("cityscapes_fine_instance_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 90000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 1024
- MAX_SIZE_TRAIN: 4096
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (512, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: -1
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_instance"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792]
- MAX_SIZE: 4096
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml
deleted file mode 100644
index 1eb38da..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_90k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml
deleted file mode 100644
index 16b215b..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-Cityscapes-InstanceSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 8
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: False
- INSTANCE_ON: True
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 2956571..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 72860d9..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
deleted file mode 100644
index 156ef9e..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
deleted file mode 100644
index 0c56e2c..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml
deleted file mode 100644
index 022567c..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN" # use syncbn for cityscapes dataset
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("cityscapes_fine_panoptic_train",)
- TEST: ("cityscapes_fine_panoptic_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 90000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 1024
- MAX_SIZE_TRAIN: 4096
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (512, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: -1
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792]
- MAX_SIZE: 4096
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml
deleted file mode 100644
index 1eb38da..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_90k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml
deleted file mode 100644
index 3c2d679..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-Cityscapes-PanopticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 19
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 2956571..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 72860d9..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
deleted file mode 100644
index 156ef9e..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
deleted file mode 100644
index 0c56e2c..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
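The deleted Swin configs above differ only in a handful of `SWIN` fields. A small sketch collecting those values (copied from the YAML hunks; the helper names are my own) makes the tiny/small/base/large scaling pattern explicit — width grows with `EMBED_DIM` and doubles at each of Swin's four stages:

```python
# Summary of the Swin backbone variants used across the deleted Mask2Former
# configs (values copied from the YAML diffs above; helper names are ad hoc).
SWIN_VARIANTS = {
    "tiny":  dict(embed_dim=96,  depths=(2, 2, 6, 2),  num_heads=(3, 6, 12, 24),  window_size=7),
    "small": dict(embed_dim=96,  depths=(2, 2, 18, 2), num_heads=(3, 6, 12, 24),  window_size=7),
    "base":  dict(embed_dim=128, depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32),  window_size=12),
    "large": dict(embed_dim=192, depths=(2, 2, 18, 2), num_heads=(6, 12, 24, 48), window_size=12),
}

def channels_per_stage(variant):
    """Swin doubles the channel width at each of its four stages."""
    c = SWIN_VARIANTS[variant]["embed_dim"]
    return [c * 2 ** i for i in range(4)]

print(channels_per_stage("large"))  # [192, 384, 768, 1536]
```

These per-stage widths are what the pixel decoder consumes as `res2`–`res5` features.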
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/Base-Cityscapes-SemanticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/Base-Cityscapes-SemanticSegmentation.yaml
deleted file mode 100644
index ca42fab..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/Base-Cityscapes-SemanticSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN" # use syncbn for cityscapes dataset
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("cityscapes_fine_sem_seg_train",)
- TEST: ("cityscapes_fine_sem_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 90000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 1024
- MAX_SIZE_TRAIN: 4096
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (512, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: -1
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792]
- MAX_SIZE: 4096
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R101_bs16_90k.yaml
deleted file mode 100644
index 1eb38da..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R101_bs16_90k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_90k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml
deleted file mode 100644
index d872fcd..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-Cityscapes-SemanticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 19
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: False
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 9dca3c8..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: semask_maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SeMaskSwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22kto1k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 100
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/semask_maskformer2_R50_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/semask_maskformer2_R50_bs16_90k.yaml
deleted file mode 100644
index 88a53c6..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/semask_swin/semask_maskformer2_R50_bs16_90k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: ../Base-Cityscapes-SemanticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "SeMaskMaskFormer"
- SEM_SEG_HEAD:
- NAME: "BranchMaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 19
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: False
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 2956571..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 2509717..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 100
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
deleted file mode 100644
index 156ef9e..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
deleted file mode 100644
index 0c56e2c..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/cityscapes/semantic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml
deleted file mode 100644
index 98943d9..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml
+++ /dev/null
@@ -1,47 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("coco_2017_train",)
- TEST: ("coco_2017_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- STEPS: (327778, 355092)
- MAX_ITER: 368750
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 10
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- IMAGE_SIZE: 1024
- MIN_SCALE: 0.1
- MAX_SCALE: 2.0
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "coco_instance_lsj"
-TEST:
- EVAL_PERIOD: 5000
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml
deleted file mode 100644
index 77defd0..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_50ep.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml
deleted file mode 100644
index 4b9e76e..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-COCO-InstanceSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 80
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: False
- INSTANCE_ON: True
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
deleted file mode 100644
index 4732999..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
deleted file mode 100644
index 5dde960..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
deleted file mode 100644
index b685cdb..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
-SOLVER:
- STEPS: (655556, 710184)
- MAX_ITER: 737500
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
deleted file mode 100644
index f9b1c56..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
deleted file mode 100644
index 7f27bc5..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml
deleted file mode 100644
index 7560a73..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,47 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("coco_2017_train_panoptic",)
- TEST: ("coco_2017_val_panoptic_with_sem_seg",) # to evaluate instance and semantic performance as well
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- STEPS: (327778, 355092)
- MAX_ITER: 368750
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 10
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- IMAGE_SIZE: 1024
- MIN_SCALE: 0.1
- MAX_SCALE: 2.0
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "coco_panoptic_lsj"
-TEST:
- EVAL_PERIOD: 5000
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml
deleted file mode 100644
index 77defd0..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_50ep.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
deleted file mode 100644
index 9ebf4f1..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
+++ /dev/null
@@ -1,45 +0,0 @@
-_BASE_: Base-COCO-PanopticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 255
- NUM_CLASSES: 133
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
deleted file mode 100644
index 4732999..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
deleted file mode 100644
index 5dde960..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
deleted file mode 100644
index b685cdb..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
-SOLVER:
- STEPS: (655556, 710184)
- MAX_ITER: 737500
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
deleted file mode 100644
index f9b1c56..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
deleted file mode 100644
index 7f27bc5..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml
deleted file mode 100644
index 86629a3..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,56 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("mapillary_vistas_panoptic_train",)
- TEST: ("mapillary_vistas_panoptic_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 300000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 2048
- MAX_SIZE_TRAIN: 8192
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (1024, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 1024 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
- EVAL_PERIOD: 0
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 10
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml
deleted file mode 100644
index d6a0eaa..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 65
- NUM_CLASSES: 65
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: False
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.0
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
deleted file mode 100644
index e7a8c4c..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer_R50_bs16_300k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml
deleted file mode 100644
index f05fb28..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml
+++ /dev/null
@@ -1,56 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("mapillary_vistas_sem_seg_train",)
- TEST: ("mapillary_vistas_sem_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 300000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 2048
- MAX_SIZE_TRAIN: 8192
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (1024, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 1024 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
- EVAL_PERIOD: 0
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 10
-VERSION: 2
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml
deleted file mode 100644
index e9977a1..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 65
- NUM_CLASSES: 65
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: False
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.0
diff --git a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
deleted file mode 100644
index e336a1b..0000000
--- a/SeMask-FAPN/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_300k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 100
diff --git a/SeMask-FPN/README.md b/SeMask-FPN/README.md
index ef590e4..a7ee060 100644
--- a/SeMask-FPN/README.md
+++ b/SeMask-FPN/README.md
@@ -19,11 +19,9 @@ This repo contains the code for our paper **SeMask: Semantically Masked Transfor
### ADE20K
-
-
| Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
| :---:| :---: | :---: | :---:| :---: | :---: | :---: | :---: |
-| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.11 | 43.16 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | TBD |
+| SeMask-T FPN | SeMask Swin-T | 512x512 | 42.06 | 43.36 | 35M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_tiny_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1L0daUHWQGNGCXHF-cKWEauPSyBV0GLOR/view?usp=sharing) |
| SeMask-S FPN | SeMask Swin-S | 512x512 | 45.92 | 47.63 | 56M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_small_patch4_window7_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1QhDG4SyGFtWL5kP9BbBoyPqTuFu7fH_y/view?usp=sharing) |
| SeMask-B FPN | SeMask Swin-B† | 512x512 | 49.35 | 50.98 | 96M | [config](configs/semask_swin/ade20k/semfpn_semask_swin_base_patch4_window12_512x512_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1PXCEhrrUy5TJC4dUp7YDQvaapnMzGT6C/view?usp=sharing) |
| SeMask-L FPN | SeMask Swin-L† | 640x640 | 51.89 | 53.52 | 211M| [config](configs/semask_swin/ade20k/semfpn_semask_swin_large_patch4_window12_640x640_80k_ade20k.py) | [checkpoint](https://drive.google.com/file/d/1u5flfAQCiQJbMZbZPIlGUGTYBz9Ca7rE/view?usp=sharing) |
diff --git a/SeMask-FPN/demo/demo.py b/SeMask-FPN/demo/demo.py
index 8414302..e4fb23b 100644
--- a/SeMask-FPN/demo/demo.py
+++ b/SeMask-FPN/demo/demo.py
@@ -7,9 +7,9 @@
def main():
parser = ArgumentParser()
- parser.add_argument('img', help='Image file')
- parser.add_argument('config', help='Config file')
- parser.add_argument('checkpoint', help='Checkpoint file')
+ parser.add_argument('--img', help='Image file')
+ parser.add_argument('--config', help='Config file')
+ parser.add_argument('--checkpoint', help='Checkpoint file')
parser.add_argument(
'--device', default='cuda:0', help='Device used for inference')
parser.add_argument(
diff --git a/SeMask-FPN/mmseg/apis/inference.py b/SeMask-FPN/mmseg/apis/inference.py
index 20c20dc..075a919 100644
--- a/SeMask-FPN/mmseg/apis/inference.py
+++ b/SeMask-FPN/mmseg/apis/inference.py
@@ -10,7 +10,6 @@
def init_segmentor(config, checkpoint=None, device='cuda:0'):
"""Initialize a segmentor from config file.
-
Args:
config (str or :obj:`mmcv.Config`): Config file path or the config
object.
@@ -44,11 +43,9 @@ class LoadImage:
def __call__(self, results):
"""Call function to load images into results.
-
Args:
results (dict): A result dict contains the file name
of the image to be read.
-
Returns:
dict: ``results`` will be returned containing loaded image.
"""
@@ -68,12 +65,10 @@ def __call__(self, results):
def inference_segmentor(model, img):
"""Inference image(s) with the segmentor.
-
Args:
model (nn.Module): The loaded segmentor.
imgs (str/ndarray or list[str/ndarray]): Either image files or loaded
images.
-
Returns:
(list[Tensor]): The segmentation result.
"""
@@ -98,9 +93,15 @@ def inference_segmentor(model, img):
return result
-def show_result_pyplot(model, img, result, palette=None, fig_size=(15, 10)):
+def show_result_pyplot(model,
+ img,
+ result,
+ palette=None,
+ fig_size=(15, 10),
+ opacity=0.5,
+ title='',
+ block=True):
"""Visualize the segmentation results on the image.
-
Args:
model (nn.Module): The loaded segmentor.
img (str or np.ndarray): Image filename or loaded image.
@@ -109,10 +110,20 @@ def show_result_pyplot(model, img, result, palette=None, fig_size=(15, 10)):
map. If None is given, random palette will be generated.
Default: None
fig_size (tuple): Figure size of the pyplot figure.
+ opacity(float): Opacity of painted segmentation map.
+ Default 0.5.
+ Must be in (0, 1] range.
+ title (str): The title of pyplot figure.
+ Default is ''.
+ block (bool): Whether to block the pyplot figure.
+ Default is True.
"""
if hasattr(model, 'module'):
model = model.module
- img = model.show_result(img, result, palette=palette, show=False)
+ img = model.show_result(
+ img, result, palette=palette, show=False, opacity=opacity)
plt.figure(figsize=fig_size)
plt.imshow(mmcv.bgr2rgb(img))
- plt.show()
+ plt.title(title)
+ plt.tight_layout()
+ plt.show(block=block)
diff --git a/SeMask-FPN/mmseg/models/segmentors/base.py b/SeMask-FPN/mmseg/models/segmentors/base.py
index 1c69406..b80d956 100644
--- a/SeMask-FPN/mmseg/models/segmentors/base.py
+++ b/SeMask-FPN/mmseg/models/segmentors/base.py
@@ -226,6 +226,81 @@ def _parse_losses(losses):
return loss, log_vars
+ def show_inference_result(self,
+ img,
+ result,
+ palette=None,
+ win_name='',
+ show=False,
+ wait_time=0,
+ out_file=None,
+ opacity=0.5):
+ """Draw `result` over `img`.
+ Args:
+ img (str or Tensor): The image to be displayed.
+ result (Tensor): The semantic segmentation results to draw over
+ `img`.
+ palette (list[list[int]]] | np.ndarray | None): The palette of
+ segmentation map. If None is given, random palette will be
+ generated. Default: None
+ win_name (str): The window name.
+ wait_time (int): Value of waitKey param.
+ Default: 0.
+ show (bool): Whether to show the image.
+ Default: False.
+ out_file (str or None): The filename to write the image.
+ Default: None.
+ opacity(float): Opacity of painted segmentation map.
+ Default 0.5.
+ Must be in (0, 1] range.
+ Returns:
+ img (Tensor): Only if not `show` or `out_file`
+ """
+ img = mmcv.imread(img)
+ img = img.copy()
+ seg = result[0]
+ if palette is None:
+ if self.PALETTE is None:
+ # Get random state before set seed,
+ # and restore random state later.
+ # It will prevent loss of randomness, as the palette
+ # may be different in each iteration if not specified.
+ # See: https://github.com/open-mmlab/mmdetection/issues/5844
+ state = np.random.get_state()
+ np.random.seed(42)
+ # random palette
+ palette = np.random.randint(
+ 0, 255, size=(len(self.CLASSES), 3))
+ np.random.set_state(state)
+ else:
+ palette = self.PALETTE
+ palette = np.array(palette)
+ assert palette.shape[0] == len(self.CLASSES)
+ assert palette.shape[1] == 3
+ assert len(palette.shape) == 2
+ assert 0 < opacity <= 1.0
+ color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8)
+ for label, color in enumerate(palette):
+ color_seg[seg == label, :] = color
+ # convert to BGR
+ color_seg = color_seg[..., ::-1]
+
+ img = img * (1 - opacity) + color_seg * opacity
+ img = img.astype(np.uint8)
+ # if out_file specified, do not show image in window
+ if out_file is not None:
+ show = False
+
+ if show:
+ mmcv.imshow(img, win_name, wait_time)
+ if out_file is not None:
+ mmcv.imwrite(img, out_file)
+
+ if not (show or out_file):
+ warnings.warn('show==False and out_file is not specified, only '
+ 'result image will be returned')
+ return img
+
def show_result(self,
i,
img,
diff --git a/SeMask-Mask2Former/GETTING_STARTED.md b/SeMask-Mask2Former/GETTING_STARTED.md
index 6bb096c..c2eb6f7 100644
--- a/SeMask-Mask2Former/GETTING_STARTED.md
+++ b/SeMask-Mask2Former/GETTING_STARTED.md
@@ -9,11 +9,11 @@ Please see [Getting Started with Detectron2](https://github.com/facebookresearch
1. Pick a model and its config file from
[model zoo](MODEL_ZOO.md),
- for example, `configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml`.
+ for example, `configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml`.
2. We provide `demo.py` that is able to demo builtin configs. Run it with:
```
cd demo/
-python demo.py --config-file ../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
+python demo.py --config-file ../configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--input input1.jpg input2.jpg \
[--other-options]
--opts MODEL.WEIGHTS /path/to/checkpoint_file
@@ -39,7 +39,7 @@ setup the corresponding datasets following
then run:
```
python train_net.py --num-gpus 8 \
- --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
+ --config-file configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml
```
The configs are made for 8-GPU training.
@@ -47,14 +47,14 @@ Since we use ADAMW optimizer, it is not clear how to scale learning rate with ba
To train on 1 GPU, you need to figure out learning rate and batch size by yourself:
```
python train_net.py \
- --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
+ --config-file configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE
```
To evaluate a model's performance, use
```
python train_net.py \
- --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
+ --config-file configs/ade20k/semantic-segmentation/semask_swin/maskformer2_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
For more options, see `python train_net.py -h`.
diff --git a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml b/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml
deleted file mode 100644
index 50a1c13..0000000
--- a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/Base-ADE20K-InstanceSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("ade20k_instance_train",)
- TEST: ("ade20k_instance_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 160000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 640
- MAX_SIZE_TRAIN: 2560
- MAX_SIZE_TEST: 2560
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (640, 640)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 640 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_instance"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [320, 480, 640, 800, 960, 1120]
- MAX_SIZE: 4480
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml
deleted file mode 100644
index e37bcfb..0000000
--- a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/maskformer2_R50_bs16_160k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-ADE20K-InstanceSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 100
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml b/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
deleted file mode 100644
index af03d4d..0000000
--- a/SeMask-Mask2Former/configs/ade20k/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_160k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml
deleted file mode 100644
index 559be07..0000000
--- a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/Base-ADE20K-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("ade20k_panoptic_train",)
- TEST: ("ade20k_panoptic_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 160000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 640
- MAX_SIZE_TRAIN: 2560
- MAX_SIZE_TEST: 2560
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (640, 640)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 640 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [320, 480, 640, 800, 960, 1120]
- MAX_SIZE: 4480
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml b/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml
deleted file mode 100644
index 82c0828..0000000
--- a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-ADE20K-PanopticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 150
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml b/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
deleted file mode 100644
index af03d4d..0000000
--- a/SeMask-Mask2Former/configs/ade20k/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_160k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_160k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml
deleted file mode 100644
index 28833e7..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/Base-Cityscapes-InstanceSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN" # use syncbn for cityscapes dataset
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("cityscapes_fine_instance_seg_train",)
- TEST: ("cityscapes_fine_instance_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 90000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 1024
- MAX_SIZE_TRAIN: 4096
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (512, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: -1
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_instance"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792]
- MAX_SIZE: 4096
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml
deleted file mode 100644
index 1eb38da..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R101_bs16_90k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_90k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml
deleted file mode 100644
index 16b215b..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/maskformer2_R50_bs16_90k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-Cityscapes-InstanceSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 8
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: False
- INSTANCE_ON: True
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 2956571..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 72860d9..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
deleted file mode 100644
index 156ef9e..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
deleted file mode 100644
index 0c56e2c..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/instance-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml
deleted file mode 100644
index 022567c..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/Base-Cityscapes-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,61 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN" # use syncbn for cityscapes dataset
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("cityscapes_fine_panoptic_train",)
- TEST: ("cityscapes_fine_panoptic_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 90000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 1024
- MAX_SIZE_TRAIN: 4096
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (512, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: -1
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792]
- MAX_SIZE: 4096
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml
deleted file mode 100644
index 1eb38da..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R101_bs16_90k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_90k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml
deleted file mode 100644
index 3c2d679..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/maskformer2_R50_bs16_90k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-Cityscapes-PanopticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 19
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 2956571..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
deleted file mode 100644
index 72860d9..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_90k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
deleted file mode 100644
index 156ef9e..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_small_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml b/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
deleted file mode 100644
index 0c56e2c..0000000
--- a/SeMask-Mask2Former/configs/cityscapes/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_90k.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_90k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml
deleted file mode 100644
index 98943d9..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/Base-COCO-InstanceSegmentation.yaml
+++ /dev/null
@@ -1,47 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("coco_2017_train",)
- TEST: ("coco_2017_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- STEPS: (327778, 355092)
- MAX_ITER: 368750
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 10
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- IMAGE_SIZE: 1024
- MIN_SCALE: 0.1
- MAX_SCALE: 2.0
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "coco_instance_lsj"
-TEST:
- EVAL_PERIOD: 5000
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml
deleted file mode 100644
index 77defd0..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_50ep.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml
deleted file mode 100644
index 4b9e76e..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-COCO-InstanceSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 255
- NUM_CLASSES: 80
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: False
- INSTANCE_ON: True
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
deleted file mode 100644
index 4732999..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
deleted file mode 100644
index 5dde960..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
deleted file mode 100644
index b685cdb..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
-SOLVER:
- STEPS: (655556, 710184)
- MAX_ITER: 737500
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
deleted file mode 100644
index f9b1c56..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
deleted file mode 100644
index 7f27bc5..0000000
--- a/SeMask-Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml
deleted file mode 100644
index 7560a73..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/Base-COCO-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,47 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("coco_2017_train_panoptic",)
- TEST: ("coco_2017_val_panoptic_with_sem_seg",) # to evaluate instance and semantic performance as well
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- STEPS: (327778, 355092)
- MAX_ITER: 368750
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 10
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- IMAGE_SIZE: 1024
- MIN_SCALE: 0.1
- MAX_SCALE: 2.0
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "coco_panoptic_lsj"
-TEST:
- EVAL_PERIOD: 5000
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml
deleted file mode 100644
index 77defd0..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R101_bs16_50ep.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer2_R50_bs16_50ep.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
deleted file mode 100644
index 9ebf4f1..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
+++ /dev/null
@@ -1,45 +0,0 @@
-_BASE_: Base-COCO-PanopticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 255
- NUM_CLASSES: 133
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: True
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
deleted file mode 100644
index 4732999..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
deleted file mode 100644
index 5dde960..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_base_IN21k_384_bs16_50ep.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
deleted file mode 100644
index b685cdb..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
-SOLVER:
- STEPS: (655556, 710184)
- MAX_ITER: 737500
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
deleted file mode 100644
index f9b1c56..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_small_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml b/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
deleted file mode 100644
index 7f27bc5..0000000
--- a/SeMask-Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_tiny_bs16_50ep.yaml
+++ /dev/null
@@ -1,15 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_50ep.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml
deleted file mode 100644
index 86629a3..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/Base-MapillaryVistas-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,56 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("mapillary_vistas_panoptic_train",)
- TEST: ("mapillary_vistas_panoptic_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 300000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 2048
- MAX_SIZE_TRAIN: 8192
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (1024, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 1024 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
- EVAL_PERIOD: 0
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 10
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml
deleted file mode 100644
index d6a0eaa..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/maskformer_R50_bs16_300k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 65
- NUM_CLASSES: 65
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: False
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.0
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
deleted file mode 100644
index e7a8c4c..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer_R50_bs16_300k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 200
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml
deleted file mode 100644
index f05fb28..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/Base-MapillaryVistas-SemanticSegmentation.yaml
+++ /dev/null
@@ -1,56 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("mapillary_vistas_sem_seg_train",)
- TEST: ("mapillary_vistas_sem_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 300000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.05
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
- AMP:
- ENABLED: True
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 2048
- MAX_SIZE_TRAIN: 8192
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (1024, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 1024 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
- EVAL_PERIOD: 0
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 10
-VERSION: 2
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml
deleted file mode 100644
index e9977a1..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/maskformer2_R50_bs16_300k.yaml
+++ /dev/null
@@ -1,44 +0,0 @@
-_BASE_: Base-MapillaryVistas-SemanticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IGNORE_VALUE: 65
- NUM_CLASSES: 65
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # pixel decoder
- PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
- COMMON_STRIDE: 4
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_DECODER_NAME: "MultiScaleMaskedTransformerDecoder"
- TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- CLASS_WEIGHT: 2.0
- MASK_WEIGHT: 5.0
- DICE_WEIGHT: 5.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.0
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- PRE_NORM: False
- ENFORCE_INPUT_PROJ: False
- SIZE_DIVISIBILITY: 32
- DEC_LAYERS: 10 # 9 decoder layers, add one for the loss on learnable query
- TRAIN_NUM_POINTS: 12544
- OVERSAMPLE_RATIO: 3.0
- IMPORTANCE_SAMPLE_RATIO: 0.75
- TEST:
- SEMANTIC_ON: True
- INSTANCE_ON: False
- PANOPTIC_ON: False
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.0
diff --git a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml b/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
deleted file mode 100644
index e336a1b..0000000
--- a/SeMask-Mask2Former/configs/mapillary-vistas/semantic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_300k.yaml
+++ /dev/null
@@ -1,18 +0,0 @@
-_BASE_: ../maskformer2_R50_bs16_300k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- MASK_FORMER:
- NUM_OBJECT_QUERIES: 100
diff --git a/SeMask-MaskFormer/GETTING_STARTED.md b/SeMask-MaskFormer/GETTING_STARTED.md
index 828689c..2fc3962 100644
--- a/SeMask-MaskFormer/GETTING_STARTED.md
+++ b/SeMask-MaskFormer/GETTING_STARTED.md
@@ -9,11 +9,11 @@ Please see [Getting Started with Detectron2](https://github.com/facebookresearch
1. Pick a model and its config file from
[model zoo](MODEL_ZOO.md),
- for example, `ade20k-150/maskformer_R50_bs16_160k.yaml`.
+ for example, `ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml`.
2. We provide `demo.py` that is able to demo builtin configs. Run it with:
```
cd demo/
-python demo.py --config-file ../configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
+python demo.py --config-file ../configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--input input1.jpg input2.jpg \
[--other-options]
--opts MODEL.WEIGHTS /path/to/checkpoint_file
@@ -39,7 +39,7 @@ setup the corresponding datasets following
then run:
```
./train_net.py --num-gpus 8 \
- --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml
+ --config-file configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml
```
The configs are made for 8-GPU training.
@@ -47,14 +47,14 @@ Since we use ADAMW optimizer, it is not clear how to scale learning rate with ba
To train on 1 GPU, you need to figure out learning rate and batch size by yourself:
```
./train_net.py \
- --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
+ --config-file configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE
```
To evaluate a model's performance, use
```
./train_net.py \
- --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
+ --config-file configs/ade20k-150/semask_swin/maskformer_semask_swin_large_IN21k_384_bs16_160k_res640.yaml \
--eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
For more options, see `./train_net.py -h`.
diff --git a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R101_bs16_720k.yaml b/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R101_bs16_720k.yaml
deleted file mode 100644
index c280b8f..0000000
--- a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R101_bs16_720k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_panoptic_R50_bs16_720k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R50_bs16_720k.yaml b/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R50_bs16_720k.yaml
deleted file mode 100644
index 0be2839..0000000
--- a/SeMask-MaskFormer/configs/ade20k-150-panoptic/maskformer_panoptic_R50_bs16_720k.yaml
+++ /dev/null
@@ -1,33 +0,0 @@
-_BASE_: ../ade20k-150/maskformer_R50_bs16_160k.yaml
-MODEL:
- SEM_SEG_HEAD:
- PIXEL_DECODER_NAME: "TransformerEncoderPixelDecoder"
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "transformer_encoder"
- TEST:
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.7
-DATASETS:
- TRAIN: ("ade20k_panoptic_train",)
- TEST: ("ade20k_panoptic_val",)
-SOLVER:
- MAX_ITER: 720000
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 640
- MAX_SIZE_TRAIN: 2560
- MAX_SIZE_TEST: 2560
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (640, 640)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 640 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_panoptic"
-TEST:
- EVAL_PERIOD: 0
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/Base-ADE20KFull-847.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/Base-ADE20KFull-847.yaml
deleted file mode 100644
index e3cd338..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/Base-ADE20KFull-847.yaml
+++ /dev/null
@@ -1,54 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("ade20k_full_sem_seg_train",)
- TEST: ("ade20k_full_sem_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 200000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.0001
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 512) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 512
- MAX_SIZE_TRAIN: 2048
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (512, 512)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 512 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
- EVAL_PERIOD: 5000
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101_bs16_200k.yaml
deleted file mode 100644
index 484c437..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101_bs16_200k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_R50_bs16_200k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
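The deleted R101 variants above are thin overrides: each names its parent via `_BASE_` and only redefines the keys that differ (backbone depth and weights). A hedged sketch of the deep-merge semantics this relies on (not detectron2's actual implementation, just the overlay behavior the configs assume):

```python
# Sketch of _BASE_ resolution: the child's keys are recursively overlaid
# onto the parent's, so an R101 config only needs to restate MODEL.RESNETS
# and MODEL.WEIGHTS while inheriting everything else.
def deep_merge(base, override):
    """Recursively overlay `override` onto `base` without mutating either dict."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

parent = {"MODEL": {"RESNETS": {"DEPTH": 50}, "WEIGHTS": "R-50.pkl"},
          "SOLVER": {"MAX_ITER": 200000}}
child = {"MODEL": {"RESNETS": {"DEPTH": 101}, "WEIGHTS": "R-101.pkl"}}
merged = deep_merge(parent, child)
# merged keeps SOLVER from the parent but takes DEPTH/WEIGHTS from the child
```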
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101c_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101c_bs16_200k.yaml
deleted file mode 100644
index 3a802c5..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R101c_bs16_200k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: maskformer_R50_bs16_200k.yaml
-MODEL:
- BACKBONE:
- NAME: "build_resnet_deeplab_backbone"
- WEIGHTS: "detectron2://DeepLab/R-103.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "deeplab"
- STEM_OUT_CHANNELS: 128
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 2, 4]
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml
deleted file mode 100644
index 430adaa..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-_BASE_: Base-ADE20KFull-847.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 65535
- NUM_CLASSES: 847
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- DICE_WEIGHT: 1.0
- MASK_WEIGHT: 20.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.1
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- DEC_LAYERS: 6
- PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_R50_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_R50_bs16_200k.yaml
deleted file mode 100644
index 8323067..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_R50_bs16_200k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: Base-ADE20KFull-847.yaml
-MODEL:
- META_ARCHITECTURE: "SemanticSegmentor"
- SEM_SEG_HEAD:
- NAME: "PerPixelBaselineHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 65535
- NUM_CLASSES: 847
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
diff --git a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_plus_R50_bs16_200k.yaml b/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_plus_R50_bs16_200k.yaml
deleted file mode 100644
index b30d455..0000000
--- a/SeMask-MaskFormer/configs/ade20k-full-847/per_pixel_baseline_plus_R50_bs16_200k.yaml
+++ /dev/null
@@ -1,24 +0,0 @@
-_BASE_: Base-ADE20KFull-847.yaml
-MODEL:
- META_ARCHITECTURE: "SemanticSegmentor"
- SEM_SEG_HEAD:
- NAME: "PerPixelBaselinePlusHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 65535
- NUM_CLASSES: 847
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- DEEP_SUPERVISION: True
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 847 # remember to set this to NUM_CLASSES
- NHEADS: 8
- DROPOUT: 0.1
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- DEC_LAYERS: 6
- PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/cityscapes-19/Base-Cityscapes-19.yaml b/SeMask-MaskFormer/configs/cityscapes-19/Base-Cityscapes-19.yaml
deleted file mode 100644
index 6b52542..0000000
--- a/SeMask-MaskFormer/configs/cityscapes-19/Base-Cityscapes-19.yaml
+++ /dev/null
@@ -1,59 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("cityscapes_fine_sem_seg_train",)
- TEST: ("cityscapes_fine_sem_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 90000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.0001
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 1024) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 1024
- MAX_SIZE_TRAIN: 4096
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (512, 1024)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: -1
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [512, 768, 1024, 1280, 1536, 1792]
- MAX_SIZE: 4096
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101_bs16_90k.yaml b/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101_bs16_90k.yaml
deleted file mode 100644
index e1017a7..0000000
--- a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101_bs16_90k.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-_BASE_: Base-Cityscapes-19.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 255
- NUM_CLASSES: 19
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- DICE_WEIGHT: 1.0
- MASK_WEIGHT: 20.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.1
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- DEC_LAYERS: 6
- PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml b/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml
deleted file mode 100644
index e07bbee..0000000
--- a/SeMask-MaskFormer/configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml
+++ /dev/null
@@ -1,16 +0,0 @@
-_BASE_: maskformer_R101_bs16_90k.yaml
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_deeplab_backbone"
- WEIGHTS: "detectron2://DeepLab/R-103.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "deeplab"
- STEM_OUT_CHANNELS: 128
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 2, 4]
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/Base-COCO-PanopticSegmentation.yaml b/SeMask-MaskFormer/configs/coco-panoptic/Base-COCO-PanopticSegmentation.yaml
deleted file mode 100644
index 53f3772..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/Base-COCO-PanopticSegmentation.yaml
+++ /dev/null
@@ -1,47 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("coco_2017_train_panoptic",)
- TEST: ("coco_2017_val_panoptic",)
-SOLVER:
- IMS_PER_BATCH: 64
- BASE_LR: 0.0001
- STEPS: (369600,)
- MAX_ITER: 554400
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 10
- WEIGHT_DECAY: 0.0001
- OPTIMIZER: "ADAMW"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
-INPUT:
- MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
- CROP:
- ENABLED: True
- TYPE: "absolute_range"
- SIZE: (384, 600)
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "detr_panoptic"
-TEST:
- EVAL_PERIOD: 0
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R101_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R101_bs64_554k.yaml
deleted file mode 100644
index b2bf3b6..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R101_bs64_554k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml
deleted file mode 100644
index 2375c17..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-_BASE_: Base-COCO-PanopticSegmentation.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 255
- NUM_CLASSES: 133
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- # add additional 6 encoder layers
- PIXEL_DECODER_NAME: "TransformerEncoderPixelDecoder"
- TRANSFORMER_ENC_LAYERS: 6
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "transformer_encoder"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- DICE_WEIGHT: 1.0
- MASK_WEIGHT: 20.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.1
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- DEC_LAYERS: 6
- PRE_NORM: False
- # COCO model should not pad image
- SIZE_DIVISIBILITY: 0
- TEST:
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
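The `TEST` block in the config above sets two thresholds that govern MaskFormer-style panoptic inference. A simplified sketch of how such thresholds are typically used (this is an illustrative approximation, not the repo's actual inference code): queries below `OBJECT_MASK_THRESHOLD` are dropped, pixels are assigned to the winning remaining query, and segments that retain less than `OVERLAP_THRESHOLD` of their original mask area after that assignment are discarded.

```python
import numpy as np

def panoptic_filter(scores, masks, object_mask_threshold=0.8, overlap_threshold=0.8):
    """Return indices of queries surviving both thresholds (illustrative sketch).

    scores: (Q,) per-query confidence; masks: (Q, H, W) mask probabilities.
    """
    keep = scores > object_mask_threshold            # per-query confidence gate
    scores, masks = scores[keep], masks[keep]
    if masks.shape[0] == 0:
        return []
    # Assign each pixel to the query with the highest score-weighted mask prob.
    assignment = np.argmax(scores[:, None, None] * masks, axis=0)
    segments = []
    for q in range(masks.shape[0]):
        original = (masks[q] >= 0.5).sum()
        kept = ((assignment == q) & (masks[q] >= 0.5)).sum()
        # Drop segments mostly overwritten by other queries.
        if original > 0 and kept / original >= overlap_threshold:
            segments.append(q)
    return segments
```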
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_base_IN21k_384_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_base_IN21k_384_bs64_554k.yaml
deleted file mode 100644
index 526c74b..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_base_IN21k_384_bs64_554k.yaml
+++ /dev/null
@@ -1,33 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 128
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [4, 8, 16, 32]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- SEM_SEG_HEAD:
- PIXEL_DECODER_NAME: "BasePixelDecoder"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- ENFORCE_INPUT_PROJ: True
- TEST:
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
- BASE_LR: 0.00006
- WARMUP_FACTOR: 1e-6
- WARMUP_ITERS: 1500
- WEIGHT_DECAY: 0.01
- WEIGHT_DECAY_NORM: 0.0
- WEIGHT_DECAY_EMBED: 0.0
- BACKBONE_MULTIPLIER: 1.0
\ No newline at end of file
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_large_IN21k_384_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_large_IN21k_384_bs64_554k.yaml
deleted file mode 100644
index a8c8833..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_large_IN21k_384_bs64_554k.yaml
+++ /dev/null
@@ -1,41 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 192
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [6, 12, 24, 48]
- WINDOW_SIZE: 12
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- PRETRAIN_IMG_SIZE: 384
- WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- SEM_SEG_HEAD:
- PIXEL_DECODER_NAME: "BasePixelDecoder"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- ENFORCE_INPUT_PROJ: True
- TEST:
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
- BASE_LR: 0.00006
- WARMUP_FACTOR: 1e-6
- WARMUP_ITERS: 1500
- WEIGHT_DECAY: 0.01
- WEIGHT_DECAY_NORM: 0.0
- WEIGHT_DECAY_EMBED: 0.0
- BACKBONE_MULTIPLIER: 1.0
-INPUT:
- MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
- MAX_SIZE_TRAIN: 1000
- CROP:
- ENABLED: True
- TYPE: "absolute_range"
- SIZE: (384, 600)
- FORMAT: "RGB"
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_small_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_small_bs64_554k.yaml
deleted file mode 100644
index 3ed3c7d..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_small_bs64_554k.yaml
+++ /dev/null
@@ -1,32 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 18, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_small_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- SEM_SEG_HEAD:
- PIXEL_DECODER_NAME: "BasePixelDecoder"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- ENFORCE_INPUT_PROJ: True
- TEST:
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
- BASE_LR: 0.00006
- WARMUP_FACTOR: 1e-6
- WARMUP_ITERS: 1500
- WEIGHT_DECAY: 0.01
- WEIGHT_DECAY_NORM: 0.0
- WEIGHT_DECAY_EMBED: 0.0
- BACKBONE_MULTIPLIER: 1.0
diff --git a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_tiny_bs64_554k.yaml b/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_tiny_bs64_554k.yaml
deleted file mode 100644
index 4572f15..0000000
--- a/SeMask-MaskFormer/configs/coco-panoptic/swin/maskformer_panoptic_swin_tiny_bs64_554k.yaml
+++ /dev/null
@@ -1,32 +0,0 @@
-_BASE_: ../maskformer_panoptic_R50_bs64_554k.yaml
-MODEL:
- BACKBONE:
- NAME: "D2SwinTransformer"
- SWIN:
- EMBED_DIM: 96
- DEPTHS: [2, 2, 6, 2]
- NUM_HEADS: [3, 6, 12, 24]
- WINDOW_SIZE: 7
- APE: False
- DROP_PATH_RATE: 0.3
- PATCH_NORM: True
- WEIGHTS: "swin_tiny_patch4_window7_224.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- SEM_SEG_HEAD:
- PIXEL_DECODER_NAME: "BasePixelDecoder"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- ENFORCE_INPUT_PROJ: True
- TEST:
- PANOPTIC_ON: True
- OVERLAP_THRESHOLD: 0.8
- OBJECT_MASK_THRESHOLD: 0.8
-SOLVER:
- BASE_LR: 0.00006
- WARMUP_FACTOR: 1e-6
- WARMUP_ITERS: 1500
- WEIGHT_DECAY: 0.01
- WEIGHT_DECAY_NORM: 0.0
- WEIGHT_DECAY_EMBED: 0.0
- BACKBONE_MULTIPLIER: 1.0
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/Base-COCOStuff10K-171.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/Base-COCOStuff10K-171.yaml
deleted file mode 100644
index 3d5a2cb..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/Base-COCOStuff10K-171.yaml
+++ /dev/null
@@ -1,59 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("coco_2017_train_stuff_10k_sem_seg",)
- TEST: ("coco_2017_test_stuff_10k_sem_seg",)
-SOLVER:
- IMS_PER_BATCH: 32
- BASE_LR: 0.0001
- MAX_ITER: 60000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.0001
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 16)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 640
- MAX_SIZE_TRAIN: 2560
- MAX_SIZE_TEST: 2560
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (640, 640)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 640 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
- EVAL_PERIOD: 5000
- AUG:
- ENABLED: False
- MIN_SIZES: [320, 480, 640, 800, 960, 1120]
- MAX_SIZE: 4480
- FLIP: True
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 4
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101_bs32_60k.yaml
deleted file mode 100644
index 7864b6a..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101_bs32_60k.yaml
+++ /dev/null
@@ -1,11 +0,0 @@
-_BASE_: maskformer_R50_bs32_60k.yaml
-MODEL:
- WEIGHTS: "R-101.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101c_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101c_bs32_60k.yaml
deleted file mode 100644
index 4df030f..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R101c_bs32_60k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: maskformer_R50_bs32_60k.yaml
-MODEL:
- BACKBONE:
- NAME: "build_resnet_deeplab_backbone"
- WEIGHTS: "detectron2://DeepLab/R-103.pkl"
- RESNETS:
- DEPTH: 101
- STEM_TYPE: "deeplab"
- STEM_OUT_CHANNELS: 128
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 2, 4]
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R50_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R50_bs32_60k.yaml
deleted file mode 100644
index 5b737d4..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/maskformer_R50_bs32_60k.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-_BASE_: Base-COCOStuff10K-171.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 255
- NUM_CLASSES: 171
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- DICE_WEIGHT: 1.0
- MASK_WEIGHT: 20.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.1
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- DEC_LAYERS: 6
- PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_R50_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_R50_bs32_60k.yaml
deleted file mode 100644
index 4442c16..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_R50_bs32_60k.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-_BASE_: Base-COCOStuff10K-171.yaml
-MODEL:
- META_ARCHITECTURE: "SemanticSegmentor"
- SEM_SEG_HEAD:
- NAME: "PerPixelBaselineHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 255
- NUM_CLASSES: 171
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
diff --git a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_plus_R50_bs32_60k.yaml b/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_plus_R50_bs32_60k.yaml
deleted file mode 100644
index 72f2021..0000000
--- a/SeMask-MaskFormer/configs/coco-stuff-10k-171/per_pixel_baseline_plus_R50_bs32_60k.yaml
+++ /dev/null
@@ -1,24 +0,0 @@
-_BASE_: Base-COCOStuff10K-171.yaml
-MODEL:
- META_ARCHITECTURE: "SemanticSegmentor"
- SEM_SEG_HEAD:
- NAME: "PerPixelBaselinePlusHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 255
- NUM_CLASSES: 171
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- DEEP_SUPERVISION: True
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 171 # remember to set this to NUM_CLASSES
- NHEADS: 8
- DROPOUT: 0.1
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- DEC_LAYERS: 6
- PRE_NORM: False
diff --git a/SeMask-MaskFormer/configs/mapillary-vistas-65/Base-MapillaryVistas-65.yaml b/SeMask-MaskFormer/configs/mapillary-vistas-65/Base-MapillaryVistas-65.yaml
deleted file mode 100644
index d3dacc3..0000000
--- a/SeMask-MaskFormer/configs/mapillary-vistas-65/Base-MapillaryVistas-65.yaml
+++ /dev/null
@@ -1,54 +0,0 @@
-MODEL:
- BACKBONE:
- FREEZE_AT: 0
- NAME: "build_resnet_backbone"
- WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
- PIXEL_MEAN: [123.675, 116.280, 103.530]
- PIXEL_STD: [58.395, 57.120, 57.375]
- RESNETS:
- DEPTH: 50
- STEM_TYPE: "basic" # not used
- STEM_OUT_CHANNELS: 64
- STRIDE_IN_1X1: False
- OUT_FEATURES: ["res2", "res3", "res4", "res5"]
- # NORM: "SyncBN"
- RES5_MULTI_GRID: [1, 1, 1] # not used
-DATASETS:
- TRAIN: ("mapillary_vistas_sem_seg_train",)
- TEST: ("mapillary_vistas_sem_seg_val",)
-SOLVER:
- IMS_PER_BATCH: 16
- BASE_LR: 0.0001
- MAX_ITER: 300000
- WARMUP_FACTOR: 1.0
- WARMUP_ITERS: 0
- WEIGHT_DECAY: 0.0001
- OPTIMIZER: "ADAMW"
- LR_SCHEDULER_NAME: "WarmupPolyLR"
- BACKBONE_MULTIPLIER: 0.1
- CLIP_GRADIENTS:
- ENABLED: True
- CLIP_TYPE: "full_model"
- CLIP_VALUE: 0.01
- NORM_TYPE: 2.0
-INPUT:
- MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 2048) for x in range(5, 21)]"]
- MIN_SIZE_TRAIN_SAMPLING: "choice"
- MIN_SIZE_TEST: 2048
- MAX_SIZE_TRAIN: 8192
- MAX_SIZE_TEST: 2048
- CROP:
- ENABLED: True
- TYPE: "absolute"
- SIZE: (1280, 1280)
- SINGLE_CATEGORY_MAX_AREA: 1.0
- COLOR_AUG_SSD: True
- SIZE_DIVISIBILITY: 1280 # used in dataset mapper
- FORMAT: "RGB"
- DATASET_MAPPER_NAME: "mask_former_semantic"
-TEST:
- EVAL_PERIOD: 5000
-DATALOADER:
- FILTER_EMPTY_ANNOTATIONS: True
- NUM_WORKERS: 10
-VERSION: 2
diff --git a/SeMask-MaskFormer/configs/mapillary-vistas-65/maskformer_R50_bs16_300k.yaml b/SeMask-MaskFormer/configs/mapillary-vistas-65/maskformer_R50_bs16_300k.yaml
deleted file mode 100644
index 1935082..0000000
--- a/SeMask-MaskFormer/configs/mapillary-vistas-65/maskformer_R50_bs16_300k.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-_BASE_: Base-MapillaryVistas-65.yaml
-MODEL:
- META_ARCHITECTURE: "MaskFormer"
- SEM_SEG_HEAD:
- NAME: "MaskFormerHead"
- IN_FEATURES: ["res2", "res3", "res4", "res5"]
- IGNORE_VALUE: 65
- NUM_CLASSES: 65
- COMMON_STRIDE: 4 # not used, hard-coded
- LOSS_WEIGHT: 1.0
- CONVS_DIM: 256
- MASK_DIM: 256
- NORM: "GN"
- MASK_FORMER:
- TRANSFORMER_IN_FEATURE: "res5"
- DEEP_SUPERVISION: True
- NO_OBJECT_WEIGHT: 0.1
- DICE_WEIGHT: 1.0
- MASK_WEIGHT: 20.0
- HIDDEN_DIM: 256
- NUM_OBJECT_QUERIES: 100
- NHEADS: 8
- DROPOUT: 0.1
- DIM_FEEDFORWARD: 2048
- ENC_LAYERS: 0
- DEC_LAYERS: 6
- PRE_NORM: False