This project aims to provide modular implementation and an easy pipeline of training and evaluation for SOTA semantic segmentation models.

BebDong/MXNetSeg

MXNetSeg

This project provides a modular implementation of state-of-the-art semantic segmentation models based on the MXNet framework and the GluonCV toolkit. See MindSeg for a mirror implemented with HUAWEI MindSpore.

Bright Spots

  • An easy-to-use, extensible pipeline for the semantic segmentation task, covering data pre-processing, model definition, network training, and evaluation.

  • Parallel training on GPUs.

  • Multiple supported models.

    • Fully Convolutional Networks for Semantic Segmentation [FCN, CVPR2015, paper]
    • Attention to Scale: Scale-Aware Semantic Image Segmentation [Att2Scale, CVPR2016, paper]
    • Rethinking Atrous Convolution for Semantic Image Segmentation [DeepLabv3, arXiv2017, paper]
    • Ladder-Style DenseNets for Semantic Segmentation of Large Natural Images [LadderDensenet, ICCVW2017, paper]
    • Pyramid Scene Parsing Network [PSPNet, CVPR2017, paper]
    • BiSeNet: Bilateral segmentation network for real-time semantic segmentation [BiSeNet, ECCV2018, paper]
    • Encoder-decoder with atrous separable convolution for semantic image segmentation [DeepLabv3+, ECCV2018, paper]
    • DenseASPP for Semantic Segmentation in Street Scenes [DenseASPP, CVPR2018, paper]
    • Towards Bridging Semantic Gap to Improve Semantic Segmentation [SeENet, ICCV2019, paper]
    • ACFNet: Attentional Class Feature Network for Semantic Segmentation [ACFNet, ICCV2019, paper]
    • Dual Attention Network for Scene Segmentation [DANet, CVPR2019, paper]
    • In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images [SwiftNet, CVPR2019, paper]
    • Panoptic Feature Pyramid Networks [SemanticFPN, CVPR2019, paper]
    • Gated Fully Fusion for Semantic Segmentation [GFFNet, AAAI2020, paper]
    • Attention-guided Chained Context Aggregation for Semantic Segmentation [CANetv1, IMAVIS2021, paper]
    • EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation [EPRNet, TITS2021, paper]
    • AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing [AttaNet, AAAI2021, paper]
    • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ViT, ICLR2021, paper]
    • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR, CVPR2021, paper]
    • FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [FaPN, ICCV2021, paper]
    • AlignSeg: Feature-Aligned Segmentation Networks [AlignSeg, TPAMI2021, paper]
    • Compensating for Local Ambiguity with Encoder-Decoder in Urban Scene Segmentation [CANetv2, TITS2022, paper]

Benchmarks

We note that:

  • OS is the output stride of the backbone network.
  • * denotes multi-scale and flip testing; results without it use single-scale inputs.
  • No bells and whistles are adopted, e.g. OHEM or multi-grid.
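To make the * columns concrete, here is a minimal sketch of multi-scale and flip testing in plain Python. It is not the evaluator used in this repository: the scale set, the nearest-neighbour resize, the `toy_model` stand-in, and all function names are assumptions chosen for illustration. The idea is to run the model on several resized copies of the image and their horizontal mirrors, map every score map back to the original size, accumulate the scores, and take the per-pixel argmax.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]       # H x W map (toy image or one class-score channel)
Scores = List[Grid]            # C x H x W per-class scores


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of a 2-D grid (assumed; real code would use bilinear)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def hflip(grid: Grid) -> Grid:
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]


def toy_model(img: Grid) -> Scores:
    """Hypothetical 2-class 'network': bright pixels score as class 1."""
    fg = [[float(v) for v in row] for row in img]
    bg = [[1.0 - v for v in row] for row in fg]
    return [bg, fg]


def ms_flip_predict(model: Callable[[Grid], Scores], img: Grid,
                    scales: Sequence[float] = (0.5, 1.0, 2.0)) -> List[List[int]]:
    """Sum class scores over scales and flips, then take the per-pixel argmax."""
    h, w = len(img), len(img[0])
    total = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scaled = resize_nearest(img, sh, sw)
        for flip in (False, True):
            scores = model(hflip(scaled) if flip else scaled)
            if flip:
                # Undo the mirror so scores line up with the unflipped image.
                scores = [hflip(ch) for ch in scores]
            # Bring every prediction back to the original resolution.
            scores = [resize_nearest(ch, h, w) for ch in scores]
            if total is None:
                total = scores
            else:
                total = [[[a + b for a, b in zip(ra, rb)]
                          for ra, rb in zip(ca, cb)]
                         for ca, cb in zip(total, scores)]
    n_classes = len(total)
    return [[max(range(n_classes), key=lambda k: total[k][r][c])
             for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    image = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0]]
    print(ms_flip_predict(toy_model, image))
```

Summing the scores gives the same argmax as averaging them, so the sketch skips the division. Single-scale testing corresponds to `scales=(1.0,)` with the flipped pass removed.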

Cityscapes

| Model | Backbone | OS | #Params | TrainSet | EvalSet | mIoU | *mIoU |
|---|---|---|---|---|---|---|---|
| BiSeNet | ResNet18 | 32 | 13.2M | train_fine | val | 71.6 | 74.7 |
| BiSeNet | ResNet18 | 32 | 13.2M | trainval_fine | test | - | 74.8 |
| FCN | ResNet18 | 32 | 12.4M | train_fine | val | 64.9 | 68.1 |
| FCN | ResNet18 | 8 | 12.4M | train_fine | val | 68.3 | 69.9 |
| FCN | ResNet50 | 8 | 28.4M | train_fine | val | 71.7 | - |
| FCN | ResNet101 | 8 | 47.5M | train_fine | val | 74.5 | - |
| PSPNet | ResNet101 | 8 | 56.4M | train_fine | val | 78.2 | 79.5 |
| DeepLabv3 | ResNet101 | 8 | 58.9M | train_fine | val | 79.3 | 80.0 |
| DenseASPP | ResNet101 | 8 | 69.4M | train_fine | val | 78.7 | 79.8 |
| DANet | ResNet101 | 8 | 66.7M | train_fine | val | 79.7 | 80.9 |

ADE20K

| Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
|---|---|---|---|---|---|---|---|---|
| PSPNet | ResNet101 | 8 | train | val | 80.1 | 42.9 | 80.9 | 43.7 |

Pascal VOC 2012

| Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
|---|---|---|---|---|---|---|---|---|
| FCN | ResNet101 | 8 | train_aug | val | 94.4 | 74.6 | 94.5 | 75.0 |
| Att2Scale | ResNet101 | 8 | train_aug | val | 94.8 | 77.1 | - | - |
| PSPNet | ResNet101 | 8 | train_aug | val | 95.1 | 78.1 | 95.3 | 78.5 |
| DeepLabv3 | ResNet101 | 8 | train_aug | val | 95.5 | 80.1 | 95.6 | 80.4 |
| DeepLabv3+ | ResNet101 | 8 | train_aug | val | 95.5 | 79.9 | 95.6 | 80.1 |

NYUv2

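| Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
|---|---|---|---|---|---|---|---|---|
| FCN | ResNet101 | 8 | train | val | 69.2 | 39.7 | 70.2 | 41.0 |
| PSPNet | ResNet101 | 8 | train | val | 71.3 | 43.0 | 71.9 | 43.6 |
| DeepLabv3+ | ResNet101 | 8 | train | val | 73.5 | 46.0 | 74.3 | 47.2 |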

Environment

We use Python 3.6.2 and CUDA 10.1 in this project.

  1. Prerequisites

    pip install -r requirements.txt

    Note that we use wandb for logging and visualization. Refer to the wandb documentation for a QuickStart.

  2. Detail API for Pascal Context dataset

Usage

Training

  1. Configure hyper-parameters in ./mxnetseg/config.yml

  2. Run the ./mxnetseg/train.py script

    python train.py --ctx 0 1 2 3 --wandb wandb-demo
  3. During training, the program will automatically create a sub-folder ./weights/{model_name} to save model checkpoints/parameters.
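As an illustration of step 3, the per-model checkpoint folder could be prepared as in the sketch below. The helper name `checkpoint_dir` and its `root` parameter are hypothetical and not taken from train.py; only the `./weights/{model_name}` layout comes from the description above.

```python
from pathlib import Path


def checkpoint_dir(model_name: str, root: str = "./weights") -> Path:
    """Create (if missing) and return the folder <root>/<model_name>.

    Hypothetical helper: mirrors the ./weights/{model_name} layout described
    above, not the actual code in train.py.
    """
    path = Path(root) / model_name
    # parents=True also creates <root>; exist_ok=True makes reruns harmless.
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    print(checkpoint_dir("fcn"))
```

Because the call is idempotent, restarting a training run reuses the same folder instead of failing.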

Inference

Run ./mxnetseg/eval.py with the required arguments, for example:

python eval.py --model FCNResNet --backbone resnet18 --checkpoint fcn_resnet18_Cityscapes_20191900_310600_best.params --ctx 0 --data Cityscapes --crop 768 --base 2048 --mode val --ms

The --mode argument accepts:

  • val: to get mIoU and PA metrics on the validation set.
  • test: to get colored predictions on the test set.
  • testval: to get colored predictions on the validation set.
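For readers unfamiliar with the two metrics reported in val mode, the sketch below computes PA (pixel accuracy) and mIoU from a confusion matrix in plain Python. It follows the standard definitions and is not the metric code used by eval.py; the `ignore_label` value of -1, the choice to skip classes absent from both prediction and ground truth, and the function names are assumptions.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_label: int = -1) -> List[List[int]]:
    """cm[t][p] counts pixels whose ground truth is t and prediction is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:      # assumed convention for unlabeled pixels
            continue
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """PA = correctly labeled pixels / all labeled pixels."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """mIoU = mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # truth k, predicted otherwise
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, truth otherwise
        denom = tp + fp + fn
        if denom > 0:              # skip classes that never occur (assumption)
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 1, 0]      # flattened predicted labels
    target = [0, 1, 1, 1, -1, 0]   # flattened ground truth, -1 is ignored
    cm = confusion_matrix(pred, target, num_classes=2)
    print(pixel_accuracy(cm), mean_iou(cm))
```

In practice the confusion matrix is accumulated over every image in the evaluation set before the two ratios are taken, so large images weigh more than small ones.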

Citations

Please cite our papers if you find this code helpful in your research.

@article{tang2021attention,
  title={Attention-guided chained context aggregation for semantic segmentation},
  author={Tang, Quan and Liu, Fagui and Zhang, Tong and Jiang, Jun and Zhang, Yu},
  journal={Image and Vision Computing},
  pages={104309},
  year={2021},
  publisher={Elsevier}
}

@article{tang2021eprnet,
  title={EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation},
  author={Tang, Quan and Liu, Fagui and Jiang, Jun and Zhang, Yu},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2021},
  doi={10.1109/TITS.2021.3066401},
  publisher={IEEE}
}

@article{tang2022compe,
  title={Compensating for Local Ambiguity With Encoder-Decoder in Urban Scene Segmentation}, 
  author={Tang, Quan and Liu, Fagui and Zhang, Tong and Jiang, Jun and Zhang, Yu and Zhu, Boyuan and Tang, Xuhao},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2022},
  doi={10.1109/TITS.2022.3157128},
  publisher={IEEE}
}
