AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation (Submitted to ICRA 2024)


This repository contains the official implementation of AsymFormer, a novel network for real-time RGB-D semantic segmentation.

  • Achieves efficient and precise RGB-D semantic segmentation
  • Allows effective fusion of multimodal features at low computational cost
  • Minimizes superfluous parameters by optimizing computational resource distribution
  • Enhances network accuracy through feature selection and multi-modal self-similarity features
  • Utilizes Local Attention-Guided Feature Selection (LAFS) module for selective fusion
  • Introduces Cross-Modal Attention-Guided Feature Correlation Embedding (CMA) module for cross-modal representations
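
For intuition only, the sketch below shows a generic cross-modal attention block in PyTorch, in which RGB tokens attend to depth tokens. It is an illustrative stand-in, not the paper's LAFS/CMA implementation (the official source code is forthcoming; see Training below):

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-attention: RGB features query depth features.

    A generic fusion block for intuition only, NOT the LAFS/CMA
    modules from the AsymFormer paper.
    """
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (batch, tokens, dim); RGB queries, depth supplies keys/values.
        fused, _ = self.attn(query=rgb, key=depth, value=depth)
        return self.norm(rgb + fused)  # residual fusion keeps the RGB stream primary

# Toy usage on random features.
rgb, depth = torch.randn(1, 196, 64), torch.randn(1, 196, 64)
out = CrossModalAttention(dim=64)(rgb, depth)  # -> (1, 196, 64)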

Results

AsymFormer achieves competitive results on the following datasets:

  • NYUv2: 52.0% mIoU
  • SUNRGBD: 49.1% mIoU

Notably, it also delivers fast inference speeds:

  • Inference speed of 65 FPS on an RTX 3090
  • Inference speed of 79 FPS on an RTX 3090 (FP16)
  • Inference speed of 29 FPS on a Tesla T4 (FP16)

Installation

To run this project, we suggest using Ubuntu 20.04, PyTorch 2.0.1, and a CUDA version higher than 12.0.

Other necessary packages for running the evaluation and TensorRT FP16 quantization inference:

pip install timm
pip install scikit-image
pip install opencv-python-headless==4.5.5.64
pip install thop
pip install onnx
pip install onnxruntime
pip install tensorrt==8.6.0
pip install pycuda
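
After installing, a quick sanity check (a minimal sketch using only the packages listed above) can confirm that PyTorch sees the GPU and that TensorRT and ONNX Runtime import cleanly:

import torch
import tensorrt
import onnxruntime

# Verify versions and GPU visibility before running the notebooks.
print('PyTorch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('TensorRT:', tensorrt.__version__)
print('ONNX Runtime:', onnxruntime.__version__)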

Data Preparation

We use the same data source as ACNet. The processed NYUv2 data (.npy) can be downloaded from Google Drive.

Usage

Currently, we provide an ONNX model and a TensorRT FP16 model for evaluation and inference.

FP16 Inference (RTX3090 Platform)

The TensorRT inference notebook can be found in the 'Inference' folder. You can test AsymFormer in your local environment as follows (a minimal engine-loading sketch appears after the steps):

  • Download the folder 'Inference'.
  • Download the TensorRT FP16 model, which was generated and optimized for the RTX 3090 platform: [AsymFormer FP16 TensorRT Model]
  • Download the NYUv2 dataset: NYUv2.
  • Put 'AsymFormer.engine' in the 'Inference' folder.
  • Modify the dataset path to your own path, as shown below:
val_data = Data.RGBD_Dataset(transform=torchvision.transforms.Compose([scaleNorm(),
                                                                       ToTensor(),
                                                                       Normalize()]),
                             phase_train=False,  # False selects the test split
                             data_dir='Your Own Path',  # replace with the path to the NYUv2 dataset
                             txt_name='test.txt'
                             )
  • Run the Jupyter Notebook.
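
For reference, the sketch below shows the general pattern for running a serialized TensorRT engine with the Python API and pycuda: deserialize the engine, allocate host/device buffers for each binding, then copy inputs in and outputs out around an asynchronous execution. It is a minimal illustration that assumes a fixed-shape engine and omits the RGB/depth preprocessing; the provided notebook remains the authoritative version.

import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

# Deserialize the pre-built FP16 engine (file name from the steps above).
logger = trt.Logger(trt.Logger.WARNING)
with open('AsymFormer.engine', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(engine.get_binding_shape(i))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()
# ... fill the input host buffers with preprocessed RGB and depth data here ...
for i in range(engine.num_bindings):
    if engine.binding_is_input(i):
        cuda.memcpy_htod_async(dev_bufs[i], host_bufs[i], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
stream.synchronize()  # output host buffers now hold the segmentation logits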

Optimize AsymFormer for your own platform

You can generate your own TensorRT engine from the ONNX model. We provide the original ONNX model and a corresponding notebook to help you generate the TensorRT model (a minimal builder sketch follows the list below).

  • The ONNX model is exported with ONNX opset version 17 and can be downloaded from [AsymFormer ONNX Model].
  • The Jupyter notebook covers loading the ONNX model, checking for numeric overflow, and generating a mixed-precision TensorRT model; it can be downloaded from Generate TensorRT.
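
As a rough illustration of that workflow, building an FP16 engine from the ONNX model with the TensorRT Python builder might look like the sketch below. The file names and workspace size are assumptions, and the provided notebook also checks for numeric overflow, which this sketch omits:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)
parser = trt.OnnxParser(network, logger)

# Parse the downloaded ONNX model ('AsymFormer.onnx' is an assumed file name).
with open('AsymFormer.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError('ONNX parsing failed')

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # request FP16 (mixed-precision) kernels
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB

# Build and serialize an engine optimized for the current GPU.
engine_bytes = builder.build_serialized_network(network, config)
with open('AsymFormer.engine', 'wb') as f:
    f.write(engine_bytes)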

Training

The source code of AsymFormer will be released soon.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgements

If you find this repository useful in your research, please consider citing:

@misc{du2023asymformer,
      title={AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation}, 
      author={Siqi Du and Weixi Wang and Renzhong Guo and Shengjun Tang},
      year={2023},
      eprint={2309.14065},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any inquiries, please contact [email protected]. Author's homepage: Siqi Du's ResearchGate.
