Skip to content

[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Notifications You must be signed in to change notification settings

mit-han-lab/flatformer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Abstract

Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind sparse convolution-based models (3$\times$ slower), hindering their usage in resource-constrained, latency-sensitive applications (such as autonomous driving). This inefficiency comes from point clouds' sparse and irregular nature, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal sizes rather than windows of equal shapes. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on Waymo Open Dataset with 4.6x speedup over (transformer-based) SST and 1.4x speedup over (sparse convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.

Results

All the results are reproducible with this repo. Regrettably, we are unable to provide the pre-trained model weights due to Waymo Dataset License Agreement. Discussions are definitely welcome if you could not obtain satisfactory performances with FlatFormer in your projects.

3D Object Detection (on Waymo validation)

Model #Sweeps mAP/H_L1 mAP/H_L2 Veh_L1 Veh_L2 Ped_L1 Ped_L2 Cyc_L1 Cyc_L2
FlatFormer 1 76.1/73.4 69.7/67.2 77.5/77.1 69.0/68.6 79.6/73.0 71.5/65.3 71.3/70.1 68.6/67.5
FlatFormer 2 78.9/77.3 72.7/71.2 79.1/78.6 70.8/70.3 81.6/78.2 73.8/70.5 76.1/75.1 73.6/72.6
FlatFormer 3 79.6/78.0 73.5/72.0 79.7/79.2 71.4/71.0 82.0/78.7 74.5/71.3 77.2/76.1 74.7/73.7

Usage

Prerequisites

The code is built with following libraries:

After installing these dependencies, please run this command to install the codebase:

pip install -v -e .

Dataset Preparation

Please follow the instructions from MMDetection3D to download and preprocess the Waymo Open Dataset. After data preparation, you will be able to see the following directory structure (as is indicated in mmdetection3d):

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── waymo
│   │   ├── waymo_format
│   │   │   ├── training
│   │   │   ├── validation
│   │   │   ├── testing
│   │   │   ├── gt.bin
│   │   ├── kitti_format
│   │   │   ├── ImageSets
│   │   │   ├── training
│   │   │   ├── testing
│   │   │   ├── waymo_gt_database
│   │   │   ├── waymo_infos_trainval.pkl
│   │   │   ├── waymo_infos_train.pkl
│   │   │   ├── waymo_infos_val.pkl
│   │   │   ├── waymo_infos_test.pkl
│   │   │   ├── waymo_dbinfos_train.pkl

Training

# multi-gpu training
bash tools/dist_train.sh configs/flatformer/$CONFIG.py 8 --work-dir $CONFIG/ --cfg-options evaluation.pklfile_prefix=./work_dirs/$CONFIG/results evaluation.metric=waymo

Evaluation

# multi-gpu testing
bash tools/dist_test.sh configs/flatformer/$CONFIG.py /work_dirs/$CONFIG/latest.pth 8 --eval waymo

Citation

If FlatFormer is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

@inproceedings{liu2023flatformer,
  title={FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer},
  author={Liu, Zhijian and Yang, Xinyu and Tang, Haotian and Yang, Shang and Han, Song},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

Acknowledgments

This project is based on the following codebases.

We would like to thank Tianwei Yin, Lue Fan and Ligeng Mao for providing detailed results of CenterPoint, SST/FSD and VoTr, and Yue Wang and Yukang Chen for their helpful discussions. This work was supported by National Science Foundation, MIT-IBM Watson AI Lab, NVIDIA, Hyundai and Ford. Zhijian Liu was partially supported by the Qualcomm Innovation Fellowship.

About

[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published