[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

kingyongyongzz/torchsparse

TorchSparse

TorchSparse is a high-performance neural network library for point cloud processing.

Introduction

Point cloud computation has become an increasingly important workload for autonomous driving and other applications. Unlike dense 2D computation, point cloud convolution has sparse and irregular computation patterns and thus requires dedicated inference system support with specialized high-performance kernels. While existing point cloud deep learning libraries have developed different dataflows for convolution on point clouds, they assume a single dataflow throughout the execution of the entire model. In this work, we systematically analyze and improve existing dataflows. Our resulting system, TorchSparse, achieves 2.9x, 3.3x, 2.2x and 1.7x measured end-to-end inference speedup on an NVIDIA A100 GPU over the state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse (MLSys) and SpConv v2, respectively.

News

[2023/11] TorchSparse++ has been adopted by One-2-3-45++ from Prof. Hao Su's lab (UCSD) for 3D object generation!

[2023/10] We present TorchSparse++ at the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO 2023). We have also fully released the source code of TorchSparse++.

[2023/6] TorchSparse++ has been adopted by One-2-3-45 from Prof. Hao Su's lab (UCSD) for 3D mesh reconstruction!

[2023/6] TorchSparse++ has been released and presented at CVPR 2023 workshops on autonomous driving. It achieves 1.7-2.9x inference speedup over previous state-of-the-art systems.

[2023/1] The Argoverse 2 dataset implements its baseline detector with TorchSparse.

[2022/8] TorchSparse is presented at MLSys 2022. Talk video is available here.

[2022/6] TorchSparse has been adopted by SparseNeuS for neural surface reconstruction.

[2022/1] TorchSparse has been accepted to MLSys 2022, featuring adaptive matrix multiplication grouping and locality-aware memory access.

[2021/6] TorchSparse v1.4 has been released.

Installation

We provide pre-built TorchSparse v2.1.0 packages (recommended) for different PyTorch and CUDA versions to simplify installation on Linux.

  1. Ensure at least PyTorch 1.9.0 is installed:

    python -c "import torch; print(torch.__version__)"
    >>> 1.10.0
  2. If you want to use TorchSparse with GPUs, please ensure PyTorch was installed with CUDA:

    python -c "import torch; print(torch.version.cuda)"
    >>> 11.3
  3. Then the right TorchSparse wheel can be found and installed by running the installation script:

    python -c "$(curl -fsSL https://raw.githubusercontent.com/mit-han-lab/torchsparse/master/install.py)"
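The checks in steps 1 and 2 can also be scripted. The following is a minimal sketch and not part of TorchSparse: the helper names `parse_version` and `meets_minimum` are hypothetical, and the only fact taken from the steps above is the minimum PyTorch version (1.9.0).

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.10.0+cu113' into (1, 10, 0)."""
    # Drop any local build suffix after '+', e.g. '+cu113'.
    public = version.split("+")[0]
    parts = []
    for piece in public.split(".")[:3]:
        # Keep only the leading digits of each component.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.9' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "1.9.0") -> bool:
    """Return True if `version` is at least `minimum` (hypothetical helper)."""
    return parse_version(version) >= parse_version(minimum)
```

With PyTorch installed, `meets_minimum(torch.__version__)` covers step 1, and `torch.version.cuda` being `None` indicates a CPU-only build for step 2.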

If the PyPI server does not work as expected, you can still download the wheels manually. The wheels are listed on this website. Our installation script can be used to determine the version number that indexes the wheels. For example, with PyTorch 1.11.0 and CUDA 11.5, the version number is 2.1.0+torch111cu115. You can then select the proper wheel according to your Python version.
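As an illustration of how the version number in the example is composed, here is a minimal sketch. It assumes the tag is the base version followed by the major and minor numbers of PyTorch and CUDA with the dots removed, which is consistent with the single example given here. The function name `wheel_version_tag` is hypothetical, and the actual logic in `install.py` may differ.

```python
def wheel_version_tag(torch_version: str, cuda_version: str, base: str = "2.1.0") -> str:
    """Build a wheel version tag such as '2.1.0+torch111cu115'.

    Assumption: only the major and minor components are used, as in the
    README example (PyTorch 1.11.0, CUDA 11.5).
    """
    # Ignore any local suffix such as '+cu115' in torch.__version__.
    torch_major, torch_minor = torch_version.split("+")[0].split(".")[:2]
    cuda_major, cuda_minor = cuda_version.split(".")[:2]
    return f"{base}+torch{torch_major}{torch_minor}cu{cuda_major}{cuda_minor}"


if __name__ == "__main__":
    print(wheel_version_tag("1.11.0", "11.5"))
```

In practice the two inputs would come from `torch.__version__` and `torch.version.cuda`, the same values printed in steps 1 and 2.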

Alternatively, you can install the library from source via:

python setup.py install

in the repository, or using

pip install git+https://github.com/mit-han-lab/torchsparse.git

without the need to clone the repository.

Benchmarks

Inference benchmarks

eval_benchmark.png

TorchSparse significantly outperforms existing point cloud inference engines in both 3D object detection and LiDAR segmentation benchmarks across three generations of GPU architecture (Pascal, Turing and Ampere) and all precisions (FP16, TF32, FP32). It is up to 1.7x faster than the state-of-the-art SpConv 2.3.5 and up to 2.2x faster than TorchSparse-MLSys on cloud GPUs. It also improves on the latency of SpConv 2.3.5 by 1.25x on Orin.

Training benchmarks

train_benchmark.png

TorchSparse achieves faster mixed-precision training than MinkowskiEngine, TorchSparse-MLSys and SpConv 2.3.5. Specifically, it is 1.16x faster on Tesla A100 and 1.27x faster on RTX 2080 Ti than the state-of-the-art SpConv 2.3.5. It also outperforms MinkowskiEngine by 4.6-4.8x across seven benchmarks on A100 and 2080 Ti. All results are measured with batch size = 2.

You may find our benchmarks at this link. To access preprocessed datasets, please contact the authors. We cannot publicly release raw data from SemanticKITTI, nuScenes and Waymo due to license requirements.

Team

TorchSparse is developed by the following wonderful team:

  • Haotian Tang: Ph.D. student (2020-) at MIT EECS, project lead, v2.0 and v2.1 lead;
  • Shang Yang: Ph.D. student (2023-) at MIT EECS, project lead, v2.1 lead;
  • Zhijian Liu: Ph.D. student (2018-) at MIT EECS, project lead, v2.0 lead;
  • Xiuyu Li: Ph.D. student (2022-) at UC Berkeley EECS, v2.0 lead;
  • Ke Hong: Graduate student (2021-) at Tsinghua University EE, v2.1 core developer, authored PCEngine kernels;
  • Zhongming Yu: Ph.D. student (2022-) at UCSD CS, v2.1 core developer, authored PCEngine kernels;
  • Yujun Lin: Ph.D. student (2018-) at MIT EECS, v2.0 core developer;
  • Yingqi Cao: Undergrad student at UC San Diego, currently working on the TorchSparse++ integration into algorithm frameworks;
  • Guohao Dai: Associate Professor at Shanghai Jiao Tong University, mentor of the project;
  • Yu Wang: Professor at Tsinghua University, mentor of the project;
  • Song Han: Associate Professor at MIT EECS, mentor of the project.

Citation

If you use TorchSparse, please use the following BibTeX entries to cite:

TorchSparse++ (TorchSparse v2.1) is presented at MICRO 2023:

@inproceedings{tangandyang2023torchsparse,
  title={TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs},
  author={Tang, Haotian and Yang, Shang and Liu, Zhijian and Hong, Ke and Yu, Zhongming and Li, Xiuyu and Dai, Guohao and Wang, Yu and Han, Song},
  booktitle={IEEE/ACM International Symposium on Microarchitecture (MICRO)},
  year={2023}
}

A preliminary version of TorchSparse++ (TorchSparse v2.1) was presented at CVPR Workshops 2023:

@inproceedings{tangandyang2023torchsparse++,
  title = {{TorchSparse++: Efficient Point Cloud Engine}},
  author = {Tang, Haotian and Yang, Shang and Liu, Zhijian and Hong, Ke and Yu, Zhongming and Li, Xiuyu and Dai, Guohao and Wang, Yu and Han, Song},
  booktitle = {Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year = {2023}
}

TorchSparse is presented at MLSys 2022:

@inproceedings{tang2022torchsparse,
  title = {{TorchSparse: Efficient Point Cloud Inference Engine}},
  author = {Tang, Haotian and Liu, Zhijian and Li, Xiuyu and Lin, Yujun and Han, Song},
  booktitle = {Conference on Machine Learning and Systems (MLSys)},
  year = {2022}
}

The initial version of TorchSparse is part of the SPVNAS paper at ECCV 2020:

@inproceedings{tang2020searching,
  title = {{Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution}},
  author = {Tang, Haotian and Liu, Zhijian and Zhao, Shengyu and Lin, Yujun and Lin, Ji and Wang, Hanrui and Han, Song},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}

The PCEngine paper was accepted to MLSys 2023:

@inproceedings{hong2023pcengine,
  title={{Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds}},
  author={Hong, Ke and Yu, Zhongming and Dai, Guohao and Yang, Xinhao and Lian, Yaoxiu and Liu, Zehao and Xu, Ningyi and Wang, Yu},
  booktitle={Sixth Conference on Machine Learning and Systems (MLSys)},
  year={2023}
}

Acknowledgement

We thank Yan Yan from TuSimple for helpful discussions. Please also have a look at the dgSparse library, which is designed for fast and efficient sparse computation on graphs and point clouds. The work of the PCEngine (MLSys 2023) team is also closely related to ours.

TorchSparse is inspired by many existing open-source libraries, including (but not limited to) MinkowskiEngine, SECOND and SparseConvNet.

We also thank AttributeDict for providing an elegant way to manage the kernel/model configurations.
