# Spconv 2.x Performance Guide

## Short Guide

* If you train without Tensor Cores (i.e. FP32 training, or FP16 training on Pascal or older GPUs), set the algo of every convolution/maxpool layer to ConvAlgo.Native manually (see the first sketch after this list). The default algorithm is ConvAlgo.MaskImplicitGemm, which is SLOWER than ConvAlgo.Native when using float32. This will be fixed in spconv 2.2.
* If your GPU supports Tensor Cores, use FP16 (mixed-precision training) if possible.
* If you train with mixed precision (i.e. with Tensor Cores), you don't need to set the algorithm manually (see the AMP sketch below).
* Currently the fast algorithms only support a kernel volume (product of the kernel sizes) <= 32, so don't use large kernel sizes.
* Make sure your channel size is a multiple of 8 when using FP16; a multiple of 32 is better.
* spconv 2.x on Windows 10 is 1.5x~2x slower than on Linux. Use Linux if possible.
* If you train with float32 on Ampere or later GPUs, you can set spconv.constants.SPCONV_ALLOW_TF32 to enable faster FP32 training (see the TF32 sketch below). See the benchmark for more performance details of different algorithms.
* Different CUDA builds of spconv may have different performance. Use the newest CUDA version if possible. For example, spconv-cu117 is faster than spconv-cu114, and spconv-cu114 is faster than spconv-cu111.
* If your kernel volume is larger than 32, spconv will use a slower (and less accurate in FP16) algorithm. To get a faster algorithm for large kernel sizes (it needs time to compile at runtime), set large_kernel_fast_algo=True.
* Use SparseGlobalMaxPool instead of a large kernel size when you need global pooling (see the last sketch below).
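
As a concrete illustration of the first item, here is a minimal sketch of an FP32 backbone with every conv/pool layer pinned to ConvAlgo.Native. The import path of ConvAlgo and the exact constructor signatures are assumptions based on the spconv 2.x PyTorch frontend and may differ slightly between versions.

```python
import torch.nn as nn
import spconv.pytorch as spconv  # spconv 2.x PyTorch frontend

# Assumption: ConvAlgo is re-exported by spconv.pytorch; in some versions it
# may need to be imported from spconv.core instead.
from spconv.pytorch import ConvAlgo


class FP32Backbone(nn.Module):
    """Small FP32 backbone with every conv/pool algo pinned to ConvAlgo.Native."""

    def __init__(self):
        super().__init__()
        self.net = spconv.SparseSequential(
            spconv.SubMConv3d(32, 64, 3, padding=1, indice_key="subm1",
                              algo=ConvAlgo.Native),
            spconv.SparseConv3d(64, 64, 3, stride=2, padding=1,
                                algo=ConvAlgo.Native),
            # This guide says maxpool layers also take an algo argument.
            spconv.SparseMaxPool3d(2, algo=ConvAlgo.Native),
        )

    def forward(self, x: spconv.SparseConvTensor) -> spconv.SparseConvTensor:
        return self.net(x)
```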
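
For the mixed-precision items, a standard torch.cuda.amp training step is enough; no spconv-specific setup is required. The function below is a generic sketch, with the model, loader, optimizer, and criterion supplied by the caller.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


def train_one_epoch_amp(model: nn.Module, loader: DataLoader,
                        optimizer: torch.optim.Optimizer,
                        criterion: nn.Module) -> None:
    """One mixed-precision epoch; spconv layers inside `model` run under
    autocast and use Tensor Cores without any manual algo setting."""
    scaler = torch.cuda.amp.GradScaler()
    for features, target in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), target)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```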
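
For the TF32 item, the sketch below assumes SPCONV_ALLOW_TF32 is a plain module-level flag that spconv reads when it builds its kernels, so it is set before the model is constructed. The two PyTorch flags are optional companions and use the standard PyTorch API.

```python
import spconv.constants

# Assumption: set this before constructing any spconv layers so the faster
# TF32 kernels are selected on Ampere or later GPUs.
spconv.constants.SPCONV_ALLOW_TF32 = True

# Optionally also allow TF32 in plain PyTorch matmuls and cuDNN convolutions.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```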
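
For the last item, the sketch below uses SparseGlobalMaxPool in place of a large-kernel pooling layer; its constructor is assumed here to take no arguments and to reduce over all active voxels, which may differ by version.

```python
import torch.nn as nn
import spconv.pytorch as spconv


class GlobalFeature(nn.Module):
    """Extract a per-sample feature with a dedicated global pool instead of a
    huge-kernel SparseMaxPool3d."""

    def __init__(self):
        super().__init__()
        self.conv = spconv.SubMConv3d(64, 128, 3, padding=1, indice_key="subm1")
        # Assumption: no constructor arguments; pools over all active voxels.
        self.global_pool = spconv.SparseGlobalMaxPool()

    def forward(self, x: spconv.SparseConvTensor):
        x = self.conv(x)
        return self.global_pool(x)
```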