OpenPCDet Toolbox for LiDAR-based 3D Object Detection.


My Own Readme

Explaining different components of OpenPCDet based on its design pattern



Based on the image:

0- datasets

a. DataProcessor(object): transform_points_to_voxels uses the VoxelGeneratorV2 class; the important output is batch_dict['voxels'].

  • How does transform_points_to_voxels work?
  • grid_size = (self.point_cloud_range[3:6] - self.point_cloud_range[0:3]) / np.array(config.VOXEL_SIZE)
  • points_to_voxel's outputs:
    • voxels: [M, max_points, ndim] float tensor. only contain points.
    • coordinates: [M, 3] int32 tensor. zyx format.
    • num_points_per_voxel: [M] int32 tensor.
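To make the grid_size formula and the three outputs above concrete, here is a minimal NumPy sketch. It is not VoxelGeneratorV2: the function name `points_to_voxels_sketch`, the toy range, and the toy voxel size are assumptions chosen only for illustration.

```python
import numpy as np

def points_to_voxels_sketch(points, point_cloud_range, voxel_size,
                            max_points=5, max_voxels=100):
    """Hypothetical helper: group points into voxels (not the real generator)."""
    pc_range = np.asarray(point_cloud_range, dtype=np.float64)
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    # Same formula as in the note above: number of voxels along x, y, z.
    grid_size = np.round((pc_range[3:6] - pc_range[0:3]) / voxel_size).astype(np.int64)

    voxels = np.zeros((max_voxels, max_points, points.shape[1]), dtype=np.float32)
    coordinates = np.zeros((max_voxels, 3), dtype=np.int32)
    num_points_per_voxel = np.zeros((max_voxels,), dtype=np.int32)
    row_of = {}  # (z, y, x) voxel index -> row in `voxels`

    for p in points:
        idx = np.floor((p[:3] - pc_range[0:3]) / voxel_size).astype(np.int64)
        if np.any(idx < 0) or np.any(idx >= grid_size):
            continue  # point lies outside the configured range
        key = (int(idx[2]), int(idx[1]), int(idx[0]))  # zyx order
        if key not in row_of:
            if len(row_of) >= max_voxels:
                continue  # voxel budget exhausted
            row_of[key] = len(row_of)
            coordinates[row_of[key]] = key
        row = row_of[key]
        if num_points_per_voxel[row] < max_points:
            voxels[row, num_points_per_voxel[row]] = p
            num_points_per_voxel[row] += 1

    m = len(row_of)
    return voxels[:m], coordinates[:m], num_points_per_voxel[:m], grid_size

points = np.array([
    [0.1, 0.1, 0.1, 0.5],
    [0.2, 0.3, 0.2, 0.7],
    [1.5, 0.2, 0.1, 0.9],
    [9.0, 9.0, 9.0, 0.1],  # outside the range, dropped
], dtype=np.float32)
voxels, coords, num_pts, grid_size = points_to_voxels_sketch(
    points, point_cloud_range=[0, 0, 0, 2, 2, 1], voxel_size=[1.0, 1.0, 0.5])
```

The first two points fall into the same voxel, the third into a neighbouring one, and the last one is discarded.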

1- VFE step: MeanVFE().forward: computes the mean of the points in each voxel -> voxel_features

  • Visualization example: before vs after
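A small NumPy sketch of the per-voxel mean described in this step. It assumes the voxels are zero-padded past their valid points, so a plain sum divided by the point count gives the mean; `mean_vfe_sketch` is a hypothetical name, not the library function.

```python
import numpy as np

def mean_vfe_sketch(voxels, num_points_per_voxel):
    """Average the valid points of each voxel.

    voxels: [M, max_points, C], assumed zero-padded past the valid points.
    num_points_per_voxel: [M] number of valid points in each voxel.
    """
    summed = voxels.sum(axis=1)  # [M, C]
    # Guard against empty voxels so we never divide by zero.
    normalizer = np.maximum(num_points_per_voxel, 1).reshape(-1, 1).astype(voxels.dtype)
    return summed / normalizer  # [M, C]

voxels = np.zeros((2, 3, 4), dtype=np.float32)
voxels[0, 0] = [1.0, 2.0, 3.0, 0.5]
voxels[0, 1] = [3.0, 4.0, 5.0, 1.5]
voxels[1, 0] = [2.0, 2.0, 2.0, 1.0]
voxel_features = mean_vfe_sketch(voxels, np.array([2, 1]))
```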

2- 3D SparseConv step: VoxelBackBone8x: takes voxel_features and voxel_coords from step 1. It applies a series of sparse convolutions and returns encoded_spconv_tensor:

  • Visualize each step.
  • What exactly does sparse conv do? This link might be helpful.

        # for detection head
        # [200, 176, 5] -> [200, 176, 2]
        out = self.conv_out(x_conv4)

        batch_dict.update({
            'encoded_spconv_tensor': out,
            'encoded_spconv_tensor_stride': 8
        })
        batch_dict.update({
            'multi_scale_3d_features': {
                'x_conv1': x_conv1,
                'x_conv2': x_conv2,
                'x_conv3': x_conv3,
                'x_conv4': x_conv4,
            }
        })

  • some questions:
    • what are SubMConv3d and SparseConv3d?
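As a partial answer to the question above: a submanifold convolution (SubMConv3d) produces outputs only at the input's active sites, so the sparsity pattern is preserved, while a regular sparse convolution (SparseConv3d) produces an output wherever the kernel window touches at least one active input, so the active set grows. The sketch below only tracks which sites are active, in 2D, at stride 1, and without boundary clipping; the function names are hypothetical and no spconv API is used.

```python
from itertools import product

def regular_sparse_conv_sites(active, kernel=3, ndim=2):
    """Active output sites of a stride-1 regular sparse conv: every site
    whose kernel window covers at least one active input site."""
    r = kernel // 2
    out = set()
    for site in active:
        for offset in product(range(-r, r + 1), repeat=ndim):
            out.add(tuple(s + o for s, o in zip(site, offset)))
    return out

def submanifold_conv_sites(active):
    """Active output sites of a submanifold conv: exactly the input sites."""
    return set(active)

active_in = {(5, 5), (5, 6)}
dilated = regular_sparse_conv_sites(active_in)   # active set grows
preserved = submanifold_conv_sites(active_in)    # active set unchanged
```

Stacking many regular sparse convolutions would quickly densify the tensor, which is one reason backbones of this kind mix the two layer types.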

3- Map to BEV: HeightCompression: takes batch_dict['encoded_spconv_tensor'] from step 2 and returns spatial_features (it stacks the 3D feature volume along the Z axis into the channel dimension).

encoded_spconv_tensor = batch_dict['encoded_spconv_tensor']
spatial_features = encoded_spconv_tensor.dense()
N, C, D, H, W = spatial_features.shape
spatial_features = spatial_features.view(N, C * D, H, W)
batch_dict['spatial_features'] = spatial_features
batch_dict['spatial_features_stride'] = batch_dict['encoded_spconv_tensor_stride']
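The view call above can be checked on a toy array. This NumPy sketch uses reshape in place of torch's view and made-up sizes; it shows that BEV channel c * D + d holds the original channel c at depth slice d.

```python
import numpy as np

def height_compression_sketch(dense):
    """Fold the depth axis into the channel axis: [N, C, D, H, W] -> [N, C*D, H, W]."""
    n, c, d, h, w = dense.shape
    return dense.reshape(n, c * d, h, w)

N, C, D, H, W = 1, 2, 2, 3, 4  # toy sizes, not the real feature-map shape
dense = np.arange(N * C * D * H * W, dtype=np.float32).reshape(N, C, D, H, W)
bev = height_compression_sketch(dense)
```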

4- VSA step: Voxel Set Abstraction(VSA): VoxelSetAbstraction(nn.Module):

a. Sample points from raw point cloud using FPS -> keypoints

b. Store interpolated BEV features at the keypoints: point_bev_features = self.interpolate_from_bev_features, using the keypoints and batch_dict['spatial_features']. Why? It estimates the feature-map value at each keypoint's (x, y) position.

c. Apply a set abstraction module to the raw points, using the keypoints as centers, and store the new features. There is a pooling module at the end.

  • StackSAModuleMSG(nn.Module): similar to the PointnetSAModuleMSG class in the pointnet2 code. It takes the raw points and the keypoints, then applies QueryAndGroup, MLPs, and pooling n times. It returns new_xyz and new_features, where new_xyz are the sampled points (the keypoints). How does pooling work? It simply applies max or avg over the nsample dimension.
  • Check this link to understand this module.

d. StackSAModuleMSG(nn.Module): this time it takes multi_scale_3d_features, applies QueryAndGroup, MLPs, and pooling n times, then stores these features.

e. Concatenate the results from b, c, and d.

f. Apply self.vsa_point_feature_fusion (a linear layer, BN, and ReLU) to the features from the previous step.

g. Return the features from steps e and f.

      batch_dict['point_features_before_fusion'] = point_features.view(-1, point_features.shape[-1])
      point_features = self.vsa_point_feature_fusion(point_features.view(-1, point_features.shape[-1]))
      batch_dict['point_features'] = point_features  # (BxN, C)
      batch_dict['point_coords'] = point_coords  # (BxN, 4)
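Step b above (reading BEV features at keypoint positions) can be sketched with plain bilinear interpolation. This is a simplified NumPy version, not the library routine: the helper names are hypothetical, and the range minimum and voxel size in the example are assumed values; only the stride of 8 comes from the notes in step 2.

```python
import numpy as np

def keypoints_to_bev_coords(keypoints_xy, range_min_xy, voxel_size_xy, bev_stride):
    """Convert metric (x, y) positions into continuous BEV pixel coordinates."""
    return (keypoints_xy - range_min_xy) / voxel_size_xy / bev_stride

def bilinear_interpolate_sketch(bev, x, y):
    """bev: [H, W, C]; x, y: [K] continuous pixel coordinates. Returns [K, C]."""
    h, w, _ = bev.shape
    x0f, y0f = np.floor(x), np.floor(y)
    fx, fy = (x - x0f)[:, None], (y - y0f)[:, None]  # fractional parts
    x0 = np.clip(x0f.astype(np.int64), 0, w - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y0 = np.clip(y0f.astype(np.int64), 0, h - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    # Blend along x on the two rows, then blend the rows along y.
    top = bev[y0, x0] * (1 - fx) + bev[y0, x1] * fx
    bottom = bev[y1, x0] * (1 - fx) + bev[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

bev = np.array([[[0.0], [1.0]],
                [[2.0], [3.0]]])  # [H=2, W=2, C=1], indexed as bev[y, x]
keypoints_xy = np.array([[0.2, -39.8], [0.4, -40.0]])
coords = keypoints_to_bev_coords(
    keypoints_xy,
    range_min_xy=np.array([0.0, -40.0]),   # assumed range minimum
    voxel_size_xy=np.array([0.05, 0.05]),  # assumed voxel size
    bev_stride=8)
point_bev_features = bilinear_interpolate_sketch(bev, coords[:, 0], coords[:, 1])
```

The first keypoint sits in the middle of the four cells and receives their average; the second sits on a cell and receives that cell's value.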

5- Reshape to BEV step: BaseBEVBackbone(nn.Module):

a. It takes the spatial features from step 3, then applies some conv+bn+relu and conv transpose+bn+relu blocks. It returns these features as data_dict['spatial_features_2d'].

6- RPN Head (DenseHead) step: AnchorHeadSingle(AnchorHeadTemplate):

  (cls_loss_func): SigmoidFocalClassificationLoss()
  (reg_loss_func): WeightedSmoothL1Loss()
  (dir_loss_func): WeightedCrossEntropyLoss()
  (conv_cls): Conv2d(512, 18, kernel_size=(1, 1), stride=(1, 1))  ## 18 = 6 anchors per location x 3 classes
  (conv_box): Conv2d(512, 42, kernel_size=(1, 1), stride=(1, 1))  ## 42 = 6 anchors x 7: x y z l w h theta
  (conv_dir_cls): Conv2d(512, 12, kernel_size=(1, 1), stride=(1, 1)) ## 6 anchors x 2 NUM_DIR_BINS

a. It takes spatial_features_2d from step 5 and produces cls, dir, and box predictions.

b. It returns:

            data_dict['batch_cls_preds'] = batch_cls_preds
            data_dict['batch_box_preds'] = batch_box_preds
            data_dict['cls_preds_normalized'] = False
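The output channel counts in the printout above (18, 42, 12) follow from simple arithmetic. The sketch below reproduces it; the function name is hypothetical, and it assumes 3 classes with 2 anchors each per location (one size, two rotations), which gives the 6 anchors mentioned in the comments.

```python
def anchor_head_channels(num_classes, anchors_per_class, box_code_size=7, num_dir_bins=2):
    """Output channels of the three 1x1 convs of an anchor-based head."""
    anchors_per_location = num_classes * anchors_per_class
    return {
        'conv_cls': anchors_per_location * num_classes,      # one score per anchor and class
        'conv_box': anchors_per_location * box_code_size,    # x y z l w h theta per anchor
        'conv_dir_cls': anchors_per_location * num_dir_bins, # direction bins per anchor
    }

channels = anchor_head_channels(num_classes=3, anchors_per_class=2)
```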

7- Point Head (DenseHead) step: PointHeadSimple(PointHeadTemplate):

a. It takes point_features_before_fusion or point_features from step 4 and produces classification scores.

8- ROI Head step: PVRCNNHead(RoIHeadTemplate):

  (proposal_target_layer): ProposalTargetLayer()
  (reg_loss_func): WeightedSmoothL1Loss()
  (roi_grid_pool_layer): StackSAModuleMSG(
    (groupers): ModuleList(
      (0): QueryAndGroup()
      (1): QueryAndGroup()
    )
    (mlps): ModuleList(....)
  )
  (shared_fc_layer): Sequential(...)
  (cls_layers): Sequential(...)
  (reg_layers): Sequential(...)

a. It applies proposal layer:

b. Then it applies roi_grid_pool_layer:

c. Finally it applies cls_layers and reg_layers.

Inside PVRCNNHead: RoIHeadTemplate-> proposal_layer(self, batch_dict, nms_config):

- Its task is to run NMS on the proposals. It is class-agnostic; the multi-class version is not implemented.
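A rough sketch of class-agnostic NMS, to show how the pre-maxsize, post-maxsize, and threshold settings interact. It is deliberately simplified: it uses axis-aligned 2D boxes (x1, y1, x2, y2) and NumPy, whereas the real proposal layer works on rotated 3D boxes on the GPU. The function name is hypothetical.

```python
import numpy as np

def class_agnostic_nms_sketch(boxes, scores, nms_thresh, pre_maxsize, post_maxsize):
    """Greedy NMS that ignores class labels. Returns kept indices, best first."""
    order = np.argsort(-scores)[:pre_maxsize]  # keep only the top-scoring candidates
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if len(keep) == post_maxsize:
            break
        rest = order[1:]
        # Intersection of the current box with every remaining box.
        w = np.clip(np.minimum(boxes[i, 2], boxes[rest, 2]) -
                    np.maximum(boxes[i, 0], boxes[rest, 0]), 0, None)
        h = np.clip(np.minimum(boxes[i, 3], boxes[rest, 3]) -
                    np.maximum(boxes[i, 1], boxes[rest, 1]), 0, None)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= nms_thresh]  # drop boxes that overlap too much
    return keep

boxes = np.array([[0.0, 0.0, 2.0, 2.0],
                  [0.1, 0.0, 2.1, 2.0],   # heavy overlap with the first box
                  [5.0, 5.0, 7.0, 7.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = class_agnostic_nms_sketch(boxes, scores, nms_thresh=0.7,
                                 pre_maxsize=1024, post_maxsize=100)
```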

Inside PVRCNNHead: RoIHeadTemplate-> ProposalTargetLayer(nn.Module):

- Its task is ...

Backbone 3D

Related classes:

  1. Point Feature Encoding (PFE). VoxelSetAbstraction(nn.Module):
  • It contains StackSAModuleMSG(nn.Module) modules.
  2. Voxel Feature Encoding (VFE).

    a. VFETemplate(nn.Module)

    b. MeanVFE(VFETemplate)

    c. PillarVFE(VFETemplate)

  3. VoxelBackBone8x(nn.Module)

Backbone 2D

Related classes:

  1. Map to BEV: HeightCompression(nn.Module)
  2. Map to BEV: PointPillarScatter(nn.Module)
  3. BaseBEVBackbone(nn.Module)

Dense Head

Related classes:

  1. AnchorHeadTemplate(nn.Module)
  • __init__(self, model_cfg, num_class, class_names, grid_size, point_cloud_range, predict_boxes_when_training)

    Example for model_cfg for pvrcnn: {'NAME': 'AnchorHeadSingle', 'CLASS_AGNOSTIC': False, 'USE_DIRECTION_CLASSIFIER': True, 'DIR_OFFSET': 0.78539, 'DIR_LIMIT_OFFSET': 0.0, 'NUM_DIR_BINS': 2,
    'ANCHOR_GENERATOR_CONFIG': [{'class_name': 'car', 'anchor_sizes': [[4.2, 2.0, 1.6]], 'anchor_rotations': [0, 1.57], 'anchor_bottom_heights': [0],
    'align_center': False, 'feature_map_stride': 8, 'matched_threshold': 0.55, 'unmatched_threshold': 0.4}], 'TARGET_ASSIGNER_CONFIG': {'NAME': 'AxisAlignedTargetAssigner', 'POS_FRACTION': -1.0, 'SAMPLE_SIZE': 512, 'NORM_BY_NUM_EXAMPLES': False, 'MATCH_HEIGHT': False, 'BOX_CODER': 'ResidualCoder'}, 'LOSS_CONFIG': {'LOSS_WEIGHTS': {'cls_weight': 1.0, 'loc_weight': 2.0, 'dir_weight': 0.2, 'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]}} } # End of example

    Example for predict_boxes_when_training for pvrcnn: {'NAME': 'PVRCNNHead', 'CLASS_AGNOSTIC': True, 'SHARED_FC': [256, 256], 'CLS_FC': [256, 256], 'REG_FC': [256, 256], 'DP_RATIO': 0.3, 'NMS_CONFIG': {'TRAIN': {'NMS_TYPE': 'nms_gpu', 'MULTI_CLASSES_NMS': False, 'NMS_PRE_MAXSIZE': 9000, 'NMS_POST_MAXSIZE': 512, 'NMS_THRESH': 0.8}, 'TEST': {'NMS_TYPE': 'nms_gpu', 'MULTI_CLASSES_NMS': False, 'NMS_PRE_MAXSIZE': 1024, 'NMS_POST_MAXSIZE': 100, 'NMS_THRESH': 0.7}},
    'ROI_GRID_POOL': {'GRID_SIZE': 6, 'MLPS': [[64, 64], [64, 64]], 'POOL_RADIUS': [0.8, 1.6], 'NSAMPLE': [16, 16], 'POOL_METHOD': 'max_pool'}, 'TARGET_CONFIG': {'BOX_CODER': 'ResidualCoder', 'ROI_PER_IMAGE': 128, 'FG_RATIO': 0.5, 'SAMPLE_ROI_BY_EACH_CLASS': True, 'CLS_SCORE_TYPE': 'raw_roi_iou',
    'CLS_FG_THRESH': 0.75, 'CLS_BG_THRESH': 0.25, 'CLS_BG_THRESH_LO': 0.1, 'HARD_BG_RATIO': 0.8, 'REG_FG_THRESH': 0.55}, 'LOSS_CONFIG': {'CLS_LOSS': 'BinaryCrossEntropy', 'REG_LOSS': 'smooth-l1', 'CORNER_LOSS_REGULARIZATION': True, 'LOSS_WEIGHTS': {'rcnn_cls_weight': 1.0, 'rcnn_reg_weight': 1.0, 'rcnn_corner_weight': 1.0, 'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]}} } # End of example


    a. Generate anchors

    b. Create target_assigner (AxisAlignedTargetAssigner or ATSSTargetAssigner).

    c. build_losses

  2. AnchorHeadSingle(AnchorHeadTemplate)

  3. PointIntraPartOffsetHead(PointHeadTemplate):

  4. PointHeadSimple(PointHeadTemplate): A simple point-based segmentation head, which is used for PV-RCNN keypoint segmentation. Reference Paper: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection.

  5. AnchorHeadMulti(AnchorHeadTemplate)

  6. PointHeadTemplate(nn.Module):


    a. build_losses(self.model_cfg.LOSS_CONFIG)
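The "Generate anchors" step of AnchorHeadTemplate listed above can be pictured with the following NumPy sketch. It places one anchor per BEV cell and rotation using the car settings from the example model_cfg (size [4.2, 2.0, 1.6], rotations [0, 1.57], bottom height 0). The function name, the tiny grid, and the point cloud range are assumptions, and the real generator's tensor layout may differ.

```python
import numpy as np

def generate_anchors_sketch(grid_size_xy, pc_range, anchor_size, rotations, bottom_height):
    """Return anchors of shape [nx, ny, num_rotations, 7] as (x, y, z, dx, dy, dz, heading)."""
    nx, ny = grid_size_xy
    xs = np.linspace(pc_range[0], pc_range[3], nx)  # anchor centres along x
    ys = np.linspace(pc_range[1], pc_range[4], ny)  # anchor centres along y
    rots = np.asarray(rotations, dtype=np.float64)
    gx, gy, gr = np.meshgrid(xs, ys, rots, indexing='ij')
    anchors = np.zeros((nx, ny, len(rots), 7))
    anchors[..., 0] = gx
    anchors[..., 1] = gy
    # z is stored as the box centre: bottom height plus half the anchor height.
    anchors[..., 2] = bottom_height + anchor_size[2] / 2.0
    anchors[..., 3:6] = anchor_size
    anchors[..., 6] = gr
    return anchors

anchors = generate_anchors_sketch(
    grid_size_xy=(4, 3),                      # toy feature-map size
    pc_range=[0.0, -40.0, -3.0, 70.4, 40.0, 1.0],  # assumed range
    anchor_size=[4.2, 2.0, 1.6],
    rotations=[0, 1.57],
    bottom_height=0.0)
```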

ROI Head

Related classes:

  1. RoIHeadTemplate(nn.Module)


    a. Create a target assigner, which is in fact ProposalTargetLayer(roi_sampler_cfg=self.model_cfg.TARGET_CONFIG)

    b. build_losses(self.model_cfg.LOSS_CONFIG)

  2. PartA2FCHead(RoIHeadTemplate)

  3. PVRCNNHead(RoIHeadTemplate)


    a. self.roi_grid_pool_layer = pointnet2_stack_modules.StackSAModuleMSG( radii=self.model_cfg.ROI_GRID_POOL.POOL_RADIUS, nsamples=self.model_cfg.ROI_GRID_POOL.NSAMPLE, mlps=mlps, use_xyz=True, pool_method=self.model_cfg.ROI_GRID_POOL.POOL_METHOD, )

  4. SECONDHead(RoIHeadTemplate)
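To illustrate what the RoI grid pooling in PVRCNNHead pools around: with GRID_SIZE 6 (from the example config above), each RoI is covered by 6 x 6 x 6 = 216 grid points, and features are then gathered around each of them. The sketch below generates such points for one box, assuming the box is given as (x, y, z, dx, dy, dz, heading) with (x, y, z) the centre and heading a rotation about the z axis. `roi_grid_points_sketch` is a hypothetical name.

```python
import numpy as np

def roi_grid_points_sketch(roi, grid_size):
    """Return [grid_size**3, 3] grid-cell centres inside one rotated box."""
    roi = np.asarray(roi, dtype=np.float64)
    center, dims, heading = roi[0:3], roi[3:6], roi[6]
    axes = [np.arange(grid_size)] * 3
    idx = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1).reshape(-1, 3)
    # Cell centres in the box's own frame, spanning [-dims/2, dims/2].
    local = (idx + 0.5) / grid_size * dims - dims / 2.0
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Rotate about z, then translate to the box centre.
    return local @ rot.T + center

grid_points = roi_grid_points_sketch([10.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0], grid_size=6)
```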

Other modules

PointNet2 modules

Related classes:

  1. StackSAModuleMSG(nn.Module)
  • __init__(self, *, radii: List[float], nsamples: List[int], mlps: List[List[int]], use_xyz: bool = True, pool_method='max_pool')
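The radii, nsamples, and pool_method arguments can be understood from a stripped-down sketch of multi-scale grouping. For each radius it gathers up to nsample neighbours around every query point, pools over the nsample dimension, and concatenates the scales. The shared MLPs are left out, the data is not in the stacked-batch layout the real module uses, and both function names are hypothetical.

```python
import numpy as np

def ball_query_group_sketch(xyz, features, new_xyz, radius, nsample):
    """Gather up to nsample neighbours within radius of each query point.
    Returns [K, nsample, 3 + C] with relative xyz prepended (use_xyz=True)."""
    k, c = new_xyz.shape[0], features.shape[1]
    grouped = np.zeros((k, nsample, 3 + c), dtype=np.float64)
    for q, center in enumerate(new_xyz):
        dist = np.linalg.norm(xyz - center, axis=1)
        idx = np.flatnonzero(dist < radius)[:nsample]
        if idx.size == 0:
            continue  # empty ball: leave zeros
        # Pad short groups by repeating the first neighbour.
        idx = np.concatenate([idx, np.full(nsample - idx.size, idx[0])])
        grouped[q, :, :3] = xyz[idx] - center
        grouped[q, :, 3:] = features[idx]
    return grouped

def sa_msg_sketch(xyz, features, new_xyz, radii, nsamples, pool_method='max_pool'):
    """Multi-scale grouping followed by pooling over the nsample dimension."""
    pooled = []
    for radius, nsample in zip(radii, nsamples):
        grouped = ball_query_group_sketch(xyz, features, new_xyz, radius, nsample)
        # The real module would apply a shared MLP to `grouped` here.
        if pool_method == 'max_pool':
            pooled.append(grouped.max(axis=1))
        else:
            pooled.append(grouped.mean(axis=1))
    return np.concatenate(pooled, axis=1)  # [K, num_scales * (3 + C)]

xyz = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
features = np.array([[1.0], [2.0], [5.0]])
new_xyz = np.array([[0.0, 0.0, 0.0]])
new_features = sa_msg_sketch(xyz, features, new_xyz, radii=[1.0, 4.0], nsamples=[2, 4])
```

The small radius only sees the two nearby points, while the large radius also reaches the far point, which is why the two halves of the output differ.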

PointNet2 Utils:

  1. BallQuery(Function)
  2. GroupingOperation(Function)
  3. QueryAndGroup(nn.Module)
  4. FurthestPointSampling(Function)
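Of the utilities listed above, furthest point sampling is the one used to pick keypoints in the VSA step. A plain NumPy sketch of the greedy algorithm follows; it is a CPU illustration with a hypothetical function name, not the CUDA-backed FurthestPointSampling op.

```python
import numpy as np

def farthest_point_sampling_sketch(points, num_samples, start_index=0):
    """Greedily pick points so that each new pick is as far as possible
    from everything picked so far. Returns the selected indices."""
    n = points.shape[0]
    selected = np.zeros(num_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)  # squared distance to the nearest selected point
    current = start_index
    for i in range(num_samples):
        selected[i] = current
        d = np.sum((points - points[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        current = int(np.argmax(min_dist))
    return selected

points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0]])
keypoint_indices = farthest_point_sampling_sketch(points, num_samples=3)
```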

Original Readme by the authors of OpenPCDet


OpenPCDet is a clear, simple, self-contained open source project for LiDAR-based 3D object detection.

It is also the official code release of [PointRCNN], [Part-A2-Net], [PV-RCNN] and [Voxel R-CNN].

NEW: OpenPCDet has been updated to v0.5.0 (Dec. 2021).



[2021-12-01] NEW: OpenPCDet v0.5.0 is released with the following features:

  • Improve the performance of all models on Waymo Open Dataset. Note that you need to re-prepare the training/validation data and ground-truth database of Waymo Open Dataset (see
  • Support anchor-free CenterHead, add configs of CenterPoint and PV-RCNN with CenterHead.
  • Support latest PyTorch 1.1~1.10 and spconv 1.0~2.x, where spconv 2.x should be easy to install with pip and faster than previous versions (see the official update of spconv here).
  • Support config USE_SHARED_MEMORY to use shared memory to potentially speed up the training process in case you suffer from an IO problem.
  • Support a better and faster visualization script; you need to install Open3D first.

[2021-06-08] Added support for the voxel-based 3D object detection model Voxel R-CNN

[2021-05-14] Added support for the monocular 3D object detection model CaDDN

[2020-11-27] Bug fix: Please re-prepare the validation infos of the Waymo dataset (version 1.2) if you would like to use our provided Waymo evaluation tool (see PR). Note that you do not need to re-prepare the training data and ground-truth database.

[2020-11-10] The Waymo Open Dataset has been supported with state-of-the-art results. Currently we provide the configs and results of SECOND, PartA2 and PV-RCNN on the Waymo Open Dataset, and more models could be easily supported by modifying their dataset configs.

[2020-08-10] Bug fix: The provided NuScenes models have been updated to fix the loading bugs. Please redownload them if you need to use the pretrained NuScenes models.

[2020-07-30] OpenPCDet v0.3.0 is released with the following features:

[2020-07-17] Add simple visualization codes and a quick demo to test with custom data.

[2020-06-24] OpenPCDet v0.2.0 is released with pretty new structures to support more models and datasets.

[2020-03-16] OpenPCDet v0.1.0 is released.


What does OpenPCDet toolbox do?

Note that we have upgraded PCDet from v0.1 to v0.2 with pretty new structures to support various datasets and models.

OpenPCDet is a general PyTorch-based codebase for 3D object detection from point cloud. It currently supports multiple state-of-the-art 3D object detection methods with highly refactored codes for both one-stage and two-stage 3D detection frameworks.

Based on the OpenPCDet toolbox, we won the Waymo Open Dataset challenge in three tracks (3D Detection, 3D Tracking, and Domain Adaptation) among all LiDAR-only methods, and the Waymo-related models will be released to OpenPCDet soon.

We are actively updating this repo currently, and more datasets and models will be supported soon. Contributions are also welcomed.

OpenPCDet design pattern

  • Data-Model separation with unified point cloud coordinate for easily extending to custom datasets:

  • Unified 3D box definition: (x, y, z, dx, dy, dz, heading).

  • Flexible and clear model structure to easily support various 3D detection models:

  • Support various models within one framework as:
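To make the unified box definition (x, y, z, dx, dy, dz, heading) above more tangible, the sketch below converts one box into its 8 corners. It assumes (x, y, z) is the box centre and heading is a rotation about the z axis; the function name and the corner ordering are arbitrary choices for this illustration.

```python
import numpy as np

def box_to_corners_sketch(box):
    """Convert one (x, y, z, dx, dy, dz, heading) box into its 8 corners, shape [8, 3]."""
    x, y, z, dx, dy, dz, heading = box
    # Unit-cube corner signs, scaled by the box dimensions.
    template = np.array([[sx, sy, sz]
                         for sx in (-0.5, 0.5)
                         for sy in (-0.5, 0.5)
                         for sz in (-0.5, 0.5)])
    local = template * np.array([dx, dy, dz])
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Rotate about z, then move to the box centre.
    return local @ rot.T + np.array([x, y, z])

corners = box_to_corners_sketch([0.0, 0.0, 0.0, 4.0, 2.0, 2.0, np.pi / 2])
```

With a heading of 90 degrees the 4 m side ends up along the y axis, as the corner extents show.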

Currently Supported Features

  • Support both one-stage and two-stage 3D object detection frameworks
  • Support distributed training & testing with multiple GPUs and multiple machines
  • Support multiple heads on different scales to detect different classes
  • Support stacked version set abstraction to encode various number of points in different scenes
  • Support Adaptive Training Sample Selection (ATSS) for target assignment
  • Support RoI-aware point cloud pooling & RoI-grid point cloud pooling
  • Support GPU version 3D IoU calculation and rotated NMS

Model Zoo

KITTI 3D Object Detection Baselines

Selected supported methods are shown in the table below. The results are the 3D detection performance at moderate difficulty on the val set of the KITTI dataset.

  • All models are trained with 8 GTX 1080Ti GPUs and are available for download.
  • The training time is measured with 8 TITAN XP GPUs and PyTorch 1.5.

| | training time | Car@R11 | Pedestrian@R11 | Cyclist@R11 | download |
|---|---|---|---|---|---|
| PointPillar | ~1.2 hours | 77.28 | 52.29 | 62.68 | model-18M |
| SECOND | ~1.7 hours | 78.62 | 52.98 | 67.15 | model-20M |
| SECOND-IoU | - | 79.09 | 55.74 | 71.31 | model-46M |
| PointRCNN | ~3 hours | 78.70 | 54.41 | 72.11 | model-16M |
| PointRCNN-IoU | ~3 hours | 78.75 | 58.32 | 71.34 | model-16M |
| Part-A2-Free | ~3.8 hours | 78.72 | 65.99 | 74.29 | model-226M |
| Part-A2-Anchor | ~4.3 hours | 79.40 | 60.05 | 69.90 | model-244M |
| PV-RCNN | ~5 hours | 83.61 | 57.90 | 70.47 | model-50M |
| Voxel R-CNN (Car) | ~2.2 hours | 84.54 | - | - | model-28M |
| CaDDN (Mono) | ~15 hours | 21.38 | 13.02 | 9.76 | model-774M |

Waymo Open Dataset Baselines

We provide the setting of DATA_CONFIG.SAMPLED_INTERVAL on the Waymo Open Dataset (WOD) to subsample partial samples for training and evaluation, so you could also play with WOD by setting a larger DATA_CONFIG.SAMPLED_INTERVAL (which keeps fewer frames) even if you only have limited GPU resources.

By default, all models are trained with 20% data (~32k frames) of all the training samples on 8 GTX 1080Ti GPUs, and the results of each cell here are mAP/mAPH calculated by the official Waymo evaluation metrics on the whole validation set (version 1.2).
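As a rough illustration of the subsampling, and assuming the interval simply keeps every k-th frame of the ordered frame list, an interval of 5 corresponds to the 20% of training data mentioned above. The helper name is hypothetical.

```python
def subsample_frames(frame_infos, sampled_interval):
    """Keep every `sampled_interval`-th frame (assumed meaning of the setting)."""
    return frame_infos[::sampled_interval]

frames = list(range(100))             # stand-in for 100 frame infos
subset = subsample_frames(frames, 5)  # keeps 20 of them, i.e. 20%
```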

| Performance@(train with 20% Data) | Vec_L1 | Vec_L2 | Ped_L1 | Ped_L2 | Cyc_L1 | Cyc_L2 |
|---|---|---|---|---|---|---|
| SECOND | 70.96/70.34 | 62.58/62.02 | 65.23/54.24 | 57.22/47.49 | 57.13/55.62 | 54.97/53.53 |
| CenterPoint | 71.33/70.76 | 63.16/62.65 | 72.09/65.49 | 64.27/58.23 | 68.68/67.39 | 66.11/64.87 |
| CenterPoint (ResNet) | 72.76/72.23 | 64.91/64.42 | 74.19/67.96 | 66.03/60.34 | 71.04/69.79 | 68.49/67.28 |
| Part-A2-Anchor | 74.66/74.12 | 65.82/65.32 | 71.71/62.24 | 62.46/54.06 | 66.53/65.18 | 64.05/62.75 |
| PV-RCNN (AnchorHead) | 75.41/74.74 | 67.44/66.80 | 71.98/61.24 | 63.70/53.95 | 65.88/64.25 | 63.39/61.82 |
| PV-RCNN (CenterHead) | 75.95/75.43 | 68.02/67.54 | 75.94/69.40 | 67.66/61.62 | 70.18/68.98 | 67.73/66.57 |

We could not provide the above pretrained models due to Waymo Dataset License Agreement, but you could easily achieve similar performance by training with the default configs.

NuScenes 3D Object Detection Baselines

All models are trained with 8 GTX 1080Ti GPUs and are available for download.

| | mATE | mASE | mAOE | mAVE | mAAE | mAP | NDS | download |
|---|---|---|---|---|---|---|---|---|
| PointPillar-MultiHead | 33.87 | 26.00 | 32.07 | 28.74 | 20.15 | 44.63 | 58.23 | model-23M |
| SECOND-MultiHead (CBGS) | 31.15 | 25.51 | 26.64 | 26.26 | 20.46 | 50.59 | 62.29 | model-35M |

Other datasets

Support for other datasets is welcome; please submit a pull request.


Please refer to the installation instructions to install OpenPCDet.

Quick Demo

Please refer to the demo instructions for a quick demo to test with a pretrained model and visualize the predicted results on your custom data or the original KITTI data.

Getting Started

Please refer to the getting-started instructions to learn more about using this project.


OpenPCDet is released under the Apache 2.0 license.


OpenPCDet is an open source project for LiDAR-based 3D scene perception that supports multiple LiDAR-based perception models as shown above. Some parts of PCDet are learned from the officially released code of the above supported methods. We would like to thank the authors for their proposed methods and official implementations.

We hope that this repo could serve as a strong and flexible codebase to benefit the research community by speeding up the process of reimplementing previous works and/or developing new methods.


If you find this project useful in your research, please consider citing:

    title={OpenPCDet: An Open-source Toolbox for 3D Object Detection from Point Clouds},
    author={OpenPCDet Development Team},
    howpublished = {\url{}},


You are welcome to join the OpenPCDet development team by contributing to this repo. Feel free to contact us about any potential contributions.


