What is the configuration of pretraining on Hypersim? #5

Open · sysu19351158 opened this issue Feb 20, 2023 · 6 comments

@sysu19351158

What are the epochs, task weighting, and other settings? Can you share the training command for Hypersim?

@danielS91
Member

For NYUv2 (Hypersim pretraining later fine-tuned on NYUv2):

python main.py \
    --tasks semantic normal scene instance orientation \
    --enable-panoptic \
    --results-basepath /some/path \
    --validation-skip 0.95 \
    --checkpointing-skip 0.95 \
    --checkpointing-metrics valid_semantic_miou bacc panoptic_deeplab_semantic_miou panoptic_all_deeplab_pq panoptic_all_with_gt_deeplab_pq \
    --rgb-encoder-backbone resnet34 \
    --rgb-encoder-backbone-block nonbottleneck1d \
    --depth-encoder-backbone resnet34 \
    --depth-encoder-backbone-block nonbottleneck1d \
    --encoder-backbone-pretrained-weights-filepath /path/to/our/imagenet/checkpoint.pth \
    --input-modalities rgb depth \
    --tasks-weighting 1.0 0.25 0.25 2.0 0.0 \
    --learning-rate 0.005 \
    --dataset hypersim \
    --subset-train 0.2 \
    --instance-center-heatmap-top-k 128 

For SUNRGB-D (Hypersim pretraining later fine-tuned on SUNRGB-D):

python main.py \
    --tasks semantic normal scene instance orientation \
    --enable-panoptic \
    --results-basepath /some/path \
    --validation-skip 0.95 \
    --checkpointing-skip 0.95 \
    --checkpointing-metrics valid_semantic_miou bacc panoptic_deeplab_semantic_miou panoptic_all_deeplab_pq panoptic_all_with_gt_deeplab_pq \
    --rgb-encoder-backbone resnet34 \
    --rgb-encoder-backbone-block nonbottleneck1d \
    --depth-encoder-backbone resnet34 \
    --depth-encoder-backbone-block nonbottleneck1d \
    --encoder-backbone-pretrained-weights-filepath /path/to/our/imagenet/checkpoint.pth \
    --input-modalities rgb depth \
    --tasks-weighting 1.0 0.25 0.25 2.0 0.0 \
    --learning-rate 0.005 \
    --dataset hypersim \
    --subset-train 0.3 \
    --instance-center-heatmap-top-k 128
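
A note on reading these commands: the values given to --tasks-weighting are matched positionally to the tasks listed in --tasks. A minimal Python sketch of such a weighted multi-task loss combination (hypothetical names, not the actual code in main.py):

    tasks = ["semantic", "normal", "scene", "instance", "orientation"]
    weights = [1.0, 0.25, 0.25, 2.0, 0.0]  # same order as --tasks

    def total_loss(per_task_losses):
        # per_task_losses: dict mapping task name -> scalar loss tensor
        return sum(w * per_task_losses[t] for t, w in zip(tasks, weights))

With the weights above, the orientation loss is scaled by 0.0 and therefore does not contribute to the optimization during Hypersim pretraining.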

@sysu19351158
Author

Thank you so much! The number of epochs is not set in the command. Does this mean it is 500, as set in args.py?

@danielS91
Member

Yes. However, note that the actual number of iterations also depends on the specified subset parameter. Even with a random subset of 0.2 or 0.3 per epoch, training on an A100 will take around one week.
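
To make the effect of the subset parameter concrete, here is a back-of-the-envelope estimate of the iteration count (the dataset size and batch size below are placeholders, not the actual Hypersim values):

    import math

    n_train_images = 50_000  # placeholder, not the actual Hypersim train split size
    batch_size = 8           # placeholder
    subset = 0.2             # --subset-train 0.2, resampled randomly each epoch
    epochs = 500             # default from args.py

    iters_per_epoch = math.ceil(subset * n_train_images / batch_size)
    print(iters_per_epoch, epochs * iters_per_epoch)  # 1250 iterations/epoch, 625000 total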

@sysu19351158
Author

Thank you! 🙏 But there is another problem: when I train EMSANet on NYUv2 with the pretrained weights for the ResNet-34 NBt1D encoder backbone, using the command at the end of the README, the test mIoU is 0.5041. This differs from the 0.5097 reported in the paper, even though I repeated the training three times. Did I do something wrong?

@danielS91
Member

This should not happen. I will run a test training to double-check this.

@danielS91
Member

OK, I ran some test trainings and was able to almost reproduce the reported results in a more recent environment:

                                       task: ['semantic', 'scene', 'instance', 'orientation']
                             task_weighting: [1.0, 0.25, 3.0, 1.0]
                         instance_weighting: [2, 1]
                                         lr: 0.03
                                      wandb: EMSANet-nyuv2-r34nbt1d-testruns astral-firefly-6 (2tnzlo26)
                                  wandb_url: https://wandb.ai/nicr/EMSANet-nyuv2-r34nbt1d-testruns/runs/2tnzlo26
                                  epoch_max: 499

valid_panoptic_all_with_gt_deeplab_pq (447)
      valid_instance_all_with_gt_deeplab_pq: 0.6060
               valid_orientation_mae_gt_deg: 18.4523
              valid_panoptic_all_deeplab_pq: 0.4324
      valid_panoptic_all_with_gt_deeplab_pq: 0.4324
      valid_panoptic_all_with_gt_deeplab_rq: 0.5183
      valid_panoptic_all_with_gt_deeplab_sq: 0.8253
       valid_panoptic_deeplab_semantic_miou: 0.5123
             valid_panoptic_mae_deeplab_deg: 16.1432
                           valid_scene_bacc: 0.7684
                        valid_semantic_miou: 0.5083

Note that the learning rate is slightly lower than the value reported in the paper: 0.04 (paper) vs. 0.03 (here). However, since the environment differs, I enqueued runs with 0.02, 0.03, and 0.04; the best result is shown above. It was reached at epoch 447, selected based on valid_panoptic_all_with_gt_deeplab_pq.
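
For reference, checkpoint selection as described here boils down to tracking the best validation metric across epochs; a minimal sketch (hypothetical helper, not the repository's actual checkpointing code):

    best_epoch, best_pq = None, float("-inf")

    def maybe_checkpoint(epoch, metrics):
        # metrics: dict of validation metrics for this epoch
        global best_epoch, best_pq
        pq = metrics["valid_panoptic_all_with_gt_deeplab_pq"]
        if pq > best_pq:
            best_epoch, best_pq = epoch, pq
            # save model weights here

    # e.g., epoch 447 with PQ 0.4324 would become the new best
    maybe_checkpoint(447, {"valid_panoptic_all_with_gt_deeplab_pq": 0.4324})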

Training was done on an A100 40GB with driver 470.63.01. Please find below additional details on the environment.

conda list | grep -e torch -e cuda
cuda                      11.7.1                        0    nvidia
cuda-cccl                 11.7.91                       0    nvidia
cuda-command-line-tools   11.7.1                        0    nvidia
cuda-compiler             11.7.1                        0    nvidia
cuda-cudart               11.7.99                       0    nvidia
cuda-cudart-dev           11.7.99                       0    nvidia
cuda-cuobjdump            11.7.91                       0    nvidia
cuda-cupti                11.7.101                      0    nvidia
cuda-cuxxfilt             11.7.91                       0    nvidia
cuda-demo-suite           11.8.86                       0    nvidia
cuda-documentation        11.8.86                       0    nvidia
cuda-driver-dev           11.7.99                       0    nvidia
cuda-gdb                  11.8.86                       0    nvidia
cuda-libraries            11.7.1                        0    nvidia
cuda-libraries-dev        11.7.1                        0    nvidia
cuda-memcheck             11.8.86                       0    nvidia
cuda-nsight               11.8.86                       0    nvidia
cuda-nsight-compute       11.8.0                        0    nvidia
cuda-nvcc                 11.7.99                       0    nvidia
cuda-nvdisasm             11.8.86                       0    nvidia
cuda-nvml-dev             11.7.91                       0    nvidia
cuda-nvprof               11.8.87                       0    nvidia
cuda-nvprune              11.7.91                       0    nvidia
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvrtc-dev            11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-nvvp                 11.8.87                       0    nvidia
cuda-runtime              11.7.1                        0    nvidia
cuda-sanitizer-api        11.8.86                       0    nvidia
cuda-toolkit              11.7.1                        0    nvidia
cuda-tools                11.7.1                        0    nvidia
cuda-visual-tools         11.7.1                        0    nvidia
cudatoolkit               11.3.1               h2bc3f7f_2
ffmpeg                    4.3                  hf484d3e_0    pytorch
pytorch                   1.13.0          py3.8_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h67b0de4_0    pytorch
pytorch-lightning         1.5.8                    pypi_0    pypi
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                0.13.0               py38_cu117    pytorch
torchmetrics              0.10.2                   pypi_0    pypi
torchvision               0.14.0               py38_cu117    pytorch
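
To quickly compare your own setup against the versions above, the standard PyTorch attributes are sufficient (nothing repository-specific):

    import torch
    import torchvision

    print("torch:", torch.__version__)                # 1.13.0 in the listing above
    print("cuda (build):", torch.version.cuda)        # 11.7
    print("cudnn:", torch.backends.cudnn.version())   # 8500, i.e., cuDNN 8.5.0
    print("torchvision:", torchvision.__version__)    # 0.14.0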

I hope this helps.
