AssertionError when idx.shape[1] == k #79

Closed
jackie174 opened this issue Nov 4, 2022 · 23 comments

Comments

@jackie174

Hello Xumin, I ran into this problem. Any suggestions?
bash ./scripts/train.sh 0 \
    --config ./cfgs/KITTI_models/PoinTr.yaml \
    --exp_name example
/content/pointr
/content/pointr

+ GPUS=0
+ PY_ARGS='--config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example'
+ CUDA_VISIBLE_DEVICES=0
+ python main.py --config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example
    2022-11-04 04:37:20,520 - PoinTr - INFO - Copy the Config file from ./cfgs/KITTI_models/PoinTr.yaml to ./experiments/PoinTr/KITTI_models/example/config.yaml
    2022-11-04 04:37:20,520 - PoinTr - INFO - args.config : ./cfgs/KITTI_models/PoinTr.yaml
    2022-11-04 04:37:20,520 - PoinTr - INFO - args.launcher : none
    2022-11-04 04:37:20,520 - PoinTr - INFO - args.local_rank : 0
    2022-11-04 04:37:20,520 - PoinTr - INFO - args.num_workers : 4
    2022-11-04 04:37:20,520 - PoinTr - INFO - args.seed : 0
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.deterministic : False
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.sync_bn : False
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.exp_name : example
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.start_ckpts : None
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.ckpts : None
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.val_freq : 1
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.resume : False
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.test : False
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.mode : None
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/KITTI_models/example
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/KITTI_models/TFBoard/example
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.log_name : PoinTr
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.use_gpu : True
    2022-11-04 04:37:20,521 - PoinTr - INFO - args.distributed : False
    2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer = edict()
    2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer.type : AdamW
    2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer.kwargs = edict()
    2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0001
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler = edict()
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.type : LambdaLR
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs = edict()
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler = edict()
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.type : Lambda
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset = edict()
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train = edict()
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train._base_ = edict()
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train._base_.NAME : PCN
    2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train._base_.N_POINTS : 16384
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train._base_.N_RENDERINGS : 8
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train._base_.CARS : True
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.others = edict()
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.others.subset : train
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.others.bs : 64
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val = edict()
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_ = edict()
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_.NAME : PCN
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_.N_POINTS : 16384
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_.N_RENDERINGS : 8
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val._base_.CARS : True
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.others = edict()
    2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.others.subset : test
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test = edict()
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test._base_ = edict()
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test._base_.NAME : KITTI
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test._base_.CATEGORY_FILE_PATH : data/KITTI/KITTI.json
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test._base_.N_POINTS : 16384
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test._base_.CLOUD_PATH : data/KITTI/cars/%s.pcd
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test._base_.BBOX_PATH : data/KITTI/bboxes/%s.txt
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.others = edict()
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.others.subset : test
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.model = edict()
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.NAME : PoinTr
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.num_pred : 14336
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.num_query : 224
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.knn_layer : 1
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.trans_dim : 384
    2022-11-04 04:37:20,524 - PoinTr - INFO - config.total_bs : 64
    2022-11-04 04:37:20,525 - PoinTr - INFO - config.step_per_update : 1
    2022-11-04 04:37:20,525 - PoinTr - INFO - config.max_epoch : 600
    2022-11-04 04:37:20,525 - PoinTr - INFO - config.consider_metric : CDL1
    2022-11-04 04:37:20,525 - PoinTr - INFO - Distributed training: False
    2022-11-04 04:37:20,525 - PoinTr - INFO - Set random seed to 0, deterministic: False
    2022-11-04 04:37:20,534 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
    2022-11-04 04:37:20,563 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 5677
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
    cpuset_checked))
    2022-11-04 04:37:20,570 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
    2022-11-04 04:37:20,570 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 150
    2022-11-04 04:37:20,571 - MODEL - INFO - Transformer with knn_layer 1
    2022-11-04 04:37:31,629 - PoinTr - INFO - Using Data parallel ...
    2022-11-04 04:37:35,690 - PoinTr - INFO - padding while KITTI training
    Traceback (most recent call last):
      File "main.py", line 68, in <module>
        main()
      File "main.py", line 64, in main
        run_net(args, config, train_writer, val_writer)
      File "/content/pointr/tools/runner.py", line 98, in run_net
        ret = base_model(partial)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/pointr/models/PoinTr.py", line 92, in forward
        q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/pointr/models/Transformer.py", line 353, in forward
        coor, f = self.grouper(inpc.transpose(1,2).contiguous())
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/pointr/models/dgcnn_group.py", line 87, in forward
        f = self.get_graph_feature(coor, f, coor, f)
      File "/content/pointr/models/dgcnn_group.py", line 67, in get_graph_feature
        assert idx.shape[1] == k
    AssertionError
@yuxumin
Owner

yuxumin commented Nov 4, 2022

Hi,
Have you modified the code? This error comes from an unexpected k value in the knn operation.

@jackie174
Author

jackie174 commented Nov 4, 2022

Hello, thanks so much for your reply.
I did not modify the code.
One thing I found is weird.
When I run bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example,
I first got an error that ./data/PCN/train was not found, so I downloaded the data.
After that, I got this error:
assert idx.shape[1] == k
AssertionError
Then I printed idx.shape[1], and the result is 3.

During the above process, I did not even download the KITTI dataset. Why does the code require PCN?

@yuxumin
Owner

yuxumin commented Nov 4, 2022

Alright, I got the problem.
This is probably due to the version of knn_cuda. It returns idx with shape (B, 3, k), but (B, k, 3) is expected.
So please transpose idx before this line and the problem will go away.
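
A minimal sketch of the layout check behind this suggestion, assuming idx is meant to be a 3-D tensor laid out as (B, k, N), as the "# bs k np" comment next to the knn call indicates. The helper name knn_idx_permutation is hypothetical and is not part of PoinTr or knn_cuda; it inspects the shape tuple only, so it runs without a GPU:

```python
def knn_idx_permutation(shape, k):
    """Return the axis order that brings a kNN index tensor into (B, k, N) layout.

    `shape` is the tuple reported by idx.shape and `k` is the expected
    number of neighbours (16 in the logs of this thread).
    """
    shape = tuple(shape)
    if len(shape) != 3:
        # A 2-D result such as (16, 3) cannot be repaired by transposing.
        raise ValueError(f"expected a 3-D index tensor, got shape {shape}")
    if shape[1] == k:
        return (0, 1, 2)  # already (B, k, N)
    if shape[2] == k:
        return (0, 2, 1)  # (B, N, k): swap the last two axes
    raise ValueError(f"no axis of size k={k} in shape {shape}")


print(knn_idx_permutation((48, 2048, 16), 16))  # (0, 2, 1)
print(knn_idx_permutation((48, 16, 2048), 16))  # (0, 1, 2)
```

With a PyTorch tensor, the result could be applied as idx = idx.permute(*knn_idx_permutation(idx.shape, k)).contiguous(); the .contiguous() call keeps a later idx.view(-1) valid. If N happens to equal k, the layout is ambiguous and this check cannot tell the two cases apart.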

@jackie174
Author

Hi, I tried transposing the data, but I got more errors.
First, I got a view error:

  File "/content/pointr/models/dgcnn_group.py", line 73, in get_graph_feature
    idx = idx.view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
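
This view error is what PyTorch reports when a tensor's strides no longer describe one flat run of memory, which is what a transpose produces. The following is a pure-Python sketch of that stride bookkeeping, not PyTorch's implementation; the helper names are hypothetical, and the (16, 3) shape is an assumption taken from the idx printed later in this thread:

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a freshly allocated tensor."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def transpose(shape, strides, dim0, dim1):
    """Swap two axes by swapping their sizes and strides; no data moves."""
    shape, strides = list(shape), list(strides)
    shape[dim0], shape[dim1] = shape[dim1], shape[dim0]
    strides[dim0], strides[dim1] = strides[dim1], strides[dim0]
    return tuple(shape), tuple(strides)


def can_flatten_without_copy(shape, strides):
    """True if adjacent axes can be merged into one, as view(-1) needs."""
    # Axes of size 1 never constrain the layout, so drop them first.
    dims = [(n, s) for n, s in zip(shape, strides) if n != 1]
    return all(
        outer_stride == inner_size * inner_stride
        for (_, outer_stride), (inner_size, inner_stride) in zip(dims, dims[1:])
    )


shape = (16, 3)
strides = contiguous_strides(shape)                   # (3, 1)
t_shape, t_strides = transpose(shape, strides, 0, 1)

print(can_flatten_without_copy(shape, strides))      # True
print(t_shape, t_strides)                            # (3, 16) (1, 3)
print(can_flatten_without_copy(t_shape, t_strides))  # False
```

In PyTorch, idx.contiguous().view(-1) or idx.reshape(-1) copies the data into a flat layout when needed, which is why switching to reshape removes this particular error.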

Then I changed it to reshape, and got the error below.
I am so confused; I did not change any code other than transposing and reshaping.

            _, idx = knn(coor_k, coor_q)  # bs k np
            print("------------Before transpose\n", type(idx), idx)
            idx= torch.transpose(idx, 0, 1)
            print("-------------After transpose\n", type(idx), idx)
            assert idx.shape[1] == k
            idx_base = torch.arange(0, batch_size, device=x_q.device).view(-1, 1, 1) * num_points_k
            idx = idx + idx_base
            idx = idx.reshape(-1)

The error I got:


+ GPUS=0
+ PY_ARGS='--ckpts ./pretrained/PoinTr_PCN.pth --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example'
+ CUDA_VISIBLE_DEVICES=0
+ python main.py --test --ckpts ./pretrained/PoinTr_PCN.pth --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
Create experiment path successfully at ./experiments/PoinTr/PCN_models/test_example
Create TFBoard path successfully at ./experiments/PoinTr/PCN_models/TFBoard/test_example
2022-11-13 00:47:49,404 - PoinTr - INFO - Copy the Config file from ./cfgs/PCN_models/PoinTr.yaml to ./experiments/PoinTr/PCN_models/test_example/config.yaml
2022-11-13 00:47:49,404 - PoinTr - INFO - args.config : ./cfgs/PCN_models/PoinTr.yaml
2022-11-13 00:47:49,404 - PoinTr - INFO - args.launcher : none
2022-11-13 00:47:49,404 - PoinTr - INFO - args.local_rank : 0
2022-11-13 00:47:49,404 - PoinTr - INFO - args.num_workers : 4
2022-11-13 00:47:49,404 - PoinTr - INFO - args.seed : 0
2022-11-13 00:47:49,404 - PoinTr - INFO - args.deterministic : False
2022-11-13 00:47:49,404 - PoinTr - INFO - args.sync_bn : False
2022-11-13 00:47:49,404 - PoinTr - INFO - args.exp_name : test_example
2022-11-13 00:47:49,404 - PoinTr - INFO - args.start_ckpts : None
2022-11-13 00:47:49,405 - PoinTr - INFO - args.ckpts : ./pretrained/PoinTr_PCN.pth
2022-11-13 00:47:49,405 - PoinTr - INFO - args.val_freq : 1
2022-11-13 00:47:49,405 - PoinTr - INFO - args.resume : False
2022-11-13 00:47:49,405 - PoinTr - INFO - args.test : True
2022-11-13 00:47:49,405 - PoinTr - INFO - args.mode : None
2022-11-13 00:47:49,405 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/PCN_models/test_example
2022-11-13 00:47:49,405 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/PCN_models/TFBoard/test_example
2022-11-13 00:47:49,405 - PoinTr - INFO - args.log_name : PoinTr
2022-11-13 00:47:49,405 - PoinTr - INFO - args.use_gpu : True
2022-11-13 00:47:49,405 - PoinTr - INFO - args.distributed : False
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0005
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-13 00:47:49,405 - PoinTr - INFO - config.scheduler = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_ = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.NAME : PCN
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.N_POINTS : 16384
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.CARS : False
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others.bs : 48
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_ = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.NAME : PCN
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.N_POINTS : 16384
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.CARS : False
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.test = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.test._base_ = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.NAME : PCN
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.N_POINTS : 16384
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.CARS : False
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.num_query : 224
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-13 00:47:49,408 - PoinTr - INFO - config.total_bs : 48
2022-11-13 00:47:49,408 - PoinTr - INFO - config.step_per_update : 1
2022-11-13 00:47:49,408 - PoinTr - INFO - config.max_epoch : 300
2022-11-13 00:47:49,408 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-13 00:47:49,409 - PoinTr - INFO - Distributed training: False
2022-11-13 00:47:49,409 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-13 00:47:49,409 - PoinTr - INFO - Tester start ... 
2022-11-13 00:47:49,416 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-13 00:47:49,417 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-13 00:47:49,417 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-13 00:47:49,418 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-13 00:47:49,418 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-13 00:47:49,420 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-13 00:47:49,420 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-13 00:47:49,421 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-13 00:47:49,421 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 1200
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:566: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
2022-11-13 00:47:49,423 - MODEL - INFO -  Transformer with knn_layer 1
2022-11-13 00:47:51,820 - PoinTr - INFO - Loading weights from ./pretrained/PoinTr_PCN.pth...
2022-11-13 00:47:54,342 - PoinTr - INFO - ckpts @ 289 epoch( performance = No Metrics)
------------Before transpose
 <class 'torch.Tensor'> tensor([[ 0,  1,  2],
        [ 3,  3,  3],
        [ 4,  4,  4],
        [ 5,  5,  5],
        [ 6,  6,  6],
        [ 7,  7,  7],
        [ 8,  8,  8],
        [ 9,  9,  9],
        [10, 10, 10],
        [11, 11, 11],
        [12, 12, 12],
        [13, 13, 13],
        [14, 14, 14],
        [15, 15, 15],
        [ 2,  2,  1],
        [ 1,  0,  0]], device='cuda:0')
-------------After transpose
 <class 'torch.Tensor'> tensor([[ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  1],
        [ 1,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  0],
        [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  0]],
       device='cuda:0')
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 62, in main
    test_net(args, config)
  File "/content/pointr/tools/runner.py", line 304, in test_net
    test(base_model, test_dataloader, ChamferDisL1, ChamferDisL2, args, config, logger=logger)
  File "/content/pointr/tools/runner.py", line 326, in test
    ret = base_model(partial)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/Transformer.py", line 353, in forward
    coor, f = self.grouper(inpc.transpose(1,2).contiguous()) 
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/dgcnn_group.py", line 90, in forward
    f = self.get_graph_feature(coor, f, coor, f)
  File "/content/pointr/models/dgcnn_group.py", line 77, in get_graph_feature
    feature = feature.view(batch_size, k, num_points_q, num_dims).permute(0, 3, 2, 1).contiguous()
RuntimeError: shape '[1, 16, 2048, 8]' is invalid for input of size 384
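
The numbers in this message can be checked by hand. The sketch below assumes that idx is the 16 x 3 tensor printed above, that the feature lookup returns one row of num_dims values per idx entry, and that num_dims = 8 as read from the target shape in the error:

```python
# Target shape requested by feature.view(batch_size, k, num_points_q, num_dims).
batch_size, k, num_points_q, num_dims = 1, 16, 2048, 8
expected_elements = batch_size * k * num_points_q * num_dims

# The printed idx holds 16 * 3 entries; each entry selects num_dims values.
idx_entries = 16 * 3
actual_elements = idx_entries * num_dims

# A (batch_size, k, num_points_q) index tensor would hold this many entries.
required_idx_entries = batch_size * k * num_points_q

print(expected_elements)     # 262144
print(actual_elements)       # 384
print(required_idx_entries)  # 32768
```

Under these assumptions the transpose changed the layout of idx but not its size: it has 48 entries where 32768 are needed, which points at the kNN output itself rather than at the axis order.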

@yuxumin
Owner

yuxumin commented Nov 13, 2022

_, idx = knn(coor_k, coor_q) # bs k np

Can you show me the shapes of coor_k and coor_q?
It seems that the shape of idx in your case is (k, 3), but it should be a 3-dimensional tensor.

@jackie174
Author

jackie174 commented Nov 14, 2022

Thanks so much for your reply!
Below is what I run for evaluation:

!bash ./scripts/test.sh 0 \
    --ckpts ./pretrained/PoinTr_PCN.pth \
    --config ./cfgs/PCN_models/PoinTr.yaml \
    --exp_name example

Then I get the following:

shape of coor_k:  torch.Size([1, 3, 2048])
shape of coor_q:  torch.Size([1, 3, 2048])
idx before transpose: tensor([[ 0,  1,  2],
        [ 3,  3,  3],
        [ 4,  4,  4],
        [ 5,  5,  5],
        [ 6,  6,  6],
        [ 7,  7,  7],
        [ 8,  8,  8],
        [ 9,  9,  9],
        [10, 10, 10],
        [11, 11, 11],
        [12, 12, 12],
        [13, 13, 13],
        [14, 14, 14],
        [15, 15, 15],
        [ 2,  2,  1],
        [ 1,  0,  0]], device='cuda:0')
idx after transpose: tensor([[ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  1],
        [ 1,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  0],
        [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  0]],
       device='cuda:0')

Below is what I run for training:

bash ./scripts/train.sh 0 \
    --config ./cfgs/KITTI_models/PoinTr.yaml \
    --exp_name example

Below is what I get:

shape of coor_k:  torch.Size([64, 3, 2048])
shape of coor_q:  torch.Size([64, 3, 2048])
idx before transpose: tensor([[ 0,  1,  2],
        [15, 10,  4],
        [11, 14,  8],
        [14,  5, 13],
        [10, 15, 10],
        [13,  6, 15],
        [ 4, 11, 11],
        [ 6, 13,  6],
        [ 5,  3, 12],
        [ 7,  4,  5],
        [ 3,  7, 14],
        [12, 12,  3],
        [ 9,  9,  7],
        [ 8,  8,  9],
        [ 1,  2,  1],
        [ 2,  0,  0]], device='cuda:0')
idx after transpose: tensor([[ 0, 15, 11, 14, 10, 13,  4,  6,  5,  7,  3, 12,  9,  8,  1,  2],
        [ 1, 10, 14,  5, 15,  6, 11, 13,  3,  4,  7, 12,  9,  8,  2,  0],
        [ 2,  4,  8, 13, 10, 15, 11,  6, 12,  5, 14,  3,  7,  9,  1,  0]],
       device='cuda:0')

@jackie174
Author

jackie174 commented Nov 19, 2022

One thing is weird.
With PCN.yaml, the PCN_models, ShapeNet34_models, and ShapeNet55_models configs all work.
With GRNet.yaml, I get RuntimeError: CUDA out of memory.
However, none of them work with PoinTr.yaml.
I always get assert idx.shape[1] == k.
I did not modify the code.


These work:

!bash ./scripts/train.sh 0 \
    --config ./cfgs/PCN_models/PCN.yaml \
    --exp_name example
!bash ./scripts/train.sh 0 \
    --config ./cfgs/ShapeNet55_models/PCN.yaml \
    --exp_name example

This does not work and gives the error RuntimeError: CUDA out of memory.

!bash ./scripts/train.sh 0 \
    --config ./cfgs/PCN_models/GRNet.yaml \
    --exp_name example

These do not work and give the error AssertionError: assert idx.shape[1] == k
I tried PCN_models, ShapeNet34_models, and ShapeNet55_models; none of them work.

!bash ./scripts/train.sh 0 \
    --config ./cfgs/ShapeNet55_models/PoinTr.yaml \
    --exp_name example

@yuxumin
Owner

yuxumin commented Nov 19, 2022

Hi, the problem comes from the knn_cuda used in your environment.
Can you provide your env by running conda env list?
And can you share your models/dgcnn_group.py with me?

@jackie174
Author

jackie174 commented Nov 20, 2022

Thank you very much for your reply.
My environment is:
cuda: 11.2
pytorch: 1.13.0+cu117
python: 3.7
gcc: 7.5

conda env list

> # conda environments:
> #
> base                     /usr/local

import torch
print(torch.__version__)
nvcc --version
gcc -v

> 1.13.0+cu117
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2021 NVIDIA Corporation
> Built on Sun_Feb_14_21:12:58_PST_2021
> Cuda compilation tools, release 11.2, V11.2.152
> Build cuda_11.2.r11.2/compiler.29618528_0
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
> OFFLOAD_TARGET_NAMES=nvptx-none
> OFFLOAD_TARGET_DEFAULT=1
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 

I just forked the code from your repo:
https://github.com/Cmput-414/PoinTr/blob/master/models/dgcnn_group.py
You can also take a look at what I wrote in the Colab notebook:
https://github.com/Cmput-414/pointr-colab/blob/main/PoinTr.ipynb

@yuxumin
Owner

yuxumin commented Nov 21, 2022

Hi, I updated the code for the kNN calculation in dgcnn_group.py. Could you try again and post the results here?

If the error still exists, please debug and let me know the shapes of the input and output for kNN (coor_k, coor_q, idx).

Best!

@jackie174
Author

jackie174 commented Nov 21, 2022

Hi, I get this after running:
shape of coor_q: torch.Size([48, 3, 2048])
shape of coor_k: torch.Size([48, 3, 2048])
shape of idx:
Before transpose: torch.Size([48, 2048, 16])
After transpose: torch.Size([48, 16, 2048])
After view: torch.Size([1572864])
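
These shapes are mutually consistent. A quick check, assuming k = 16 as in the assertion and the (B, k, N) layout that the "# bs k np" comment describes; expected_idx_shape and num_elements are hypothetical helpers used only for illustration:

```python
def expected_idx_shape(coor_q_shape, k):
    """(B, 3, N) query coordinates -> expected (B, k, N) neighbour indices."""
    batch_size, _, num_points_q = coor_q_shape
    return (batch_size, k, num_points_q)


def num_elements(shape):
    """Product of all axis sizes (a plain loop, so it also runs on Python 3.7)."""
    total = 1
    for size in shape:
        total *= size
    return total


idx_shape = expected_idx_shape((48, 3, 2048), k=16)
print(idx_shape)                # (48, 16, 2048)
print(num_elements(idx_shape))  # 1572864
```

The transposed idx matches the expected (48, 16, 2048) shape, and flattening it gives the 1572864 entries reported after the view.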

/content/pointr
+ GPUS=0
+ PY_ARGS='--config ./cfgs/PCN_models/PoinTr.yaml --exp_name example'
+ CUDA_VISIBLE_DEVICES=0
+ python main.py --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
2022-11-21 07:18:16,859 - PoinTr - INFO - Copy the Config file from ./cfgs/PCN_models/PoinTr.yaml to ./experiments/PoinTr/PCN_models/example/config.yaml
2022-11-21 07:18:16,860 - PoinTr - INFO - args.config : ./cfgs/PCN_models/PoinTr.yaml
2022-11-21 07:18:16,860 - PoinTr - INFO - args.launcher : none
2022-11-21 07:18:16,860 - PoinTr - INFO - args.local_rank : 0
2022-11-21 07:18:16,860 - PoinTr - INFO - args.num_workers : 4
2022-11-21 07:18:16,860 - PoinTr - INFO - args.seed : 0
2022-11-21 07:18:16,860 - PoinTr - INFO - args.deterministic : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.sync_bn : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.exp_name : example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.start_ckpts : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.ckpts : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.val_freq : 1
2022-11-21 07:18:16,860 - PoinTr - INFO - args.resume : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.test : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.mode : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/PCN_models/example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/PCN_models/TFBoard/example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.log_name : PoinTr
2022-11-21 07:18:16,860 - PoinTr - INFO - args.use_gpu : True
2022-11-21 07:18:16,860 - PoinTr - INFO - args.distributed : False
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer = edict()
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0005
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-21 07:18:16,860 - PoinTr - INFO - config.scheduler = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_ = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.NAME : PCN
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.N_POINTS : 16384
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.CARS : False
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others.bs : 48
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_ = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_.NAME : PCN
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.N_POINTS : 16384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.CARS : False
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_ = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.NAME : PCN
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.N_POINTS : 16384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.CARS : False
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.num_query : 224
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.total_bs : 48
2022-11-21 07:18:16,862 - PoinTr - INFO - config.step_per_update : 1
2022-11-21 07:18:16,862 - PoinTr - INFO - config.max_epoch : 300
2022-11-21 07:18:16,862 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-21 07:18:16,863 - PoinTr - INFO - Distributed training: False
2022-11-21 07:18:16,863 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-21 07:18:16,871 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-21 07:18:16,978 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-21 07:18:16,985 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 07:18:17,013 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-21 07:18:17,041 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-21 07:18:17,051 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-21 07:18:17,066 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-21 07:18:17,096 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-21 07:18:17,105 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 28974
2022-11-21 07:18:17,112 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-21 07:18:17,113 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-21 07:18:17,113 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 1200
2022-11-21 07:18:17,117 - MODEL - INFO -  Transformer with knn_layer 1
2022-11-21 07:18:19,483 - PoinTr - INFO - Using Data parallel ...
**************************************************
coor_k shape:  torch.Size([48, 3, 2048])
coor_q shape:  torch.Size([48, 3, 2048])
idx = knn_point(k, coor_k.transpose(-1, -2).contiguous(), coor_q.transpose(-1, -2).contiguous()) # B G M tensor([[[-5584463534953070592, -5620492331972034560, -5584463534944681984,
           ..., -5584463534936293376, -5584463534986625024,
          -5620492331955257344],
         [          3003121664, -5584463534969847808, -5620492331955257344,
           ..., -5584463534944681984, -5584463534953070592,
          -5620492331955257344],
         [-5764607520039501824, -5584463534936293376, -5584463534944681984,
           ..., -5584463534944681984, -5584463534944681984,
                             0],
         ...,
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728]],

        ...,

        [[                   0,                  413,                  429,
           ...,                 1358,                 1446,
                           807],
         [                   1,                    5,                   80,
           ...,                 1398,                 1445,
                           918],
         [                   2,                  201,                  404,
           ...,                 1346,                 1396,
                           515],
         ...,
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476]],

        [[                   0,                   30,                   37,
           ...,                  738,                  743,
                           534],
         [                   1,                   15,                  129,
           ...,                  504,                  532,
                           660],
         [                   2,                   76,                   83,
           ...,                  724,                  761,
                           135],
         ...,
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813]],

        [[                   0,                   26,                  198,
           ...,                  491,                  494,
                           286],
         [                   1,                  115,                  186,
           ...,                  541,                  555,
                           452],
         [                   2,                   24,                  124,
           ...,                  555,                  569,
                           409],
         ...,
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596]]], device='cuda:0')
idx:  tensor([[[-5584463534953070592, -5620492331972034560, -5584463534944681984,
           ..., -5584463534936293376, -5584463534986625024,
          -5620492331955257344],
         [          3003121664, -5584463534969847808, -5620492331955257344,
           ..., -5584463534944681984, -5584463534953070592,
          -5620492331955257344],
         [-5764607520039501824, -5584463534936293376, -5584463534944681984,
           ..., -5584463534944681984, -5584463534944681984,
                             0],
         ...,
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728]],

        ...,

        [[                   0,                  413,                  429,
           ...,                 1358,                 1446,
                           807],
         [                   1,                    5,                   80,
           ...,                 1398,                 1445,
                           918],
         [                   2,                  201,                  404,
           ...,                 1346,                 1396,
                           515],
         ...,
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476]],

        [[                   0,                   30,                   37,
           ...,                  738,                  743,
                           534],
         [                   1,                   15,                  129,
           ...,                  504,                  532,
                           660],
         [                   2,                   76,                   83,
           ...,                  724,                  761,
                           135],
         ...,
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813]],

        [[                   0,                   26,                  198,
           ...,                  491,                  494,
                           286],
         [                   1,                  115,                  186,
           ...,                  541,                  555,
                           452],
         [                   2,                   24,                  124,
           ...,                  555,                  569,
                           409],
         ...,
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596]]], device='cuda:0')
idx = idx.transpose(-1, -2).contiguous()
idx:  tensor([[[-5584463534953070592,           3003121664, -5764607520039501824,
           ...,                 1183,                 1183,
                          1183],
         [-5620492331972034560, -5584463534969847808, -5584463534936293376,
           ...,                 1184,                 1184,
                          1184],
         [-5584463534944681984, -5620492331955257344, -5584463534944681984,
           ...,                 1185,                 1185,
                          1185],
         ...,
         [-5584463534936293376, -5584463534944681984, -5584463534944681984,
           ...,                 1196,                 1196,
                          1196],
         [-5584463534986625024, -5584463534953070592, -5584463534944681984,
           ...,                 1197,                 1197,
                          1197],
         [-5620492331955257344, -5620492331955257344,                    0,
           ...,                 1198,                 1198,
                          1198]],

        [[                   0,                    1,                    2,
           ...,                  528,                  528,
                           528],
         [                   0,                    0,                    0,
           ...,                  529,                  529,
                           529],
         [                   0,                    0,                    0,
           ...,                  530,                  530,
                           530],
         ...,
         [                   0,                    0,                    0,
           ...,                  541,                  541,
                           541],
         [                   0,                    0,                    0,
           ...,                  542,                  542,
                           542],
         [                   0,                    0,                    0,
           ...,                  543,                  543,
                           543]],

        [[                   0,                    1,                    2,
           ...,                  713,                  713,
                           713],
         [                   0,                    0,                    0,
           ...,                  714,                  714,
                           714],
         [                   0,                    0,                    0,
           ...,                  715,                  715,
                           715],
         ...,
         [                   0,                    0,                    0,
           ...,                  726,                  726,
                           726],
         [                   0,                    0,                    0,
           ...,                  727,                  727,
                           727],
         [                   0,                    0,                    0,
           ...,                  728,                  728,
                           728]],

        ...,

        [[                   0,                    1,                    2,
           ...,                 1461,                 1461,
                          1461],
         [                 413,                    5,                  201,
           ...,                 1462,                 1462,
                          1462],
         [                 429,                   80,                  404,
           ...,                 1463,                 1463,
                          1463],
         ...,
         [                1358,                 1398,                 1346,
           ...,                 1474,                 1474,
                          1474],
         [                1446,                 1445,                 1396,
           ...,                 1475,                 1475,
                          1475],
         [                 807,                  918,                  515,
           ...,                 1476,                 1476,
                          1476]],

        [[                   0,                    1,                    2,
           ...,                  798,                  798,
                           798],
         [                  30,                   15,                   76,
           ...,                  799,                  799,
                           799],
         [                  37,                  129,                   83,
           ...,                  800,                  800,
                           800],
         ...,
         [                 738,                  504,                  724,
           ...,                  811,                  811,
                           811],
         [                 743,                  532,                  761,
           ...,                  812,                  812,
                           812],
         [                 534,                  660,                  135,
           ...,                  813,                  813,
                           813]],

        [[                   0,                    1,                    2,
           ...,                  581,                  581,
                           581],
         [                  26,                  115,                   24,
           ...,                  582,                  582,
                           582],
         [                 198,                  186,                  124,
           ...,                  583,                  583,
                           583],
         ...,
         [                 491,                  541,                  555,
           ...,                  594,                  594,
                           594],
         [                 494,                  555,                  569,
           ...,                  595,                  595,
                           595],
         [                 286,                  452,                  409,
           ...,                  596,                  596,
                           596]]], device='cuda:0')
idx_base:  tensor([[[    0]],

        [[ 2048]],

        [[ 4096]],

        [[ 6144]],

        [[ 8192]],

        [[10240]],

        [[12288]],

        [[14336]],

        [[16384]],

        [[18432]],

        [[20480]],

        [[22528]],

        [[24576]],

        [[26624]],

        [[28672]],

        [[30720]],

        [[32768]],

        [[34816]],

        [[36864]],

        [[38912]],

        [[40960]],

        [[43008]],

        [[45056]],

        [[47104]],

        [[49152]],

        [[51200]],

        [[53248]],

        [[55296]],

        [[57344]],

        [[59392]],

        [[61440]],

        [[63488]],

        [[65536]],

        [[67584]],

        [[69632]],

        [[71680]],

        [[73728]],

        [[75776]],

        [[77824]],

        [[79872]],

        [[81920]],

        [[83968]],

        [[86016]],

        [[88064]],

        [[90112]],

        [[92160]],

        [[94208]],

        [[96256]]], device='cuda:0')
idx = idx + idx_base
idx:  tensor([[[-5584463534953070592,           3003121664, -5764607520039501824,
           ...,                 1183,                 1183,
                          1183],
         [-5620492331972034560, -5584463534969847808, -5584463534936293376,
           ...,                 1184,                 1184,
                          1184],
         [-5584463534944681984, -5620492331955257344, -5584463534944681984,
           ...,                 1185,                 1185,
                          1185],
         ...,
         [-5584463534936293376, -5584463534944681984, -5584463534944681984,
           ...,                 1196,                 1196,
                          1196],
         [-5584463534986625024, -5584463534953070592, -5584463534944681984,
           ...,                 1197,                 1197,
                          1197],
         [-5620492331955257344, -5620492331955257344,                    0,
           ...,                 1198,                 1198,
                          1198]],

        [[                2048,                 2049,                 2050,
           ...,                 2576,                 2576,
                          2576],
         [                2048,                 2048,                 2048,
           ...,                 2577,                 2577,
                          2577],
         [                2048,                 2048,                 2048,
           ...,                 2578,                 2578,
                          2578],
         ...,
         [                2048,                 2048,                 2048,
           ...,                 2589,                 2589,
                          2589],
         [                2048,                 2048,                 2048,
           ...,                 2590,                 2590,
                          2590],
         [                2048,                 2048,                 2048,
           ...,                 2591,                 2591,
                          2591]],

        [[                4096,                 4097,                 4098,
           ...,                 4809,                 4809,
                          4809],
         [                4096,                 4096,                 4096,
           ...,                 4810,                 4810,
                          4810],
         [                4096,                 4096,                 4096,
           ...,                 4811,                 4811,
                          4811],
         ...,
         [                4096,                 4096,                 4096,
           ...,                 4822,                 4822,
                          4822],
         [                4096,                 4096,                 4096,
           ...,                 4823,                 4823,
                          4823],
         [                4096,                 4096,                 4096,
           ...,                 4824,                 4824,
                          4824]],

        ...,

        [[               92160,                92161,                92162,
           ...,                93621,                93621,
                         93621],
         [               92573,                92165,                92361,
           ...,                93622,                93622,
                         93622],
         [               92589,                92240,                92564,
           ...,                93623,                93623,
                         93623],
         ...,
         [               93518,                93558,                93506,
           ...,                93634,                93634,
                         93634],
         [               93606,                93605,                93556,
           ...,                93635,                93635,
                         93635],
         [               92967,                93078,                92675,
           ...,                93636,                93636,
                         93636]],

        [[               94208,                94209,                94210,
           ...,                95006,                95006,
                         95006],
         [               94238,                94223,                94284,
           ...,                95007,                95007,
                         95007],
         [               94245,                94337,                94291,
           ...,                95008,                95008,
                         95008],
         ...,
         [               94946,                94712,                94932,
           ...,                95019,                95019,
                         95019],
         [               94951,                94740,                94969,
           ...,                95020,                95020,
                         95020],
         [               94742,                94868,                94343,
           ...,                95021,                95021,
                         95021]],

        [[               96256,                96257,                96258,
           ...,                96837,                96837,
                         96837],
         [               96282,                96371,                96280,
           ...,                96838,                96838,
                         96838],
         [               96454,                96442,                96380,
           ...,                96839,                96839,
                         96839],
         ...,
         [               96747,                96797,                96811,
           ...,                96850,                96850,
                         96850],
         [               96750,                96811,                96825,
           ...,                96851,                96851,
                         96851],
         [               96542,                96708,                96665,
           ...,                96852,                96852,
                         96852]]], device='cuda:0')
idx = idx.view(-1)
idx:  tensor([-5584463534953070592,           3003121664, -5764607520039501824,
         ...,                96852,                96852,
                       96852], device='cuda:0')
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [3,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [3,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed
*****some repeat lines*****
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [97,0,0], thread: [111,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/content/pointr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/Transformer.py", line 353, in forward
    coor, f = self.grouper(inpc.transpose(1,2).contiguous()) 
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/dgcnn_group.py", line 137, in forward
    f = self.layer1(f)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 460, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([48, 16, 2048, 16], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(16, 32, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams 
    memory_format = Contiguous
    data_type = CUDNN_DATA_FLOAT
    padding = [0, 0, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x5593baac81c0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 48, 16, 2048, 16, 
    strideA = 524288, 32768, 16, 1, 
output: TensorDescriptor 0x559435ebfd40
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 48, 32, 2048, 16, 
    strideA = 1048576, 32768, 16, 1, 
weight: FilterDescriptor 0x559433965ce0
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 32, 16, 1, 1, 
Pointer addresses: 
    input: 0x7f1e38c00000
    output: 0x7f1e3ec00000
    weight: 0x7f1ecf000600


Thank you so much!

@yuxumin
Owner

yuxumin commented Nov 21, 2022

Hi, it seems the previous issue (unexpected shape of idx) is gone.
The new error comes from the negative values in idx:

(idx: tensor([[[-5584463534953070592, 3003121664, -5764607520039501824] ...)

However, I am not sure why this error occurs. idx is produced by the torch.topk function (https://github.com/yuxumin/PoinTr/blob/master/models/dgcnn_group.py#L17).
Could you provide more information?
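
For reference, the condition in the CUDA assertion from the log (`index >= -sizes[i] && index < sizes[i]`) can be mirrored on the host to see which entries of idx are invalid. This is a minimal sketch in plain Python, not code from the repository: `find_out_of_bounds` is a hypothetical helper and the size 98304 is an assumed value chosen only for illustration. The first three sample values are copied from the log.

```python
def find_out_of_bounds(indices, size):
    """Return (position, value) pairs that violate -size <= value < size."""
    return [(pos, val) for pos, val in enumerate(indices)
            if not (-size <= val < size)]


# The first three values are copied from the log; 96852 is a valid-looking
# entry from the same dump. `size` is an assumption, not a logged value.
sample_idx = [-5584463534953070592, 3003121664, -5764607520039501824, 96852]
size = 98304  # hypothetical length of the indexed dimension

bad = find_out_of_bounds(sample_idx, size)
for pos, val in bad:
    print(f"position {pos}: {val} is outside [-{size}, {size})")
```

Since CUDA kernels run asynchronously, the cuDNN error at the end of the traceback may simply be a consequence of this earlier failed assertion. Running with the environment variable CUDA_LAUNCH_BLOCKING=1 usually makes the reported location more accurate.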

@jackie174
Author

@yuxumin
Owner

yuxumin commented Nov 21, 2022

Hi, permission is required to access this Colab.

@jackie174
Author

jackie174 commented Nov 21, 2022

@jackie174
Author

Hi, I tried running the code on a local laptop.
It looks like the assertion error is gone, but there is a new problem!

bash ./scripts/train.sh 0 \
    --config ./cfgs/PCN_models/PoinTr.yaml \
    --exp_name example

s.deterministic : False
2022-11-21 03:37:27,785 - PoinTr - INFO - args.sync_bn : False
2022-11-21 03:37:27,786 - PoinTr - INFO - args.exp_name : example
2022-11-21 03:37:27,787 - PoinTr - INFO - args.start_ckpts : None
2022-11-21 03:37:27,788 - PoinTr - INFO - args.ckpts : None
2022-11-21 03:37:27,789 - PoinTr - INFO - args.val_freq : 1
2022-11-21 03:37:27,790 - PoinTr - INFO - args.resume : False
2022-11-21 03:37:27,790 - PoinTr - INFO - args.test : False
2022-11-21 03:37:27,792 - PoinTr - INFO - args.mode : None
2022-11-21 03:37:27,794 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/KITTI_models/example
2022-11-21 03:37:27,795 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/KITTI_models/TFBoard/example
2022-11-21 03:37:27,796 - PoinTr - INFO - args.log_name : PoinTr
2022-11-21 03:37:27,796 - PoinTr - INFO - args.use_gpu : True
2022-11-21 03:37:27,797 - PoinTr - INFO - args.distributed : False
2022-11-21 03:37:27,798 - PoinTr - INFO - config.optimizer = edict()
2022-11-21 03:37:27,800 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-21 03:37:27,801 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-21 03:37:27,802 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0001
2022-11-21 03:37:27,803 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-21 03:37:27,804 - PoinTr - INFO - config.scheduler = edict()
2022-11-21 03:37:27,805 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-21 03:37:27,806 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-21 03:37:27,810 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-21 03:37:27,812 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-21 03:37:27,814 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-21 03:37:27,817 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-21 03:37:27,818 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-21 03:37:27,819 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-21 03:37:27,819 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-21 03:37:27,820 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-21 03:37:27,820 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-21 03:37:27,821 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-21 03:37:27,821 - PoinTr - INFO - config.dataset = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train.base = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train.base.NAME : PCN
2022-11-21 03:37:27,823 - PoinTr - INFO - config.dataset.train.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 03:37:27,823 - PoinTr - INFO - config.dataset.train.base.N_POINTS : 16384
2022-11-21 03:37:27,824 - PoinTr - INFO - config.dataset.train.base.N_RENDERINGS : 8
2022-11-21 03:37:27,824 - PoinTr - INFO - config.dataset.train.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 03:37:27,825 - PoinTr - INFO - config.dataset.train.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 03:37:27,827 - PoinTr - INFO - config.dataset.train.base.CARS : True
2022-11-21 03:37:27,827 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-21 03:37:27,828 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-21 03:37:27,829 - PoinTr - INFO - config.dataset.train.others.bs : 64
2022-11-21 03:37:27,830 - PoinTr - INFO - config.dataset.val = edict()
2022-11-21 03:37:27,831 - PoinTr - INFO - config.dataset.val.base = edict()
2022-11-21 03:37:27,832 - PoinTr - INFO - config.dataset.val.base.NAME : PCN
2022-11-21 03:37:27,833 - PoinTr - INFO - config.dataset.val.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 03:37:27,834 - PoinTr - INFO - config.dataset.val.base.N_POINTS : 16384
2022-11-21 03:37:27,836 - PoinTr - INFO - config.dataset.val.base.N_RENDERINGS : 8
2022-11-21 03:37:27,836 - PoinTr - INFO - config.dataset.val.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 03:37:27,837 - PoinTr - INFO - config.dataset.val.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 03:37:27,838 - PoinTr - INFO - config.dataset.val.base.CARS : True
2022-11-21 03:37:27,839 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-21 03:37:27,839 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-21 03:37:27,840 - PoinTr - INFO - config.dataset.test = edict()
2022-11-21 03:37:27,841 - PoinTr - INFO - config.dataset.test.base = edict()
2022-11-21 03:37:27,842 - PoinTr - INFO - config.dataset.test.base.NAME : KITTI
2022-11-21 03:37:27,842 - PoinTr - INFO - config.dataset.test.base.CATEGORY_FILE_PATH : data/KITTI/KITTI.json
2022-11-21 03:37:27,844 - PoinTr - INFO - config.dataset.test.base.N_POINTS : 16384
2022-11-21 03:37:27,845 - PoinTr - INFO - config.dataset.test.base.CLOUD_PATH : data/KITTI/cars/%s.pcd
2022-11-21 03:37:27,848 - PoinTr - INFO - config.dataset.test.base.BBOX_PATH : data/KITTI/bboxes/%s.txt
2022-11-21 03:37:27,854 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-21 03:37:27,855 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-21 03:37:27,858 - PoinTr - INFO - config.model = edict()
2022-11-21 03:37:27,863 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-21 03:37:27,865 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-21 03:37:27,866 - PoinTr - INFO - config.model.num_query : 224
2022-11-21 03:37:27,867 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-21 03:37:27,868 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-21 03:37:27,869 - PoinTr - INFO - config.total_bs : 64
2022-11-21 03:37:27,870 - PoinTr - INFO - config.step_per_update : 1
2022-11-21 03:37:27,870 - PoinTr - INFO - config.max_epoch : 600
2022-11-21 03:37:27,871 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-21 03:37:27,872 - PoinTr - INFO - Distributed training: False
2022-11-21 03:37:27,872 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-21 03:37:27,958 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 03:37:28,078 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 5677
2022-11-21 03:37:28,176 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 03:37:28,177 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 150
2022-11-21 03:37:28,178 - MODEL - INFO - Transformer with knn_layer 1
2022-11-21 03:38:30,817 - PoinTr - INFO - Using Data parallel ...
Format = auto
Extension = pcd
Format = auto
Extension = pcd
Format = auto
*****a lot of repeat ******
Format = auto
Extension = pcd
Format = auto
Extension = pcd
Format = auto
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 354, in forward
    knn_index = get_knn_index(coor)
  File "/mnt/f/PoinTr/models/Transformer.py", line 19, in get_knn_index
    _, idx = knn(coor_k, coor_q) # bs k np
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/knn_cuda/__init__.py", line 61, in forward
    d, i = knn(ref.float(), query.float(), self.k)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/knn_cuda/__init__.py", line 39, in knn
    d, i = _knn.knn(ref, query, k)
RuntimeError: ref.is_contiguous()INTERNAL ASSERT FAILED at "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/knn_cuda/csrc/cuda/knn.cpp":29, please report a bug to PyTorch. ref must be contiguous

@yuxumin
Owner

yuxumin commented Nov 21, 2022

In this call:

File "/mnt/f/PoinTr/models/Transformer.py", line 19, in get_knn_index
_, idx = knn(coor_k, coor_q) # bs k np

make both arguments contiguous:

_, idx = knn(coor_k.contiguous(), coor_q.contiguous()) # bs k np
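
To illustrate why a `.contiguous()` call can be needed here: a transpose-like view only swaps shape and stride entries without moving data, so the result is no longer laid out in row-major order, which is what the knn_cuda check `ref.is_contiguous()` rejects. The sketch below models this in plain Python rather than with PyTorch; the helper names and the shape (2, 3, 4) are hypothetical, and the contiguity test is a simplification of what PyTorch actually does (it ignores size-1 dimensions). Whether coor_k and coor_q in the repository are produced by a transpose is an assumption.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major array."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transpose(shape, strides, a, b):
    """Swap two axes by swapping shape and stride entries (no data copy)."""
    shape, strides = list(shape), list(strides)
    shape[a], shape[b] = shape[b], shape[a]
    strides[a], strides[b] = strides[b], strides[a]
    return tuple(shape), tuple(strides)


def is_contiguous(shape, strides):
    """Simplified check: strides must match the row-major layout."""
    return strides == row_major_strides(shape)


shape = (2, 3, 4)                    # hypothetical (batch, channels, points)
strides = row_major_strides(shape)   # (12, 4, 1)

t_shape, t_strides = transpose(shape, strides, 1, 2)
print(t_shape, t_strides)                  # (2, 4, 3) (12, 1, 4)
print(is_contiguous(t_shape, t_strides))   # False

# A contiguous copy has the same shape but fresh row-major strides.
c_strides = row_major_strides(t_shape)
print(is_contiguous(t_shape, c_strides))   # True
```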

@jackie174
Author

jackie174 commented Nov 21, 2022

Sorry to bother you again!
Could you tell me what environment you use (CUDA, tensor, TensorFlow, GCC, Python...)?
I suspect the problem is mainly caused by the environment.
After I made the change, I got the following:

  main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 366, in forward
    x = blk(x + pos, knn_index)   # B N C
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 217, in forward
    knn_f = get_graph_feature(norm_x, knn_index)
  File "/mnt/f/PoinTr/models/Transformer.py", line 33, in get_graph_feature
    feature = feature.view(batch_size, k, num_query, num_dims)
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368

@yuxumin
Owner

yuxumin commented Nov 21, 2022

So, what is the shape of 'knn_index' at 'https://github.com/yuxumin/PoinTr/blob/master/models/Transformer.py#L32' in your code? Can you make sure you are running the code the right way (the right model on the corresponding dataset)?

@jackie174
Author

jackie174 commented Nov 21, 2022

This is the code I use: https://github.com/Cmput-414/PoinTr/tree/change
My environment:
Cuda 10.1,
Torch 1.9.0+cu102,
torchaudio 0.9.0,
torchvision 0.10.0+cu102,
GCC 9.4
python 3.8.10

  1. bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example

knn_index_shape: torch.Size([1152]) knn_index: tensor([ 0, 1, 2, ..., 6018, 6018, 6017], device='cuda:0')

RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368

  2. bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.47 GiB already allocated; 1020.84 MiB free; 3.48 GiB reserved in total by PyTorch)

  3. bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/PCN.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 2.74 GiB already allocated; 1.55 GiB free; 2.93 GiB reserved in total by PyTorch)

  4. bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 1.18 GiB already allocated; 3.28 GiB free; 1.20 GiB reserved in total by PyTorch)

  5. bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PCN.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 6.00 GiB total capacity; 481.80 MiB already allocated; 3.82 GiB free; 682.00 MiB reserved in total by PyTorch)

  6. bash ./scripts/train.sh 0 --config ./cfgs/ShapeNet55_models/PCN.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1.01 GiB (GPU 0; 6.00 GiB total capacity; 2.22 GiB already allocated; 1.95 GiB free; 2.53 GiB reserved in total by PyTorch)

  7. bash ./scripts/train.sh 0 --config ./cfgs/ShapeNet55_models/PoinTr.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 6.00 GiB total capacity; 2.38 GiB already allocated; 1.29 GiB free; 3.19 GiB reserved in total by PyTorch)

1.
    main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 367, in forward
    x = blk(x + pos, knn_index)   # B N C
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 218, in forward
    knn_f = get_graph_feature(norm_x, knn_index)
  File "/mnt/f/PoinTr/models/Transformer.py", line 34, in get_graph_feature
    feature = feature.view(batch_size, k, num_query, num_dims)
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368
2.
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/GRNet.py", line 141, in forward
    pt_features_32_l = self.conv1(pt_features_64_l)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 240, in forward
    return F.max_pool3d(input, self.kernel_size, self.stride,
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/_jit_internal.py", line 405, in fn
    return if_false(*args, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 784, in _max_pool3d
    return torch.max_pool3d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.47 GiB already allocated; 1020.84 MiB free; 3.48 GiB reserved in total by PyTorch)
3.
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PCN.py", line 76, in forward
    fine = self.final_conv(feat) + point_feat   # B 3 N
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 298, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 294, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 2.74 GiB already allocated; 1.55 GiB free; 2.93 GiB reserved in total by PyTorch)

@yuxumin
Owner

yuxumin commented Nov 22, 2022

Sorry, I am not familiar with Google Colab and cannot run the code in your Colab.

knn_index_shape: torch.Size([1152]) knn_index: tensor([ 0, 1, 2, ..., 6018, 6018, 6017], device='cuda:0')
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368

knn_index should have bs * k * np elements; in the original setting for the PCN dataset, k = 8 and np = 224.
I think the error may be due to the knn_cuda in your environment.

I have pushed a PyTorch-based KNN implementation. Could you try the new code?
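
The mismatch can be checked with plain arithmetic on the numbers in the error message. This is only a sketch: it assumes the feature tensor passed to `view` has one row of num_dims values per entry of knn_index, which matches the logged knn_index_shape of 1152 but is not confirmed against the repository code.

```python
from math import prod

# Values copied from the error message quoted above.
view_shape = (48, 8, 128, 384)   # batch_size, k, num_query, num_dims
actual_elements = 442368         # "input of size" in the error
num_dims = view_shape[-1]

expected_elements = prod(view_shape)      # elements the view needs
expected_indices = prod(view_shape[:3])   # batch_size * k * num_query
actual_indices, remainder = divmod(actual_elements, num_dims)

print(expected_elements)          # 18874368
print(expected_indices)           # 49152
print(actual_indices, remainder)  # 1152 0
```

Under that assumption, the KNN call returned 1152 indices where 49152 were expected, which points at the KNN output rather than at the later `view`.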

RuntimeError: CUDA out of memory.

For the OOM problem, I think you can reduce the batch size (just modify the yaml file).
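
As a rough sketch of that change, the batch size entry could be rewritten with a small script instead of by hand. The key name `total_bs` is taken from the config dump in the log above (`config.total_bs : 64`); the helper `set_total_bs` and the sample text are hypothetical, and whether other entries in the yaml file also need to change is not confirmed here.

```python
import re


def set_total_bs(yaml_text, new_bs):
    """Return yaml_text with the value of the single total_bs entry replaced."""
    pattern = re.compile(r"^([ \t]*total_bs[ \t]*:[ \t]*)\d+", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + str(new_bs), yaml_text)
    if count != 1:
        raise ValueError(f"expected exactly one total_bs entry, found {count}")
    return updated


# Hypothetical excerpt; the real file contains more entries.
sample = "total_bs : 64\nstep_per_update : 1\nmax_epoch : 600\n"
print(set_total_bs(sample, 2))
```

A smaller batch reduces the activation memory needed per step, which is usually the quickest way to fit a model on a 6 GiB GPU like the one in the OOM messages.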

@jackie174
Author

Thank you so much!
When I changed the batch size to 2, it ran!
For now:
I followed this
Then I also applied the code change that you just made.
Yes, the main thing was setting up the environment.
This confused me a lot, but it is solved now and I can start learning the code.
Thanks again!
You are really nice!

@yuxumin
Owner

yuxumin commented Nov 22, 2022

@jackie174, Congrats!

@yuxumin yuxumin closed this as completed Nov 22, 2022