AssertionError when idx.shape[1] == k #79

jackie174 · 2022-11-04T04:46:34Z

Hello Xumin, I ran into this problem. Any suggestions?
bash ./scripts/train.sh 0 \
    --config ./cfgs/KITTI_models/PoinTr.yaml \
    --exp_name example
/content/pointr
/content/pointr

GPUS=0
PY_ARGS='--config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example'
CUDA_VISIBLE_DEVICES=0
python main.py --config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example
2022-11-04 04:37:20,520 - PoinTr - INFO - Copy the Config file from ./cfgs/KITTI_models/PoinTr.yaml to ./experiments/PoinTr/KITTI_models/example/config.yaml
2022-11-04 04:37:20,520 - PoinTr - INFO - args.config : ./cfgs/KITTI_models/PoinTr.yaml
2022-11-04 04:37:20,520 - PoinTr - INFO - args.launcher : none
2022-11-04 04:37:20,520 - PoinTr - INFO - args.local_rank : 0
2022-11-04 04:37:20,520 - PoinTr - INFO - args.num_workers : 4
2022-11-04 04:37:20,520 - PoinTr - INFO - args.seed : 0
2022-11-04 04:37:20,521 - PoinTr - INFO - args.deterministic : False
2022-11-04 04:37:20,521 - PoinTr - INFO - args.sync_bn : False
2022-11-04 04:37:20,521 - PoinTr - INFO - args.exp_name : example
2022-11-04 04:37:20,521 - PoinTr - INFO - args.start_ckpts : None
2022-11-04 04:37:20,521 - PoinTr - INFO - args.ckpts : None
2022-11-04 04:37:20,521 - PoinTr - INFO - args.val_freq : 1
2022-11-04 04:37:20,521 - PoinTr - INFO - args.resume : False
2022-11-04 04:37:20,521 - PoinTr - INFO - args.test : False
2022-11-04 04:37:20,521 - PoinTr - INFO - args.mode : None
2022-11-04 04:37:20,521 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/KITTI_models/example
2022-11-04 04:37:20,521 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/KITTI_models/TFBoard/example
2022-11-04 04:37:20,521 - PoinTr - INFO - args.log_name : PoinTr
2022-11-04 04:37:20,521 - PoinTr - INFO - args.use_gpu : True
2022-11-04 04:37:20,521 - PoinTr - INFO - args.distributed : False
2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer = edict()
2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-04 04:37:20,521 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0001
2022-11-04 04:37:20,522 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler = edict()
2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-04 04:37:20,522 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-04 04:37:20,522 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset = edict()
2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train = edict()
2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train.base = edict()
2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train.base.NAME : PCN
2022-11-04 04:37:20,522 - PoinTr - INFO - config.dataset.train.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.base.N_POINTS : 16384
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.base.N_RENDERINGS : 8
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.base.CARS : True
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.train.others.bs : 64
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val = edict()
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base = edict()
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base.NAME : PCN
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base.N_POINTS : 16384
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base.N_RENDERINGS : 8
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.base.CARS : True
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-04 04:37:20,523 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test = edict()
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.base = edict()
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.base.NAME : KITTI
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.base.CATEGORY_FILE_PATH : data/KITTI/KITTI.json
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.base.N_POINTS : 16384
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.base.CLOUD_PATH : data/KITTI/cars/%s.pcd
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.base.BBOX_PATH : data/KITTI/bboxes/%s.txt
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-04 04:37:20,524 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-04 04:37:20,524 - PoinTr - INFO - config.model = edict()
2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.num_query : 224
2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-04 04:37:20,524 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-04 04:37:20,524 - PoinTr - INFO - config.total_bs : 64
2022-11-04 04:37:20,525 - PoinTr - INFO - config.step_per_update : 1
2022-11-04 04:37:20,525 - PoinTr - INFO - config.max_epoch : 600
2022-11-04 04:37:20,525 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-04 04:37:20,525 - PoinTr - INFO - Distributed training: False
2022-11-04 04:37:20,525 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-04 04:37:20,534 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-04 04:37:20,563 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 5677
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
2022-11-04 04:37:20,570 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-04 04:37:20,570 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 150
2022-11-04 04:37:20,571 - MODEL - INFO - Transformer with knn_layer 1
2022-11-04 04:37:31,629 - PoinTr - INFO - Using Data parallel ...
2022-11-04 04:37:35,690 - PoinTr - INFO - padding while KITTI training
Traceback (most recent call last):
File "main.py", line 68, in <module>
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/content/pointr/tools/runner.py", line 98, in run_net
ret = base_model(partial)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/pointr/models/PoinTr.py", line 92, in forward
q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/pointr/models/Transformer.py", line 353, in forward
coor, f = self.grouper(inpc.transpose(1,2).contiguous())
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/pointr/models/dgcnn_group.py", line 87, in forward
f = self.get_graph_feature(coor, f, coor, f)
File "/content/pointr/models/dgcnn_group.py", line 67, in get_graph_feature
assert idx.shape[1] == k
AssertionError

yuxumin · 2022-11-04T04:58:19Z

Hi,
Have you modified the code? This error comes from an unexpected k value in the kNN operation.

jackie174 · 2022-11-04T17:52:35Z

Hello, thanks so much for your reply.
I did not modify the code.
One thing I found is weird.
When I ran bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example
I first got an error that ./data/PCN/train was not found, so I downloaded the data.
After that, I got the error
assert idx.shape[1] == k
AssertionError
Then I printed idx.shape[1]; the result is 3.

During the above process, I had not even downloaded the KITTI dataset. Why does the code require PCN?

yuxumin · 2022-11-04T18:00:33Z

Alright, I got the problem.
This is probably due to the version of knn_cuda. It returns idx with shape (B, 3, k), but the expected shape is (B, k, 3).
So please transpose idx before this line and the problem will go away.
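
A minimal sketch of the layout fix suggested here, written on plain shape tuples so it runs without a GPU. The helper name `axes_for_bs_k_np` is hypothetical and not part of PoinTr or knn_cuda; it only works out which axis order brings a 3-D index shape into the `(batch, k, num_points)` layout that the assertion checks. The example shape is illustrative.

```python
def axes_for_bs_k_np(shape, k):
    """Return the axis order that puts a 3-D kNN index shape into (batch, k, num_points).

    Hypothetical helper for illustration; it inspects shapes only.
    """
    if len(shape) != 3:
        raise ValueError(f"expected a 3-D index tensor, got shape {tuple(shape)}")
    if shape[1] == k:
        return (0, 1, 2)  # already (batch, k, num_points)
    if shape[2] == k:
        return (0, 2, 1)  # (batch, num_points, k): swap the last two axes
    raise ValueError(f"no axis of length k={k} in shape {tuple(shape)}")


print(axes_for_bs_k_np((48, 2048, 16), k=16))
```

With PyTorch, the result could be applied as `idx.permute(*axes).contiguous()`; the `.contiguous()` call matters because a later `view(-1)` needs contiguous memory.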

jackie174 · 2022-11-13T00:57:41Z

Hi, I tried transposing the data, but I got more errors.
First, I got a view error:

  File "/content/pointr/models/dgcnn_group.py", line 73, in get_graph_feature
    idx = idx.view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
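
The `view` failure above is what a transpose typically produces: the elements are no longer laid out contiguously, so they cannot be flattened without a copy. The sketch below models row-major strides in plain Python to show the effect. The function names are illustrative, not PyTorch APIs, and the contiguity check is simplified (it ignores PyTorch's special handling of size-1 dimensions).

```python
def row_major_strides(shape):
    """Strides (in elements) of a freshly allocated row-major array of this shape."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def swap_axes(shape, strides, a, b):
    """Model a transpose: swap two entries of shape and strides without moving data."""
    shape, strides = list(shape), list(strides)
    shape[a], shape[b] = shape[b], shape[a]
    strides[a], strides[b] = strides[b], strides[a]
    return tuple(shape), tuple(strides)


def is_contiguous(shape, strides):
    """True when the strides match those of a row-major array of the same shape."""
    return strides == row_major_strides(shape)


shape = (16, 3)
strides = row_major_strides(shape)
t_shape, t_strides = swap_axes(shape, strides, 0, 1)
print(t_shape, t_strides, is_contiguous(t_shape, t_strides))
```

In PyTorch, calling `.contiguous()` after the transpose, or using `.reshape(-1)` as the error message suggests, copies the data into a contiguous layout first.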

Then I changed it to reshape and got the error below.
I am so confused; I did not change any code other than the transposing and reshaping.

            _, idx = knn(coor_k, coor_q)  # bs k np
            print("------------Before transpose\n", type(idx), idx)
            idx= torch.transpose(idx, 0, 1)
            print("-------------After transpose\n", type(idx), idx)
            assert idx.shape[1] == k
            idx_base = torch.arange(0, batch_size, device=x_q.device).view(-1, 1, 1) * num_points_k
            idx = idx + idx_base
            idx = idx.reshape(-1)

The error I got:


+ GPUS=0
+ PY_ARGS='--ckpts ./pretrained/PoinTr_PCN.pth --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example'
+ CUDA_VISIBLE_DEVICES=0
+ python main.py --test --ckpts ./pretrained/PoinTr_PCN.pth --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
Create experiment path successfully at ./experiments/PoinTr/PCN_models/test_example
Create TFBoard path successfully at ./experiments/PoinTr/PCN_models/TFBoard/test_example
2022-11-13 00:47:49,404 - PoinTr - INFO - Copy the Config file from ./cfgs/PCN_models/PoinTr.yaml to ./experiments/PoinTr/PCN_models/test_example/config.yaml
2022-11-13 00:47:49,404 - PoinTr - INFO - args.config : ./cfgs/PCN_models/PoinTr.yaml
2022-11-13 00:47:49,404 - PoinTr - INFO - args.launcher : none
2022-11-13 00:47:49,404 - PoinTr - INFO - args.local_rank : 0
2022-11-13 00:47:49,404 - PoinTr - INFO - args.num_workers : 4
2022-11-13 00:47:49,404 - PoinTr - INFO - args.seed : 0
2022-11-13 00:47:49,404 - PoinTr - INFO - args.deterministic : False
2022-11-13 00:47:49,404 - PoinTr - INFO - args.sync_bn : False
2022-11-13 00:47:49,404 - PoinTr - INFO - args.exp_name : test_example
2022-11-13 00:47:49,404 - PoinTr - INFO - args.start_ckpts : None
2022-11-13 00:47:49,405 - PoinTr - INFO - args.ckpts : ./pretrained/PoinTr_PCN.pth
2022-11-13 00:47:49,405 - PoinTr - INFO - args.val_freq : 1
2022-11-13 00:47:49,405 - PoinTr - INFO - args.resume : False
2022-11-13 00:47:49,405 - PoinTr - INFO - args.test : True
2022-11-13 00:47:49,405 - PoinTr - INFO - args.mode : None
2022-11-13 00:47:49,405 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/PCN_models/test_example
2022-11-13 00:47:49,405 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/PCN_models/TFBoard/test_example
2022-11-13 00:47:49,405 - PoinTr - INFO - args.log_name : PoinTr
2022-11-13 00:47:49,405 - PoinTr - INFO - args.use_gpu : True
2022-11-13 00:47:49,405 - PoinTr - INFO - args.distributed : False
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0005
2022-11-13 00:47:49,405 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-13 00:47:49,405 - PoinTr - INFO - config.scheduler = edict()
2022-11-13 00:47:49,405 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-13 00:47:49,406 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-13 00:47:49,406 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_ = edict()
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.NAME : PCN
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.N_POINTS : 16384
2022-11-13 00:47:49,406 - PoinTr - INFO - config.dataset.train._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train._base_.CARS : False
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.train.others.bs : 48
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_ = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.NAME : PCN
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.N_POINTS : 16384
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val._base_.CARS : False
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.test = edict()
2022-11-13 00:47:49,407 - PoinTr - INFO - config.dataset.test._base_ = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.NAME : PCN
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.N_POINTS : 16384
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.N_RENDERINGS : 8
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test._base_.CARS : False
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model = edict()
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.num_query : 224
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-13 00:47:49,408 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-13 00:47:49,408 - PoinTr - INFO - config.total_bs : 48
2022-11-13 00:47:49,408 - PoinTr - INFO - config.step_per_update : 1
2022-11-13 00:47:49,408 - PoinTr - INFO - config.max_epoch : 300
2022-11-13 00:47:49,408 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-13 00:47:49,409 - PoinTr - INFO - Distributed training: False
2022-11-13 00:47:49,409 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-13 00:47:49,409 - PoinTr - INFO - Tester start ... 
2022-11-13 00:47:49,416 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-13 00:47:49,417 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-13 00:47:49,417 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-13 00:47:49,418 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-13 00:47:49,418 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-13 00:47:49,420 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-13 00:47:49,420 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-13 00:47:49,421 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-13 00:47:49,421 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 1200
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:566: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
2022-11-13 00:47:49,423 - MODEL - INFO -  Transformer with knn_layer 1
2022-11-13 00:47:51,820 - PoinTr - INFO - Loading weights from ./pretrained/PoinTr_PCN.pth...
2022-11-13 00:47:54,342 - PoinTr - INFO - ckpts @ 289 epoch( performance = No Metrics)
------------Before transpose
 <class 'torch.Tensor'> tensor([[ 0,  1,  2],
        [ 3,  3,  3],
        [ 4,  4,  4],
        [ 5,  5,  5],
        [ 6,  6,  6],
        [ 7,  7,  7],
        [ 8,  8,  8],
        [ 9,  9,  9],
        [10, 10, 10],
        [11, 11, 11],
        [12, 12, 12],
        [13, 13, 13],
        [14, 14, 14],
        [15, 15, 15],
        [ 2,  2,  1],
        [ 1,  0,  0]], device='cuda:0')
-------------After transpose
 <class 'torch.Tensor'> tensor([[ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  1],
        [ 1,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  0],
        [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  0]],
       device='cuda:0')
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 62, in main
    test_net(args, config)
  File "/content/pointr/tools/runner.py", line 304, in test_net
    test(base_model, test_dataloader, ChamferDisL1, ChamferDisL2, args, config, logger=logger)
  File "/content/pointr/tools/runner.py", line 326, in test
    ret = base_model(partial)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/Transformer.py", line 353, in forward
    coor, f = self.grouper(inpc.transpose(1,2).contiguous()) 
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/dgcnn_group.py", line 90, in forward
    f = self.get_graph_feature(coor, f, coor, f)
  File "/content/pointr/models/dgcnn_group.py", line 77, in get_graph_feature
    feature = feature.view(batch_size, k, num_points_q, num_dims).permute(0, 3, 2, 1).contiguous()
RuntimeError: shape '[1, 16, 2048, 8]' is invalid for input of size 384
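
The numbers in this `RuntimeError` can be checked by hand, assuming (as the printed tensors suggest) that `idx` had shape `(3, 16)` after the transpose and that each gathered feature row holds `num_dims = 8` values, the last entry in the shape from the error message. The variable names mirror the traceback; this is only a sanity check, not code from the repository.

```python
# Shape of the 2-D index tensor printed after the transpose (assumed from the log).
idx_rows, idx_cols = 3, 16

# Target shape taken from the error message: [1, 16, 2048, 8].
batch_size, k, num_points_q, num_dims = 1, 16, 2048, 8

# One feature row of length num_dims is gathered per index.
gathered = idx_rows * idx_cols * num_dims
expected = batch_size * k * num_points_q * num_dims

print(gathered, expected)
```

The gathered size matches the "input of size 384" in the message, which points at `idx` holding far fewer entries than `batch_size * k * num_points_q`.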

yuxumin · 2022-11-13T05:17:51Z

_, idx = knn(coor_k, coor_q) # bs k np

Can you show me the shapes of coor_k and coor_q?
It seems that the shape of idx in your situation is (k, 3), but it should be a 3-dimensional tensor.
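
The shape question above can be turned into a small check that reports what is wrong instead of a bare `AssertionError`. This is a sketch on shape tuples only; `check_knn_shapes` is a hypothetical name, the `(batch, k, num_query_points)` target layout is taken from the `# bs k np` comment, and the example shapes are illustrative.

```python
def check_knn_shapes(coor_q_shape, idx_shape, k):
    """Compare an index shape with the (batch, k, num_query_points) layout."""
    batch, _, num_q = coor_q_shape  # coordinates are laid out as (batch, 3, num_points)
    expected = (batch, k, num_q)
    idx_shape = tuple(idx_shape)
    if idx_shape == expected:
        return "ok"
    if len(idx_shape) != 3:
        return f"idx is {len(idx_shape)}-D {idx_shape}, expected 3-D {expected}"
    return f"idx has shape {idx_shape}, expected {expected}"


print(check_knn_shapes((1, 3, 2048), (16, 3), k=16))
```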

jackie174 · 2022-11-14T18:36:58Z

Thanks so much for your reply!
Below is what I get when I run the evaluation:

!bash ./scripts/test.sh 0 \
    --ckpts ./pretrained/PoinTr_PCN.pth \
    --config ./cfgs/PCN_models/PoinTr.yaml \
    --exp_name example

Then I get the following:

shpae of coor_k:  torch.Size([1, 3, 2048])
shpae of coor_q:  torch.Size([1, 3, 2048])
idx before transpose: tensor([[ 0,  1,  2],
        [ 3,  3,  3],
        [ 4,  4,  4],
        [ 5,  5,  5],
        [ 6,  6,  6],
        [ 7,  7,  7],
        [ 8,  8,  8],
        [ 9,  9,  9],
        [10, 10, 10],
        [11, 11, 11],
        [12, 12, 12],
        [13, 13, 13],
        [14, 14, 14],
        [15, 15, 15],
        [ 2,  2,  1],
        [ 1,  0,  0]], device='cuda:0')
idx after transpose: tensor([[ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  1],
        [ 1,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  2,  0],
        [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  0]],
       device='cuda:0')

Below is what I run for training:

bash ./scripts/train.sh 0 \
    --config ./cfgs/KITTI_models/PoinTr.yaml \
    --exp_name example

Below is what I get:

shpae of coor_k:  torch.Size([64, 3, 2048])
shpae of coor_q:  torch.Size([64, 3, 2048])
idx before transpose: tensor([[ 0,  1,  2],
        [15, 10,  4],
        [11, 14,  8],
        [14,  5, 13],
        [10, 15, 10],
        [13,  6, 15],
        [ 4, 11, 11],
        [ 6, 13,  6],
        [ 5,  3, 12],
        [ 7,  4,  5],
        [ 3,  7, 14],
        [12, 12,  3],
        [ 9,  9,  7],
        [ 8,  8,  9],
        [ 1,  2,  1],
        [ 2,  0,  0]], device='cuda:0')
idx after transpose: tensor([[ 0, 15, 11, 14, 10, 13,  4,  6,  5,  7,  3, 12,  9,  8,  1,  2],
        [ 1, 10, 14,  5, 15,  6, 11, 13,  3,  4,  7, 12,  9,  8,  2,  0],
        [ 2,  4,  8, 13, 10, 15, 11,  6, 12,  5, 14,  3,  7,  9,  1,  0]],
       device='cuda:0')

jackie174 · 2022-11-19T07:49:44Z

One thing is weird.
For PCN_models, ShapeNet34_models, and ShapeNet55_models, training works with PCN.yaml.
With GRNet.yaml, it fails with RuntimeError: CUDA out of memory.
However, none of them work with PoinTr.yaml.
I always get assert idx.shape[1] == k.
I did not modify the code.

These work:

!bash ./scripts/train.sh 0 \
    --config ./cfgs/PCN_models/PCN.yaml \
    --exp_name example

!bash ./scripts/train.sh 0 \
    --config ./cfgs/ShapeNet55_models/PCN.yaml \
    --exp_name example

These do not work and give the error: RuntimeError: CUDA out of memory.

!bash ./scripts/train.sh 0 \
    --config ./cfgs/PCN_models/GRNet.yaml \
    --exp_name example

These do not work and give the error: AssertionError: assert idx.shape[1] == k
I tried PCN_models, ShapeNet34_models, and ShapeNet55_models; none of them work.

!bash ./scripts/train.sh 0 \
    --config ./cfgs/ShapeNet55_models/PoinTr.yaml \
    --exp_name example

yuxumin · 2022-11-19T07:56:58Z

Hi, the problem comes from the knn_cuda used in your environment.
Can you provide your env by running conda env list?
And can you share your models/dgcnn_group.py with me?
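
Since `conda env list` only names the environments, it may also help to report the installed package versions. The sketch below uses the standard-library `importlib.metadata` (Python 3.8 or newer). The helper name `installed_versions` is hypothetical, and the distribution name `KNN_CUDA` is an assumption about how the kNN package is registered with pip; adjust it to whatever `pip list` shows.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# "KNN_CUDA" is an assumed distribution name, used here for illustration only.
print(installed_versions(["torch", "KNN_CUDA"]))
```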

jackie174 · 2022-11-20T22:39:33Z

Thank you very much for your reply.
My environment is:
cuda: 11.2
pytorch: 1.13.0+cu117
python: 3.7
gcc: 7.5

conda env list

> # conda environments:
> #
> base                     /usr/local

import torch
print(torch.__version__)
nvcc --version
gcc -v

> 1.13.0+cu117
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2021 NVIDIA Corporation
> Built on Sun_Feb_14_21:12:58_PST_2021
> Cuda compilation tools, release 11.2, V11.2.152
> Build cuda_11.2.r11.2/compiler.29618528_0
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
> OFFLOAD_TARGET_NAMES=nvptx-none
> OFFLOAD_TARGET_DEFAULT=1
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

I just forked the code from your repo:
https://github.com/Cmput-414/PoinTr/blob/master/models/dgcnn_group.py
You can also take a look at what I wrote in the Colab notebook:
https://github.com/Cmput-414/pointr-colab/blob/main/PoinTr.ipynb

yuxumin · 2022-11-21T06:20:42Z

Hi, I updated the code for the kNN calculation in dgcnn_group.py. Could you try again and post the results here?

If the error still exists, please debug and let me know the shapes of the input and output of kNN (coor_k, coor_q, idx).

Best!

jackie174 · 2022-11-21T07:15:05Z

Hi, I get this after running:
shape of coor_q: torch.Size([48, 3, 2048])
shape of coor_k: torch.Size([48, 3, 2048])
shape of idx:
Before transpose: torch.Size([48, 2048, 16])
After transpose: torch.Size([48, 16, 2048])
after veiw: torch.Size([1572864])
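
For reference, the flattened size reported above is consistent with a `(48, 16, 2048)` index tensor: 48 × 16 × 2048 = 1,572,864. The sketch below mimics, with nested Python lists instead of tensors, the offset-and-flatten step that follows the assertion in `get_graph_feature` (adding `batch * num_points_k` to each index, then flattening). `flatten_knn_indices` is a hypothetical name used only for this illustration.

```python
def flatten_knn_indices(idx, num_points_k):
    """Flatten indices laid out as [batch][k][num_points_q] into one list.

    Each batch's indices are shifted by batch * num_points_k so that they
    address rows of a (batch * num_points_k, num_dims) feature table.
    """
    flat = []
    for b, per_batch in enumerate(idx):
        offset = b * num_points_k
        for neighbours in per_batch:
            flat.extend(i + offset for i in neighbours)
    return flat


# Toy case: batch of 2, k = 2, 3 query points, 3 key points per batch.
idx = [
    [[0, 1, 2], [1, 2, 0]],
    [[2, 0, 1], [0, 1, 2]],
]
flat = flatten_knn_indices(idx, num_points_k=3)
print(len(flat), flat)
```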

/content/pointr
+ GPUS=0
+ PY_ARGS='--config ./cfgs/PCN_models/PoinTr.yaml --exp_name example'
+ CUDA_VISIBLE_DEVICES=0
+ python main.py --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example
2022-11-21 07:18:16,859 - PoinTr - INFO - Copy the Config file from ./cfgs/PCN_models/PoinTr.yaml to ./experiments/PoinTr/PCN_models/example/config.yaml
2022-11-21 07:18:16,860 - PoinTr - INFO - args.config : ./cfgs/PCN_models/PoinTr.yaml
2022-11-21 07:18:16,860 - PoinTr - INFO - args.launcher : none
2022-11-21 07:18:16,860 - PoinTr - INFO - args.local_rank : 0
2022-11-21 07:18:16,860 - PoinTr - INFO - args.num_workers : 4
2022-11-21 07:18:16,860 - PoinTr - INFO - args.seed : 0
2022-11-21 07:18:16,860 - PoinTr - INFO - args.deterministic : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.sync_bn : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.exp_name : example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.start_ckpts : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.ckpts : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.val_freq : 1
2022-11-21 07:18:16,860 - PoinTr - INFO - args.resume : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.test : False
2022-11-21 07:18:16,860 - PoinTr - INFO - args.mode : None
2022-11-21 07:18:16,860 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/PCN_models/example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/PCN_models/TFBoard/example
2022-11-21 07:18:16,860 - PoinTr - INFO - args.log_name : PoinTr
2022-11-21 07:18:16,860 - PoinTr - INFO - args.use_gpu : True
2022-11-21 07:18:16,860 - PoinTr - INFO - args.distributed : False
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer = edict()
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0005
2022-11-21 07:18:16,860 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-21 07:18:16,860 - PoinTr - INFO - config.scheduler = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-21 07:18:16,861 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-21 07:18:16,861 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_ = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.NAME : PCN
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.N_POINTS : 16384
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train._base_.CARS : False
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.train.others.bs : 48
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_ = edict()
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_.NAME : PCN
2022-11-21 07:18:16,861 - PoinTr - INFO - config.dataset.val._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.N_POINTS : 16384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val._base_.CARS : False
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_ = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.NAME : PCN
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.N_POINTS : 16384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.N_RENDERINGS : 8
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test._base_.CARS : False
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model = edict()
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.num_query : 224
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-21 07:18:16,862 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-21 07:18:16,862 - PoinTr - INFO - config.total_bs : 48
2022-11-21 07:18:16,862 - PoinTr - INFO - config.step_per_update : 1
2022-11-21 07:18:16,862 - PoinTr - INFO - config.max_epoch : 300
2022-11-21 07:18:16,862 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-21 07:18:16,863 - PoinTr - INFO - Distributed training: False
2022-11-21 07:18:16,863 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-21 07:18:16,871 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-21 07:18:16,978 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-21 07:18:16,985 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 07:18:17,013 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-21 07:18:17,041 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-21 07:18:17,051 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-21 07:18:17,066 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-21 07:18:17,096 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-21 07:18:17,105 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 28974
2022-11-21 07:18:17,112 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02691156, Name=airplane]
2022-11-21 07:18:17,113 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02933112, Name=cabinet]
2022-11-21 07:18:17,113 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03001627, Name=chair]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=03636649, Name=lamp]
2022-11-21 07:18:17,114 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04256520, Name=sofa]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04379243, Name=table]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=04530566, Name=watercraft]
2022-11-21 07:18:17,116 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 1200
2022-11-21 07:18:17,117 - MODEL - INFO -  Transformer with knn_layer 1
2022-11-21 07:18:19,483 - PoinTr - INFO - Using Data parallel ...
**************************************************
coor_k shape:  torch.Size([48, 3, 2048])
coor_q shape:  torch.Size([48, 3, 2048])
idx = knn_point(k, coor_k.transpose(-1, -2).contiguous(), coor_q.transpose(-1, -2).contiguous()) # B G M tensor([[[-5584463534953070592, -5620492331972034560, -5584463534944681984,
           ..., -5584463534936293376, -5584463534986625024,
          -5620492331955257344],
         [          3003121664, -5584463534969847808, -5620492331955257344,
           ..., -5584463534944681984, -5584463534953070592,
          -5620492331955257344],
         [-5764607520039501824, -5584463534936293376, -5584463534944681984,
           ..., -5584463534944681984, -5584463534944681984,
                             0],
         ...,
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728]],

        ...,

        [[                   0,                  413,                  429,
           ...,                 1358,                 1446,
                           807],
         [                   1,                    5,                   80,
           ...,                 1398,                 1445,
                           918],
         [                   2,                  201,                  404,
           ...,                 1346,                 1396,
                           515],
         ...,
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476]],

        [[                   0,                   30,                   37,
           ...,                  738,                  743,
                           534],
         [                   1,                   15,                  129,
           ...,                  504,                  532,
                           660],
         [                   2,                   76,                   83,
           ...,                  724,                  761,
                           135],
         ...,
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813]],

        [[                   0,                   26,                  198,
           ...,                  491,                  494,
                           286],
         [                   1,                  115,                  186,
           ...,                  541,                  555,
                           452],
         [                   2,                   24,                  124,
           ...,                  555,                  569,
                           409],
         ...,
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596]]], device='cuda:0')
idx:  tensor([[[-5584463534953070592, -5620492331972034560, -5584463534944681984,
           ..., -5584463534936293376, -5584463534986625024,
          -5620492331955257344],
         [          3003121664, -5584463534969847808, -5620492331955257344,
           ..., -5584463534944681984, -5584463534953070592,
          -5620492331955257344],
         [-5764607520039501824, -5584463534936293376, -5584463534944681984,
           ..., -5584463534944681984, -5584463534944681984,
                             0],
         ...,
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198],
         [                1183,                 1184,                 1185,
           ...,                 1196,                 1197,
                          1198]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543],
         [                 528,                  529,                  530,
           ...,                  541,                  542,
                           543]],

        [[                   0,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   1,                    0,                    0,
           ...,                    0,                    0,
                             0],
         [                   2,                    0,                    0,
           ...,                    0,                    0,
                             0],
         ...,
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728],
         [                 713,                  714,                  715,
           ...,                  726,                  727,
                           728]],

        ...,

        [[                   0,                  413,                  429,
           ...,                 1358,                 1446,
                           807],
         [                   1,                    5,                   80,
           ...,                 1398,                 1445,
                           918],
         [                   2,                  201,                  404,
           ...,                 1346,                 1396,
                           515],
         ...,
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476],
         [                1461,                 1462,                 1463,
           ...,                 1474,                 1475,
                          1476]],

        [[                   0,                   30,                   37,
           ...,                  738,                  743,
                           534],
         [                   1,                   15,                  129,
           ...,                  504,                  532,
                           660],
         [                   2,                   76,                   83,
           ...,                  724,                  761,
                           135],
         ...,
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813],
         [                 798,                  799,                  800,
           ...,                  811,                  812,
                           813]],

        [[                   0,                   26,                  198,
           ...,                  491,                  494,
                           286],
         [                   1,                  115,                  186,
           ...,                  541,                  555,
                           452],
         [                   2,                   24,                  124,
           ...,                  555,                  569,
                           409],
         ...,
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596],
         [                 581,                  582,                  583,
           ...,                  594,                  595,
                           596]]], device='cuda:0')
idx = idx.transpose(-1, -2).contiguous()
idx:  tensor([[[-5584463534953070592,           3003121664, -5764607520039501824,
           ...,                 1183,                 1183,
                          1183],
         [-5620492331972034560, -5584463534969847808, -5584463534936293376,
           ...,                 1184,                 1184,
                          1184],
         [-5584463534944681984, -5620492331955257344, -5584463534944681984,
           ...,                 1185,                 1185,
                          1185],
         ...,
         [-5584463534936293376, -5584463534944681984, -5584463534944681984,
           ...,                 1196,                 1196,
                          1196],
         [-5584463534986625024, -5584463534953070592, -5584463534944681984,
           ...,                 1197,                 1197,
                          1197],
         [-5620492331955257344, -5620492331955257344,                    0,
           ...,                 1198,                 1198,
                          1198]],

        [[                   0,                    1,                    2,
           ...,                  528,                  528,
                           528],
         [                   0,                    0,                    0,
           ...,                  529,                  529,
                           529],
         [                   0,                    0,                    0,
           ...,                  530,                  530,
                           530],
         ...,
         [                   0,                    0,                    0,
           ...,                  541,                  541,
                           541],
         [                   0,                    0,                    0,
           ...,                  542,                  542,
                           542],
         [                   0,                    0,                    0,
           ...,                  543,                  543,
                           543]],

        [[                   0,                    1,                    2,
           ...,                  713,                  713,
                           713],
         [                   0,                    0,                    0,
           ...,                  714,                  714,
                           714],
         [                   0,                    0,                    0,
           ...,                  715,                  715,
                           715],
         ...,
         [                   0,                    0,                    0,
           ...,                  726,                  726,
                           726],
         [                   0,                    0,                    0,
           ...,                  727,                  727,
                           727],
         [                   0,                    0,                    0,
           ...,                  728,                  728,
                           728]],

        ...,

        [[                   0,                    1,                    2,
           ...,                 1461,                 1461,
                          1461],
         [                 413,                    5,                  201,
           ...,                 1462,                 1462,
                          1462],
         [                 429,                   80,                  404,
           ...,                 1463,                 1463,
                          1463],
         ...,
         [                1358,                 1398,                 1346,
           ...,                 1474,                 1474,
                          1474],
         [                1446,                 1445,                 1396,
           ...,                 1475,                 1475,
                          1475],
         [                 807,                  918,                  515,
           ...,                 1476,                 1476,
                          1476]],

        [[                   0,                    1,                    2,
           ...,                  798,                  798,
                           798],
         [                  30,                   15,                   76,
           ...,                  799,                  799,
                           799],
         [                  37,                  129,                   83,
           ...,                  800,                  800,
                           800],
         ...,
         [                 738,                  504,                  724,
           ...,                  811,                  811,
                           811],
         [                 743,                  532,                  761,
           ...,                  812,                  812,
                           812],
         [                 534,                  660,                  135,
           ...,                  813,                  813,
                           813]],

        [[                   0,                    1,                    2,
           ...,                  581,                  581,
                           581],
         [                  26,                  115,                   24,
           ...,                  582,                  582,
                           582],
         [                 198,                  186,                  124,
           ...,                  583,                  583,
                           583],
         ...,
         [                 491,                  541,                  555,
           ...,                  594,                  594,
                           594],
         [                 494,                  555,                  569,
           ...,                  595,                  595,
                           595],
         [                 286,                  452,                  409,
           ...,                  596,                  596,
                           596]]], device='cuda:0')
idx_base:  tensor([[[    0]],

        [[ 2048]],

        [[ 4096]],

        [[ 6144]],

        [[ 8192]],

        [[10240]],

        [[12288]],

        [[14336]],

        [[16384]],

        [[18432]],

        [[20480]],

        [[22528]],

        [[24576]],

        [[26624]],

        [[28672]],

        [[30720]],

        [[32768]],

        [[34816]],

        [[36864]],

        [[38912]],

        [[40960]],

        [[43008]],

        [[45056]],

        [[47104]],

        [[49152]],

        [[51200]],

        [[53248]],

        [[55296]],

        [[57344]],

        [[59392]],

        [[61440]],

        [[63488]],

        [[65536]],

        [[67584]],

        [[69632]],

        [[71680]],

        [[73728]],

        [[75776]],

        [[77824]],

        [[79872]],

        [[81920]],

        [[83968]],

        [[86016]],

        [[88064]],

        [[90112]],

        [[92160]],

        [[94208]],

        [[96256]]], device='cuda:0')
idx = idx + idx_base
idx:  tensor([[[-5584463534953070592,           3003121664, -5764607520039501824,
           ...,                 1183,                 1183,
                          1183],
         [-5620492331972034560, -5584463534969847808, -5584463534936293376,
           ...,                 1184,                 1184,
                          1184],
         [-5584463534944681984, -5620492331955257344, -5584463534944681984,
           ...,                 1185,                 1185,
                          1185],
         ...,
         [-5584463534936293376, -5584463534944681984, -5584463534944681984,
           ...,                 1196,                 1196,
                          1196],
         [-5584463534986625024, -5584463534953070592, -5584463534944681984,
           ...,                 1197,                 1197,
                          1197],
         [-5620492331955257344, -5620492331955257344,                    0,
           ...,                 1198,                 1198,
                          1198]],

        [[                2048,                 2049,                 2050,
           ...,                 2576,                 2576,
                          2576],
         [                2048,                 2048,                 2048,
           ...,                 2577,                 2577,
                          2577],
         [                2048,                 2048,                 2048,
           ...,                 2578,                 2578,
                          2578],
         ...,
         [                2048,                 2048,                 2048,
           ...,                 2589,                 2589,
                          2589],
         [                2048,                 2048,                 2048,
           ...,                 2590,                 2590,
                          2590],
         [                2048,                 2048,                 2048,
           ...,                 2591,                 2591,
                          2591]],

        [[                4096,                 4097,                 4098,
           ...,                 4809,                 4809,
                          4809],
         [                4096,                 4096,                 4096,
           ...,                 4810,                 4810,
                          4810],
         [                4096,                 4096,                 4096,
           ...,                 4811,                 4811,
                          4811],
         ...,
         [                4096,                 4096,                 4096,
           ...,                 4822,                 4822,
                          4822],
         [                4096,                 4096,                 4096,
           ...,                 4823,                 4823,
                          4823],
         [                4096,                 4096,                 4096,
           ...,                 4824,                 4824,
                          4824]],

        ...,

        [[               92160,                92161,                92162,
           ...,                93621,                93621,
                         93621],
         [               92573,                92165,                92361,
           ...,                93622,                93622,
                         93622],
         [               92589,                92240,                92564,
           ...,                93623,                93623,
                         93623],
         ...,
         [               93518,                93558,                93506,
           ...,                93634,                93634,
                         93634],
         [               93606,                93605,                93556,
           ...,                93635,                93635,
                         93635],
         [               92967,                93078,                92675,
           ...,                93636,                93636,
                         93636]],

        [[               94208,                94209,                94210,
           ...,                95006,                95006,
                         95006],
         [               94238,                94223,                94284,
           ...,                95007,                95007,
                         95007],
         [               94245,                94337,                94291,
           ...,                95008,                95008,
                         95008],
         ...,
         [               94946,                94712,                94932,
           ...,                95019,                95019,
                         95019],
         [               94951,                94740,                94969,
           ...,                95020,                95020,
                         95020],
         [               94742,                94868,                94343,
           ...,                95021,                95021,
                         95021]],

        [[               96256,                96257,                96258,
           ...,                96837,                96837,
                         96837],
         [               96282,                96371,                96280,
           ...,                96838,                96838,
                         96838],
         [               96454,                96442,                96380,
           ...,                96839,                96839,
                         96839],
         ...,
         [               96747,                96797,                96811,
           ...,                96850,                96850,
                         96850],
         [               96750,                96811,                96825,
           ...,                96851,                96851,
                         96851],
         [               96542,                96708,                96665,
           ...,                96852,                96852,
                         96852]]], device='cuda:0')
idx = idx.view(-1)
idx:  tensor([-5584463534953070592,           3003121664, -5764607520039501824,
         ...,                96852,                96852,
                       96852], device='cuda:0')
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [3,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [3,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed
*****some repeat lines*****
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [97,0,0], thread: [111,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/content/pointr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/Transformer.py", line 353, in forward
    coor, f = self.grouper(inpc.transpose(1,2).contiguous()) 
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/pointr/models/dgcnn_group.py", line 137, in forward
    f = self.layer1(f)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 460, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([48, 16, 2048, 16], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(16, 32, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams 
    memory_format = Contiguous
    data_type = CUDNN_DATA_FLOAT
    padding = [0, 0, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x5593baac81c0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 48, 16, 2048, 16, 
    strideA = 524288, 32768, 16, 1, 
output: TensorDescriptor 0x559435ebfd40
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 48, 32, 2048, 16, 
    strideA = 1048576, 32768, 16, 1, 
weight: FilterDescriptor 0x559433965ce0
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 32, 16, 1, 1, 
Pointer addresses: 
    input: 0x7f1e38c00000
    output: 0x7f1e3ec00000
    weight: 0x7f1ecf000600

Thank you so much!

yuxumin · 2022-11-21T08:05:17Z

Hi, it seems the previous issue (unexpected shape of idx) is gone.
The new error comes from the negative values in idx:

(idx: tensor([[[-5584463534953070592, 3003121664, -5764607520039501824] ...)

However, I am not sure why this error occurred. idx is produced by the torch.topk function (https://github.com/yuxumin/PoinTr/blob/master/models/dgcnn_group.py#L17).
Could you provide more information?
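
A quick way to see which of the logged values trip the device-side assertion is to apply the same bound check by hand. This is only a sketch: `out_of_range` is a hypothetical helper, and the size `48 * 2048` is an assumption read off the tensor dims in the cuDNN dump (batch 48, 2048 points), not something confirmed in the thread.

```python
def out_of_range(indices, size):
    """Return the entries that fail the check -size <= index < size.

    This mirrors the condition quoted in the IndexKernel.cu assertion message.
    """
    return [i for i in indices if not (-size <= i < size)]


# First three and last three values of the flattened idx printed in the log.
logged = [-5584463534953070592, 3003121664, -5764607520039501824,
          96852, 96852, 96852]

# Assumption: the indexed dimension has batch * num_points = 48 * 2048 rows.
size = 48 * 2048

bad = out_of_range(logged, size)
print(bad)
```

Under that assumption the first three values are all rejected, including the positive 3003121664, so the corruption is not limited to negative entries; the trailing 96852 values pass.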

jackie174 · 2022-11-21T08:34:28Z

Sure, this is what I have in Colab:
https://colab.research.google.com/drive/1Utvtn0euJwu350eOctS8IQt1yTegU5PF#scrollTo=gTQpfhF7Lm43

yuxumin · 2022-11-21T08:43:17Z

Hi, permission is required to view this Colab.

jackie174 · 2022-11-21T09:01:16Z

Sorry, here is the shared link:
https://colab.research.google.com/drive/1Utvtn0euJwu350eOctS8IQt1yTegU5PF?usp=sharing
Also, the dataset is at https://drive.google.com/drive/u/0/folders/0AMEcwIBVSoNtUk9PVA

jackie174 · 2022-11-21T10:43:57Z

Hi, I tried running the code on a local laptop.
It looks like the assertion error is gone, but there is a new problem:

bash ./scripts/train.sh 0 \
    --config ./cfgs/PCN_models/PoinTr.yaml \
    --exp_name example

s.deterministic : False
2022-11-21 03:37:27,785 - PoinTr - INFO - args.sync_bn : False
2022-11-21 03:37:27,786 - PoinTr - INFO - args.exp_name : example
2022-11-21 03:37:27,787 - PoinTr - INFO - args.start_ckpts : None
2022-11-21 03:37:27,788 - PoinTr - INFO - args.ckpts : None
2022-11-21 03:37:27,789 - PoinTr - INFO - args.val_freq : 1
2022-11-21 03:37:27,790 - PoinTr - INFO - args.resume : False
2022-11-21 03:37:27,790 - PoinTr - INFO - args.test : False
2022-11-21 03:37:27,792 - PoinTr - INFO - args.mode : None
2022-11-21 03:37:27,794 - PoinTr - INFO - args.experiment_path : ./experiments/PoinTr/KITTI_models/example
2022-11-21 03:37:27,795 - PoinTr - INFO - args.tfboard_path : ./experiments/PoinTr/KITTI_models/TFBoard/example
2022-11-21 03:37:27,796 - PoinTr - INFO - args.log_name : PoinTr
2022-11-21 03:37:27,796 - PoinTr - INFO - args.use_gpu : True
2022-11-21 03:37:27,797 - PoinTr - INFO - args.distributed : False
2022-11-21 03:37:27,798 - PoinTr - INFO - config.optimizer = edict()
2022-11-21 03:37:27,800 - PoinTr - INFO - config.optimizer.type : AdamW
2022-11-21 03:37:27,801 - PoinTr - INFO - config.optimizer.kwargs = edict()
2022-11-21 03:37:27,802 - PoinTr - INFO - config.optimizer.kwargs.lr : 0.0001
2022-11-21 03:37:27,803 - PoinTr - INFO - config.optimizer.kwargs.weight_decay : 0.0005
2022-11-21 03:37:27,804 - PoinTr - INFO - config.scheduler = edict()
2022-11-21 03:37:27,805 - PoinTr - INFO - config.scheduler.type : LambdaLR
2022-11-21 03:37:27,806 - PoinTr - INFO - config.scheduler.kwargs = edict()
2022-11-21 03:37:27,810 - PoinTr - INFO - config.scheduler.kwargs.decay_step : 21
2022-11-21 03:37:27,812 - PoinTr - INFO - config.scheduler.kwargs.lr_decay : 0.9
2022-11-21 03:37:27,814 - PoinTr - INFO - config.scheduler.kwargs.lowest_decay : 0.02
2022-11-21 03:37:27,817 - PoinTr - INFO - config.bnmscheduler = edict()
2022-11-21 03:37:27,818 - PoinTr - INFO - config.bnmscheduler.type : Lambda
2022-11-21 03:37:27,819 - PoinTr - INFO - config.bnmscheduler.kwargs = edict()
2022-11-21 03:37:27,819 - PoinTr - INFO - config.bnmscheduler.kwargs.decay_step : 21
2022-11-21 03:37:27,820 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_decay : 0.5
2022-11-21 03:37:27,820 - PoinTr - INFO - config.bnmscheduler.kwargs.bn_momentum : 0.9
2022-11-21 03:37:27,821 - PoinTr - INFO - config.bnmscheduler.kwargs.lowest_decay : 0.01
2022-11-21 03:37:27,821 - PoinTr - INFO - config.dataset = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train.base = edict()
2022-11-21 03:37:27,822 - PoinTr - INFO - config.dataset.train.base.NAME : PCN
2022-11-21 03:37:27,823 - PoinTr - INFO - config.dataset.train.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 03:37:27,823 - PoinTr - INFO - config.dataset.train.base.N_POINTS : 16384
2022-11-21 03:37:27,824 - PoinTr - INFO - config.dataset.train.base.N_RENDERINGS : 8
2022-11-21 03:37:27,824 - PoinTr - INFO - config.dataset.train.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 03:37:27,825 - PoinTr - INFO - config.dataset.train.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 03:37:27,827 - PoinTr - INFO - config.dataset.train.base.CARS : True
2022-11-21 03:37:27,827 - PoinTr - INFO - config.dataset.train.others = edict()
2022-11-21 03:37:27,828 - PoinTr - INFO - config.dataset.train.others.subset : train
2022-11-21 03:37:27,829 - PoinTr - INFO - config.dataset.train.others.bs : 64
2022-11-21 03:37:27,830 - PoinTr - INFO - config.dataset.val = edict()
2022-11-21 03:37:27,831 - PoinTr - INFO - config.dataset.val.base = edict()
2022-11-21 03:37:27,832 - PoinTr - INFO - config.dataset.val.base.NAME : PCN
2022-11-21 03:37:27,833 - PoinTr - INFO - config.dataset.val.base.CATEGORY_FILE_PATH : data/PCN/PCN.json
2022-11-21 03:37:27,834 - PoinTr - INFO - config.dataset.val.base.N_POINTS : 16384
2022-11-21 03:37:27,836 - PoinTr - INFO - config.dataset.val.base.N_RENDERINGS : 8
2022-11-21 03:37:27,836 - PoinTr - INFO - config.dataset.val.base.PARTIAL_POINTS_PATH : data/PCN/%s/partial/%s/%s/%02d.pcd
2022-11-21 03:37:27,837 - PoinTr - INFO - config.dataset.val.base.COMPLETE_POINTS_PATH : data/PCN/%s/complete/%s/%s.pcd
2022-11-21 03:37:27,838 - PoinTr - INFO - config.dataset.val.base.CARS : True
2022-11-21 03:37:27,839 - PoinTr - INFO - config.dataset.val.others = edict()
2022-11-21 03:37:27,839 - PoinTr - INFO - config.dataset.val.others.subset : test
2022-11-21 03:37:27,840 - PoinTr - INFO - config.dataset.test = edict()
2022-11-21 03:37:27,841 - PoinTr - INFO - config.dataset.test.base = edict()
2022-11-21 03:37:27,842 - PoinTr - INFO - config.dataset.test.base.NAME : KITTI
2022-11-21 03:37:27,842 - PoinTr - INFO - config.dataset.test.base.CATEGORY_FILE_PATH : data/KITTI/KITTI.json
2022-11-21 03:37:27,844 - PoinTr - INFO - config.dataset.test.base.N_POINTS : 16384
2022-11-21 03:37:27,845 - PoinTr - INFO - config.dataset.test.base.CLOUD_PATH : data/KITTI/cars/%s.pcd
2022-11-21 03:37:27,848 - PoinTr - INFO - config.dataset.test.base.BBOX_PATH : data/KITTI/bboxes/%s.txt
2022-11-21 03:37:27,854 - PoinTr - INFO - config.dataset.test.others = edict()
2022-11-21 03:37:27,855 - PoinTr - INFO - config.dataset.test.others.subset : test
2022-11-21 03:37:27,858 - PoinTr - INFO - config.model = edict()
2022-11-21 03:37:27,863 - PoinTr - INFO - config.model.NAME : PoinTr
2022-11-21 03:37:27,865 - PoinTr - INFO - config.model.num_pred : 14336
2022-11-21 03:37:27,866 - PoinTr - INFO - config.model.num_query : 224
2022-11-21 03:37:27,867 - PoinTr - INFO - config.model.knn_layer : 1
2022-11-21 03:37:27,868 - PoinTr - INFO - config.model.trans_dim : 384
2022-11-21 03:37:27,869 - PoinTr - INFO - config.total_bs : 64
2022-11-21 03:37:27,870 - PoinTr - INFO - config.step_per_update : 1
2022-11-21 03:37:27,870 - PoinTr - INFO - config.max_epoch : 600
2022-11-21 03:37:27,871 - PoinTr - INFO - config.consider_metric : CDL1
2022-11-21 03:37:27,872 - PoinTr - INFO - Distributed training: False
2022-11-21 03:37:27,872 - PoinTr - INFO - Set random seed to 0, deterministic: False
2022-11-21 03:37:27,958 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 03:37:28,078 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 5677
2022-11-21 03:37:28,176 - PCNDATASET - INFO - Collecting files of Taxonomy [ID=02958343, Name=car]
2022-11-21 03:37:28,177 - PCNDATASET - INFO - Complete collecting files of the dataset. Total files: 150
2022-11-21 03:37:28,178 - MODEL - INFO - Transformer with knn_layer 1
2022-11-21 03:38:30,817 - PoinTr - INFO - Using Data parallel ...
Format = auto
Extension = pcd
Format = auto
Extension = pcd
Format = auto
*****a lot of repeat ******
Format = auto
Extension = pcd
Format = auto
Extension = pcd
Format = auto
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 354, in forward
    knn_index = get_knn_index(coor)
  File "/mnt/f/PoinTr/models/Transformer.py", line 19, in get_knn_index
    _, idx = knn(coor_k, coor_q) # bs k np
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/knn_cuda/__init__.py", line 61, in forward
    d, i = knn(ref.float(), query.float(), self.k)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/knn_cuda/__init__.py", line 39, in knn
    d, i = _knn.knn(ref, query, k)
RuntimeError: ref.is_contiguous()INTERNAL ASSERT FAILED at "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/knn_cuda/csrc/cuda/knn.cpp":29, please report a bug to PyTorch. ref must be contiguous

yuxumin · 2022-11-21T10:50:38Z

In this line:

File "/mnt/f/PoinTr/models/Transformer.py", line 19, in get_knn_index
_, idx = knn(coor_k, coor_q) # bs k np

make both inputs contiguous before calling knn:

_, idx = knn(coor_k.contiguous(), coor_q.contiguous()) # bs k np
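
For anyone wondering why a transposed tensor fails the `ref.is_contiguous()` check, here is a rough pure-Python sketch of the idea. The helper names and the (2, 128, 3) shape are made up for illustration, and the check is simplified (it ignores the special handling of size-1 dimensions that real tensor libraries apply).

```python
def contiguous_strides(shape):
    """Row-major (C-order) strides, in elements, for a tensor of this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True when the strides match the row-major layout for the shape."""
    return tuple(strides) == contiguous_strides(shape)


# Hypothetical coordinates stored as (batch, num_points, 3).
shape = (2, 128, 3)
strides = contiguous_strides(shape)                # (384, 3, 1)

# A transpose of dims 1 and 2 swaps shape and stride entries; no data moves.
t_shape = (shape[0], shape[2], shape[1])           # (2, 3, 128)
t_strides = (strides[0], strides[2], strides[1])   # (384, 1, 3)

print(is_contiguous(shape, strides))
print(is_contiguous(t_shape, t_strides))
```

The transposed view keeps the old memory layout, so its strides no longer match its shape. Calling `.contiguous()` copies the data into a fresh row-major buffer, which is what a kernel that walks raw memory expects.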

jackie174 · 2022-11-21T11:02:06Z

Sorry to bother you again!
Could you tell me what kind of environment you use (CUDA, tensor, TensorFlow, GCC, Python, ...)?
I suspect the main problem is caused by the environment.
After I made the change, I got the following:

  main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 366, in forward
    x = blk(x + pos, knn_index)   # B N C
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 217, in forward
    knn_f = get_graph_feature(norm_x, knn_index)
  File "/mnt/f/PoinTr/models/Transformer.py", line 33, in get_graph_feature
    feature = feature.view(batch_size, k, num_query, num_dims)
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368

yuxumin · 2022-11-21T11:08:23Z

So, what is the shape of 'knn_index' at 'https://github.com/yuxumin/PoinTr/blob/master/models/Transformer.py#L32' in your code? Can you make sure you are running the code the right way (the right model on the corresponding dataset)?

jackie174 · 2022-11-21T11:45:06Z

This is the code I am using: https://github.com/Cmput-414/PoinTr/tree/change
My environment:
CUDA 10.1
Torch 1.9.0+cu102
torchaudio 0.9.0
torchvision 0.10.0+cu102
GCC 9.4
Python 3.8.10

bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/PoinTr.yaml --exp_name example

knn_index_shape: torch.Size([1152]) knn_index: tensor([ 0, 1, 2, ..., 6018, 6018, 6017], device='cuda:0')

RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368

bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/GRNet.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.47 GiB already allocated; 1020.84 MiB free; 3.48 GiB reserved in total by PyTorch)

bash ./scripts/train.sh 0 --config ./cfgs/PCN_models/PCN.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 2.74 GiB already allocated; 1.55 GiB free; 2.93 GiB reserved in total by PyTorch)

bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PoinTr.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 1.18 GiB already allocated; 3.28 GiB free; 1.20 GiB reserved in total by PyTorch)

bash ./scripts/train.sh 0 --config ./cfgs/KITTI_models/PCN.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 6.00 GiB total capacity; 481.80 MiB already allocated; 3.82 GiB free; 682.00 MiB reserved in total by PyTorch)

bash ./scripts/train.sh 0 --config ./cfgs/ShapeNet55_models/PCN.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1.01 GiB (GPU 0; 6.00 GiB total capacity; 2.22 GiB already allocated; 1.95 GiB free; 2.53 GiB reserved in total by PyTorch)

bash ./scripts/train.sh 0 --config ./cfgs/ShapeNet55_models/PoinTr.yaml --exp_name example

RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 6.00 GiB total capacity; 2.38 GiB already allocated; 1.29 GiB free; 3.19 GiB reserved in total by PyTorch)

1.     main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PoinTr.py", line 92, in forward
    q, coarse_point_cloud = self.base_model(xyz) # B M C and B M 3
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 367, in forward
    x = blk(x + pos, knn_index)   # B N C
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/Transformer.py", line 218, in forward
    knn_f = get_graph_feature(norm_x, knn_index)
  File "/mnt/f/PoinTr/models/Transformer.py", line 34, in get_graph_feature
    feature = feature.view(batch_size, k, num_query, num_dims)
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368

2. Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/GRNet.py", line 141, in forward
    pt_features_32_l = self.conv1(pt_features_64_l)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 240, in forward
    return F.max_pool3d(input, self.kernel_size, self.stride,
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/_jit_internal.py", line 405, in fn
    return if_false(*args, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 784, in _max_pool3d
    return torch.max_pool3d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.47 GiB already allocated; 1020.84 MiB free; 3.48 GiB reserved in total by PyTorch)

3.   File "main.py", line 64, in main
    run_net(args, config, train_writer, val_writer)
  File "/mnt/f/PoinTr/tools/runner.py", line 98, in run_net
    ret = base_model(partial)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/models/PCN.py", line 76, in forward
    fine = self.final_conv(feat) + point_feat   # B 3 N
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 298, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/f/PoinTr/venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 294, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 2.74 GiB already allocated; 1.55 GiB free; 2.93 GiB reserved in total by PyTorch)

yuxumin · 2022-11-22T08:25:59Z

Sorry, I am not familiar with Google Colab and cannot run the code in your Colab.

knn_index_shape: torch.Size([1152]) knn_index: tensor([ 0, 1, 2, ..., 6018, 6018, 6017], device='cuda:0')
RuntimeError: shape '[48, 8, 128, 384]' is invalid for input of size 442368

knn_index should have (bs * k * np) elements; in the original setting for the PCN dataset, k = 8 and np = 224.
I think the error may be due to the knn_cuda in your env.
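
The mismatch can be checked with plain arithmetic on the numbers in the error message. This is just a sanity-check sketch; the variable names are mine, and reading the last dimension as the feature width is an assumption based on the `view(batch_size, k, num_query, num_dims)` call in the traceback.

```python
from math import prod

target_shape = (48, 8, 128, 384)   # from "shape '[48, 8, 128, 384]' is invalid"
input_size = 442368                # from "input of size 442368"
num_dims = target_shape[-1]        # assumed feature width (384)

# Rows the view expects: batch_size * k * num_query.
expected_rows = prod(target_shape[:-1])

# Rows actually present in the gathered feature tensor.
actual_rows = input_size // num_dims

print(expected_rows)  # 49152
print(actual_rows)    # 1152
```

The 1152 rows match the printed `knn_index_shape: torch.Size([1152])`, which is consistent with the kNN call returning far fewer indices than `bs * k * np`.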

I have pushed a PyTorch-based kNN implementation; could you try the new code?
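
As a reference for what any kNN replacement has to return, here is a brute-force sketch in plain Python. It is not the code in the repository; `knn_indices` and the sample points are hypothetical, and a real implementation would do the same thing with batched tensor operations on the GPU.

```python
def knn_indices(ref, query, k):
    """For each query point, return the indices of its k nearest ref points."""
    result = []
    for q in query:
        # Squared Euclidean distance from q to every reference point.
        dists = [sum((a - b) ** 2 for a, b in zip(q, r)) for r in ref]
        # Indices of ref sorted by increasing distance; keep the first k.
        order = sorted(range(len(ref)), key=dists.__getitem__)
        result.append(order[:k])
    return result


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
query = [(0.1, 0.0, 0.0), (4.0, 4.0, 4.0)]

idx = knn_indices(ref, query, k=2)
print(idx)  # [[0, 1], [3, 2]]
```

Two properties are worth checking against any backend: there are exactly `len(query) * k` indices, and every index lies in `[0, len(ref))`.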

RuntimeError: CUDA out of memory.

For the OOM problem, I think you can reduce the batch size (just modify the YAML file).
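
To get a feel for why batch size matters on a 6 GiB GPU, the size of a single activation can be estimated by hand. The sketch below uses the output dims from the cuDNN dump earlier in the thread (48, 32, 2048, 16); interpreting the first dim as the batch size is an assumption, and this counts one float32 tensor only, not the whole training footprint.

```python
def tensor_mib(shape, bytes_per_element=4):
    """Size of a dense tensor in MiB (4 bytes per element for float32)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / 2**20


# Output of the 1x1 convolution in the cuDNN dump, first dim assumed to be batch.
print(tensor_mib((48, 32, 2048, 16)))  # 192.0
# Same layer with the batch size reduced to 2.
print(tensor_mib((2, 32, 2048, 16)))   # 8.0
```

Activation memory grows linearly with the batch dimension, so cutting the batch size shrinks every such tensor by the same factor.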

jackie174 · 2022-11-22T11:08:34Z

Thank you so much!
When I changed the batch size to 2, it started running!
For now:
I followed this
Then I also applied the code change that you just made.
Yes, the main thing was setting up the environment.
This confused me a lot, but it is solved now, and I can start learning the code.
Thanks again!
You are really nice!

yuxumin · 2022-11-22T11:14:26Z

@jackie174, Congrats!

yuxumin closed this as completed Nov 22, 2022
