Train on custom data #31

Open
aravind3134 opened this issue Nov 21, 2019 · 6 comments

Comments

@aravind3134

Hey,

I am trying to train GCNet on a custom dataset that is in the COCO data format. What is the exact procedure for training it? Simply running the train.sh script gives me an IndexError.

I have tried modifying the config file to make it work, but without any luck. Please let me know which fields need to be changed.

Thanks.

@xvjiarui
Owner

Sorry for the late reply.
Could you please provide the error message?
The training procedure should be the same as in mmdetection.

@aravind3134
Author

I tried running a config file with only the data location changed.

In my case, there are only 2 classes, and I also have to change the class names. I think that is what causes the error.

Please let me know how to do it. What should be changed?

Currently, I get the following IndexError:

Traceback (most recent call last):
Traceback (most recent call last):
File "./tools/train.py", line 103, in <module>
File "./tools/train.py", line 103, in <module>
main()
main()
File "./tools/train.py", line 99, in main
File "./tools/train.py", line 99, in main
logger=logger)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 60, in train_detector
logger=logger)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 60, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 189, in _dist_train
_dist_train(model, dataset, cfg, validate=validate)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 189, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 358, in run
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 358, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 260, in train
epoch_runner(data_loaders[i], **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 260, in train
for i, data_batch in enumerate(data_loader):
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in next
for i, data_batch in enumerate(data_loader):
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in next
return self._process_next_batch(batch)
return self._process_next_batch(batch)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):

Thanks
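For the 2-class case described above, both the class names and the head `num_classes` can be read off the annotation file itself. This is a minimal sketch that assumes the mmdetection v1.x convention (the traceback shows an `mmdet-0.6.0` build): there, `num_classes` in `bbox_head` and `mask_head` counts the background class, which is why the stock COCO configs use 81 for 80 classes. `load_coco` and `summarize_categories` are hypothetical helpers, not part of GCNet or mmdetection.

```python
import json


def load_coco(ann_path):
    """Load a COCO-format annotation file into a dict."""
    with open(ann_path) as f:
        return json.load(f)


def summarize_categories(coco):
    """Return (class_names, head_num_classes) for a COCO-style dict.

    head_num_classes adds 1 for the background class, assuming the
    mmdetection v1.x convention (num_classes=81 for the 80 COCO classes).
    """
    # Keep the file order of 'categories'; pycocotools' getCatIds() also
    # returns category ids in this order when called without filters.
    cats = coco["categories"]
    class_names = tuple(c["name"] for c in cats)
    declared = {c["id"] for c in cats}
    used = {a["category_id"] for a in coco.get("annotations", [])}
    # Annotations pointing at undeclared categories are a common source of
    # indexing errors when labels are looked up, so fail loudly here.
    unknown = sorted(used - declared)
    if unknown:
        raise ValueError(f"annotations use undeclared category ids: {unknown}")
    return class_names, len(class_names) + 1
```

For example, `summarize_categories(load_coco("path/to/instances_train.json"))` (the path is a placeholder) on a file with two categories returns their names and 3, the value that would go into both `num_classes` fields under the assumption above.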

@xvjiarui
Owner

It seems that there is a problem with your data loader.
I suggest debugging with a single process (e.g. only 1 GPU) so that you can add breakpoints inside your code.
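Following the single-process suggestion, one way to locate the sample that raises the error is to index the training dataset (the object built from `cfg.data.train`) directly in the main process. This is a minimal sketch: `find_failing_samples` is a hypothetical helper that works on any object with `__len__` and `__getitem__`. Separately, setting `workers_per_gpu=0` in the config's `data` dict should make the PyTorch `DataLoader` load batches in the main process, so the worker's original traceback is shown instead of the re-raised `IndexError: Traceback (most recent call last):` message.

```python
import traceback


def find_failing_samples(dataset, limit=None):
    """Index a dataset one item at a time in the current process.

    Every exception is caught and recorded together with its index, so a
    bad annotation can be located without DataLoader worker processes
    re-raising the error with a truncated message.
    """
    failures = []
    n = len(dataset) if limit is None else min(limit, len(dataset))
    for idx in range(n):
        try:
            dataset[idx]
        except Exception:  # record every failure type, including IndexError
            failures.append((idx, traceback.format_exc()))
    return failures
```

Each returned entry pairs an index with the full traceback text, which can then be inspected (or stepped through with a breakpoint) for that single sample.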

@aravind3134
Author

Hey, can you please tell me what changes are required to successfully train GCNet on a custom dataset created in the COCO format?

@xvjiarui
Owner

xvjiarui commented Dec 3, 2019

I think there are two workarounds. Either of them should be fine.

  1. Convert your data into exactly the same format as the COCO annotations.
  2. Follow this to create your own dataset.
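For the first option, a quick structural check can catch common mismatches before training. This is a minimal sketch: `check_coco_format` is a hypothetical helper, and its key list is an assumption based on the fields the COCO object-instance format defines (`bbox` as `[x, y, width, height]`, `area`, `iscrowd`, and `segmentation` when masks are used). It does not check everything a particular loader may read.

```python
REQUIRED_TOP = ("images", "annotations", "categories")
REQUIRED_ANN = ("id", "image_id", "category_id", "bbox", "area", "iscrowd")


def check_coco_format(coco, need_masks=False):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = [f"missing top-level key '{k}'" for k in REQUIRED_TOP if k not in coco]
    if problems:
        return problems
    image_ids = {img["id"] for img in coco["images"]}
    cat_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        aid = ann.get("id", "?")
        for key in REQUIRED_ANN:
            if key not in ann:
                problems.append(f"annotation {aid}: missing '{key}'")
        # Every annotation must point at a declared image and category.
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {aid}: unknown image_id {ann.get('image_id')}")
        if ann.get("category_id") not in cat_ids:
            problems.append(f"annotation {aid}: unknown category_id {ann.get('category_id')}")
        # COCO boxes are [x, y, width, height] with positive width and height.
        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {aid}: bbox should be [x, y, w, h] with w, h > 0")
        if need_masks and not ann.get("segmentation"):
            problems.append(f"annotation {aid}: empty or missing 'segmentation'")
    return problems
```

Passing `need_masks=True` is only relevant when a mask head is trained; for box-only data like the one described below, the `segmentation` check can stay off.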

@aravind3134
Author

Hey, I am trying to train on my own data, which is in the same format as the COCO dataset, using one of the provided configuration files. As my data doesn't have the segmentation attribute, I tried running both my dataset and the COCO dataset with 'with_mask' set to 'False' in the config file. Do I need to change anything else in the configuration file to make it work?

I am using the config file in this location: configs/gcnet/r50/mask_rcnn_r50_fpn_2x.py

Error:
Traceback (most recent call last):
File "./tools/train.py", line 106, in <module>
main()
File "./tools/train.py", line 101, in main
logger=logger)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 65, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 201, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 361, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 264, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/apis/train.py", line 44, in batch_processor
losses = model(**data)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmcv-0.2.14-py3.6-linux-x86_64.egg/mmcv/parallel/distributed.py", line 50, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/core/fp16/decorators.py", line 49, in new_func
return old_func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/models/detectors/base.py", line 86, in forward
return self.forward_train(img, img_meta, **kwargs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/models/detectors/two_stage.py", line 183, in forward_train
sampling_results, gt_masks, self.train_cfg.rcnn)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/models/mask_heads/fcn_mask_head.py", line 112, in get_target
gt_masks, rcnn_train_cfg)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mmdet-0.6.0+a9fcc88-py3.6.egg/mmdet/core/mask/mask_target.py", line 10, in mask_target
pos_assigned_gt_inds_list, gt_masks_list, cfg_list)
TypeError: 'NoneType' object is not iterable
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
main()
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/tensorflow_p36/bin/python', '-u', './tools/train.py', '--local_rank=0', 'configs/gcnet/r50/mask_rcnn_r50_fpn_2x.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
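This traceback ends in `mask_target` with `gt_masks` being `None`, which is consistent with `with_mask=False` in the data settings while the model is still a Mask R-CNN with a mask head that expects ground-truth masks. One way to make the two agree, sketched under the assumption that the config follows the mmdetection v1.x layout (`model.type='MaskRCNN'`, `mask_roi_extractor` and `mask_head` as model keys, and a `with_mask` flag in each `data` split), is to turn the model into a box-only Faster R-CNN. `to_box_only` is a hypothetical helper that operates on a plain dict; the same edits could be made by hand in `configs/gcnet/r50/mask_rcnn_r50_fpn_2x.py`, after checking the actual key names in that file.

```python
import copy

# Keys that, in mmdetection v1.x Mask R-CNN configs, belong to the mask branch.
MASK_MODEL_KEYS = ("mask_roi_extractor", "mask_head")


def to_box_only(cfg):
    """Return a copy of a Mask R-CNN style config dict without the mask branch.

    Assumes the mmdetection v1.x config layout; the input dict is not modified.
    """
    cfg = copy.deepcopy(cfg)
    model = cfg["model"]
    if model.get("type") == "MaskRCNN":
        # FasterRCNN is the box-only two-stage detector in mmdetection v1.x.
        model["type"] = "FasterRCNN"
    for key in MASK_MODEL_KEYS:
        model.pop(key, None)
    # Keep the data pipeline consistent: no masks are loaded for any split.
    for split in ("train", "val", "test"):
        if split in cfg.get("data", {}):
            cfg["data"][split]["with_mask"] = False
    return cfg
```

Leftover mask-only entries such as `mask_size` in `train_cfg.rcnn` are left untouched in this sketch; with no mask head in the model they should simply go unused.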
