
Custom dataset, data loader #86

Open
MancaZerovnikMekuc opened this issue Nov 22, 2019 · 6 comments

@MancaZerovnikMekuc

Hi,

I have a custom 3D dataset. I have spent a lot of time trying to run the preprocessing script on the LIDC data, but I still have issues: it looks like my characteristic.csv file is not what it should be. Can somebody provide a description of how the output of the script is formatted?

I want to run Mask R-CNN on my custom 3D dataset, which consists of volumes with voxel-wise annotations.

Can somebody describe how the data should be formatted for the example data_loader?
"Example Data Loader for the LIDC data set. This dataloader expects preprocessed data in .npy or .npz files per patient and a pandas dataframe in the same directory containing the meta-info e.g. file paths, labels, foregound slice-ids."

From this alone I do not know how to format the data. Has anybody successfully run the model on their own volumetric data with voxel-wise annotations and can share the dataloader, or some specification of the data formatting expected by the existing dataloader?
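
For reference, the quoted docstring suggests a per-patient layout roughly like the following sketch (the directory, file names, and dataframe columns pid, path, class_target, and fg_slices are assumptions for illustration, not the toolkit's exact schema):

```python
import os
import numpy as np
import pandas as pd

os.makedirs('pp_dir', exist_ok=True)

# One image array and one RoI/segmentation array per patient (channels, x, y, z).
img = np.random.rand(1, 128, 128, 64).astype(np.float32)
seg = np.zeros((128, 128, 64), dtype=np.uint8)   # voxel-wise RoI ids, 0 = background
seg[40:60, 40:60, 20:30] = 1                     # a single RoI marked with id 1

np.save(os.path.join('pp_dir', 'patient_0_img.npy'), img)
np.save(os.path.join('pp_dir', 'patient_0_rois.npy'), seg)

# Meta-info dataframe in the same directory: one row per patient.
info_df = pd.DataFrame({
    'pid': ['patient_0'],
    'path': [os.path.join('pp_dir', 'patient_0_img.npy')],
    'class_target': [[0]],               # one class label per RoI in this volume
    'fg_slices': [list(range(20, 30))],  # z-slices containing foreground
})
info_df.to_pickle(os.path.join('pp_dir', 'info_df.pickle'))
```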

@pfjaeger
Member

You could generate the toy data and run some trainings on it. It will show you how the data is structured and read by the data loader.

@MancaZerovnikMekuc
Author

MancaZerovnikMekuc commented Nov 25, 2019

Thank you, I have done that. However, I have multiple instances in one image. How should that kind of data be structured? I would also like to include patching. What should the variable "class_target" in the meta_info_dict created by preprocessing.py contain, and in what form, for your data_loader.py?
Also, I have 3D data, not 2D as in the toy example.

@lspinheiro

The toy data seems to only handle a segmentation example. Is there any documentation about how to generate bounding box labels?

@thanhpt55

@MancaZerovnikMekuc could you show me how the images and labels are read when training begins? If possible, could you give me an example of the data structure?
Thank you

@Gregor1337
Collaborator

@MancaZerovnikMekuc

Thank you, I have done that. However, I have multiple instances in one image. How should that kind of data be structured? I would also like to include patching. What should the variable "class_target" in the meta_info_dict created by preprocessing.py contain, and in what form, for your data_loader.py?
Also, I have 3D data, not 2D as in the toy example.

  • During batch generation (in the dataloader scripts), "class_target" holds the RoI-wise class labels, i.e., one class label per RoI, structured as a list of lists per batch.
    That is, the final batch dictionary returned by generate_train_batch in your BatchGenerator should hold an entry "class_target" that looks, e.g., like this: [ [0,1], [2,0], [1] ]. In that example there are 3 classes: batch element one has two RoIs, the first of class 0, the second of class 1; batch element two also has two RoIs, the first of class 2, the second of class 0; the third batch element has a single RoI of class 1 (see the sketch after this list).
    The id of a RoI equals its position within its batch element's list, shifted by 1 to match the pixel values in the segmentation ground truth, since 0 is reserved for background (all pixels that belong to the RoI at position 0 need to be marked with value 1 in the segmentation).

  • To include patching, I'd encourage you to follow the PatientIterator example in lidc_exp->dataloader.py. We only offer inclusive patching for the PatientIterator, i.e., during validation and testing, but not training. During training we sample patches instead of covering the whole image with patches. Apart from data loading you do not need to concern yourself with patching; it is already implemented in the framework (in predictor.py).

  • The differences between 2D and 3D are marginal; you may look into lidc_exp as a guideline.
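
A minimal sketch of a batch dictionary matching the [ [0,1], [2,0], [1] ] example above; the "data"/"seg" keys, array shapes, and RoI placements are illustrative assumptions, not the repository's actual BatchGenerator code:

```python
import numpy as np

batch_size, n_channels, x, y, z = 3, 1, 64, 64, 32
data = np.random.rand(batch_size, n_channels, x, y, z).astype(np.float32)
seg = np.zeros((batch_size, 1, x, y, z), dtype=np.uint8)

# Batch element 0: two RoIs -> pixel values 1 and 2 in the segmentation,
# class labels 0 and 1 in "class_target".
seg[0, 0, 5:15, 5:15, 5:10] = 1
seg[0, 0, 30:40, 30:40, 10:15] = 2
# Batch element 1: two RoIs of classes 2 and 0.
seg[1, 0, 10:20, 10:20, 5:10] = 1
seg[1, 0, 35:45, 35:45, 15:20] = 2
# Batch element 2: a single RoI of class 1.
seg[2, 0, 20:30, 20:30, 10:15] = 1

batch = {
    'data': data,
    'seg': seg,
    'class_target': [[0, 1], [2, 0], [1]],  # one label per RoI, per batch element
}
```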

@themantalope

themantalope commented May 31, 2021

@Gregor1337 - very helpful. It would be nice to have a documentation file with this information somewhere in the repo. It might also be useful to see how to structure this in a toy experiment that generates multiple RoIs for a single training example.
