Save and load samples from disk in ImageDataset #108
Comments
Hi @romainVala, Thanks for reporting this. Isn't this just offline data augmentation? Can't you use |
It is not exactly the same. What you propose is to save the transformed sample as a NIfTI file; I save the sample dictionary structure (since I need some special keys written by the transform during training). |
By "special keys" do you mean the random parameters? |
yes |
I don't love the idea of pickling dictionaries. What about saving those parameters in a text file and creating a dataset that inherits from |
Not sure I understand what pickling is... I just do a torch.save (so simple). As for the example you link, I do not see how it solves the problem, which is (if I follow correctly) adding extra information to the image dictionary (from a CSV file, for instance?). |
Sorry, I meant the example in the commit message:

    class MyDataset(torchio.ImagesDataset):
        def get_image_dict_from_image(self, image):
            image_dict = super().get_image_dict_from_image(image)
            # Subject ID is the part of the file name before the first underscore
            subject_id = image.path.name.split('_')[0]
            image_dict['subject_id'] = subject_id
            return image_dict

You could, for example, do:

    class RandomMotionDataset(torchio.ImagesDataset):
        def get_image_dict_from_image(self, image):  # overrides ImagesDataset.get_image_dict_from_image
            image_dict = super().get_image_dict_from_image(image)  # standard image_dict
            motion_parameters = self.get_motion_parameters(image.path)
            image_dict['random_motion'] = motion_parameters
            return image_dict

        def get_motion_parameters(self, path):
            # The parameters file sits next to the image, with a .json extension
            parameters_path = path.parent / path.name.replace('.nii.gz', '.json')
            parameters = read_json(parameters_path)  # defined somewhere else
            return parameters
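The `read_json` helper in the snippet above is left undefined. A minimal sketch of what it could look like, assuming the motion parameters are stored as a plain JSON file next to each image; `write_json`, `parameters_path_for` and the parameter keys in the usage example are hypothetical names for illustration, not part of TorchIO:

```python
import json
from pathlib import Path


def parameters_path_for(image_path):
    """Map e.g. sub01_t1.nii.gz to sub01_t1.json in the same directory."""
    image_path = Path(image_path)
    return image_path.parent / image_path.name.replace('.nii.gz', '.json')


def write_json(parameters, path):
    """Save a dictionary of (JSON-serializable) parameters as text."""
    with open(path, 'w') as f:
        json.dump(parameters, f, indent=2)


def read_json(path):
    """Load the parameters dictionary written by write_json."""
    with open(path) as f:
        return json.load(f)
```

For example, `write_json({'degrees': [1.5, -0.3, 0.0]}, parameters_path_for('sub01_t1.nii.gz'))` would be called once when the transformed image is saved, and `read_json` is then called by the dataset at load time. Tensors or arrays would need to be converted to lists first, since JSON only stores basic types.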
Closing as now there is a |
Hi there,
Is your feature request related to a problem? Please describe.
Training with random motion is taking too much time: even with 20 workers (and 180 GB of RAM) I cannot generate samples fast enough to keep the GPU busy during training. And when testing different model parameters, it is just insane to require so much computation...
Anyway, it is just too slow.
Describe the solution you'd like
The solution is to compute the transformed samples first, save them to disk, and then allow a method in ImageDataset to load the samples from disk.
You may think you will not gain time, because you first need to save the samples to disk,
but you do gain if you want to test different models and if you have access to a cluster (which allows very fast sample generation).
The solution
It is indeed very simple (and efficient) to implement. I could try a PR if you are interested, but since it is a small change I just copy the modified code from ImageDataset
Note the good part: you can still apply some transforms even after loading from disk (very convenient for quick transforms).
Personally I did not use the save_to_dir argument, because I implemented it outside, to properly handle the index in a cluster setting (i.e. multiple instances running on different subparts of the dataset, but then with the same index...),
but if you do it locally it works fine.
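To make the idea concrete, here is a minimal sketch of the save-then-load approach. It is not the code from the original post and not TorchIO's API: `save_sample` and `SavedSamplesDataset` are hypothetical names, and the standard-library `pickle` module is used only to keep the example self-contained (`torch.save` and `torch.load` could be swapped in when the samples hold tensors).

```python
import pickle
from pathlib import Path


def save_sample(sample, directory, index):
    """Write one already-transformed sample dictionary to disk."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Zero-padded index so that sorting file names preserves sample order
    path = directory / f'sample_{index:06d}.pkl'
    with open(path, 'wb') as f:
        pickle.dump(sample, f)
    return path


class SavedSamplesDataset:
    """Map-style dataset that reads precomputed samples from a directory."""

    def __init__(self, directory, transform=None):
        self.paths = sorted(Path(directory).glob('sample_*.pkl'))
        self.transform = transform  # optional cheap transform applied after loading

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        with open(self.paths[index], 'rb') as f:
            sample = pickle.load(f)
        if self.transform is not None:
            sample = self.transform(sample)
        return sample
```

The expensive transforms run once (for instance on a cluster, with each job given its own range of indices so that file names do not collide), and every later training run only pays the cost of reading from disk plus whatever quick transform is passed to `SavedSamplesDataset`. Any extra keys stored in the sample dictionary by the transform survive the round trip.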
I hope it helps