
Caching of volumes to GPU during training of deepedit (radiology app) #874

Open
nvahmadi opened this issue Jul 14, 2022 · 2 comments
Labels: backlog (Items to be decided in the future when/if to implement)
@nvahmadi

Currently, it is not possible to cache data to the GPU during DeepEdit training to accelerate it, as described in the Fast Training Tutorial from MONAI Core.

This idea was already mentioned in PR #485, which implemented other acceleration techniques (e.g. DiceCELoss, the Novograd optimizer, ThreadDataLoader).

I tried putting the two transforms ToTensord() and ToDeviced() before the first randomizable transform, but that throws an error because a torch tensor cannot be cast to a NumPy array (the error originates in the AddInitialSeedPointMissingLabelsd() transform).

Looking into the DeepEdit training transforms, the reason for the above error is probably the computation of a chamfer distance map using scipy's distance_transform_cdt. I saw in MONAI Core discussion #1332 that @tvercaut pointed us to their recent work, FastGeodis, which enables fast CUDA-based (torch-compatible!) computation of Euclidean/geodesic distance transforms.
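To make the CPU dependency concrete, here is a toy sketch of the scipy call involved (a synthetic mask, not DeepEdit's actual code); distance_transform_cdt only accepts NumPy arrays, so any tensor must be on the CPU and converted first:

```python
import torch
from scipy.ndimage import distance_transform_cdt

mask = torch.zeros(5, 5, dtype=torch.uint8)
mask[1:4, 1:4] = 1  # a 3x3 foreground block

# scipy operates on NumPy arrays only, so the tensor has to live on
# the CPU and be converted; a CUDA tensor would fail at this boundary.
dist = distance_transform_cdt(mask.numpy())
print(dist[2, 2])  # distance from the centre voxel to the background
```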

It would be great to revisit the idea of PR #485 and offer caching of images to the GPU during training. My simple attempt above is not sufficient: apart from making AddInitialSeedPointMissingLabelsd() torch-based, the multi-GPU scenario requires distributed caching across GPUs, and I am not sure where in the MONAI Label code this would go.

@SachidanandAlle (Collaborator) commented Jul 28, 2022

You already have the option to cache the dataset; by default it is CacheDataset, and it uses ThreadDataLoader:
https://github.com/Project-MONAI/MONAILabel/blob/main/monailabel/tasks/train/basic_train.py#L149-L150

Optimizers, transforms, etc. can be defined in your train task definition:
https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/trainers/deepedit.py#L76-L80

Please connect with @diazandr3s (Andres), our core developer for DeepEdit, to further optimize the performance of the corresponding transforms. I think with MetaTensor support things can be done more cleanly, and if it is possible to keep all the image data on the GPU while running many of the pre-transforms, that should give a boost in latency.

And yes, if we can run distance_transform_cdt at a faster speed, that will help, especially for simulating clicks during training; currently that is the main time-consuming operation (run N times for every batch during training).

@SachidanandAlle added the backlog label Oct 5, 2022
@SachidanandAlle added this to Needs triage in Backlog via automation Oct 5, 2022
@diazandr3s (Collaborator) commented
This will be solved as part of the interactivity restructuring: #1173
