
Caching of volumes to GPU during training of deepedit (radiology app) #874

Open
nvahmadi opened this issue Jul 14, 2022 · 2 comments
Labels: backlog (Items to be decided in the future when/if to implement)
@nvahmadi

Currently, it is not possible to cache data to the GPU during DeepEdit training to accelerate it, as described in the Fast Training Tutorial from MONAI Core.

This idea was already mentioned in PR #485, which implemented other acceleration techniques (e.g. DiceCELoss, the Novograd optimizer, ThreadDataLoader).

I tried putting the two transforms ToTensord() and ToDeviced() before the first randomizable transform, but that throws an error because a torch tensor cannot be cast to a NumPy array (the error originates in the AddInitialSeedPointMissingLabelsd() transform).

Looking into the DeepEdit training transforms, the reason for the above error is probably the computation of a chamfer distance map using scipy's distance_transform_cdt. I saw in MONAI Core discussion #1332 that @tvercaut pointed us to their recent work, FastGeodis, which enables fast CUDA-based (torch-compatible!) computation of Euclidean/geodesic distance transforms.
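To make the CPU dependency concrete, here is a toy sketch of the scipy call involved (a synthetic mask, not DeepEdit's actual code); distance_transform_cdt only accepts NumPy arrays, so any tensor must be on the CPU and converted first:

```python
import torch
from scipy.ndimage import distance_transform_cdt

mask = torch.zeros(5, 5, dtype=torch.uint8)
mask[1:4, 1:4] = 1  # a 3x3 foreground block

# scipy operates on NumPy arrays only, so the tensor has to live on
# the CPU and be converted; a CUDA tensor would fail at this boundary.
dist = distance_transform_cdt(mask.numpy())
print(dist[2, 2])  # distance from the centre voxel to the background
```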

It would be great to revisit the idea of PR #485 and offer caching of images to the GPU during training. My simple attempt above is not sufficient: apart from making AddInitialSeedPointMissingLabelsd() torch-based, the multi-GPU scenario requires distributed caching across GPUs, and I am not sure where in the MONAI Label code this would go.

@SachidanandAlle (Collaborator) commented Jul 28, 2022

You already have the option to cache the dataset; by default it is CacheDataset, and it uses ThreadDataLoader:
https://github.com/Project-MONAI/MONAILabel/blob/main/monailabel/tasks/train/basic_train.py#L149-L150

Optimizers, transforms, etc. can be defined in your train task definition:
https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/trainers/deepedit.py#L76-L80

Please connect with @diazandr3s (Andres), our core developer for DeepEdit, to further optimize the performance of the corresponding transforms. I think with MetaTensor support things can be done more cleanly, and if it is possible to keep all the image data on the GPU while running many of the pre-transforms, that should give a boost in latency.

And yes, if we can run distance_transform_cdt at a faster speed, that will help, especially for simulating clicks during training; currently that is the main time-consuming operation (run N times for every batch during training).

@SachidanandAlle added the backlog label Oct 5, 2022
@SachidanandAlle added this to Needs triage in Backlog via automation Oct 5, 2022
@diazandr3s (Collaborator) commented
This will be solved as part of the interactivity restructuring: #1173
