This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Enable tiling non-PANDA WSI datasets #621

Merged
merged 16 commits into from
Dec 16, 2021

Conversation

@dccastro (Member) commented Dec 14, 2021

This PR implements the following major changes in the tiling/preprocessing pipeline:

  • Create a mask-free LoadROId transform that auto-segments the foreground, defaulting to an Otsu threshold if no threshold is specified.
  • Create more generic tiling scripts (create_tiles_dataset.py and azure_tiles_creation.py).
  • Update the working PANDA tiling scripts and keep them as create_panda_tiles_dataset.py and azure_panda_tiles_creation.py for backward compatibility.
  • Replace the OpenSlide backend with cuCIM for loading WSI files. Note that cuCIM only works on Linux.
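
To illustrate the first bullet, here is a minimal sketch of mask-free foreground segmentation with an Otsu fallback. It is not the LoadROId implementation: the function names, the (C, H, W) layout, the use of the channel mean as luminance, and the rule that tissue is darker than the background are all assumptions made for this example.

```python
from typing import Optional, Tuple

import numpy as np


def otsu_threshold(values: np.ndarray, num_bins: int = 256) -> float:
    """Return the histogram split that maximises between-class variance."""
    hist, bin_edges = np.histogram(values.ravel(), bins=num_bins)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    weight_low = np.cumsum(hist)                # pixel count in bins 0..i
    weight_high = np.cumsum(hist[::-1])[::-1]   # pixel count in bins i..end
    mean_low = np.cumsum(hist * centers) / np.maximum(weight_low, 1)
    mean_high = np.cumsum((hist * centers)[::-1])[::-1] / np.maximum(weight_high, 1)
    # Entry i scores a split between bin i and bin i + 1.
    between_var = weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    best = int(np.argmax(between_var))
    # Use the edge shared by the two bins, so "value < threshold" selects the low class.
    return float(bin_edges[best + 1])


def segment_foreground(image: np.ndarray,
                       threshold: Optional[float] = None) -> Tuple[np.ndarray, float]:
    """Segment an image of shape (C, H, W); return a boolean (H, W) mask and the threshold used."""
    luminance = image.mean(axis=0)  # assumption: luminance is the mean over channels
    if threshold is None:
        threshold = otsu_threshold(luminance)  # default when no threshold is given
    return luminance < threshold, threshold  # assumption: tissue is darker than background
```

Calling `segment_foreground(image)` estimates the threshold from the image itself, while `segment_foreground(image, threshold=100.0)` skips the estimation and applies the given value.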

Additionally, I've refactored our dataset classes:

  • Create SlideKey and TileKey schemas for indexing the respective batch dictionaries instead of hardcoded strings. Note that TileKey is not yet used in TilesDataset and DeepMIL; this will be addressed in a separate follow-up PR.
  • Create base SlidesDataset, now inherited by the simplified PandaDataset and TcgaPradDataset.
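
As a rough sketch of the key-schema idea, a string-valued enum can replace hardcoded dictionary keys. The member names and values below are illustrative assumptions, not necessarily those defined in this PR.

```python
from enum import Enum


class SlideKey(str, Enum):
    """Hypothetical schema for the keys of a slide sample dictionary."""
    SLIDE_ID = "slide_id"
    IMAGE = "image"
    IMAGE_PATH = "image_path"
    LABEL = "label"


# Samples are indexed with enum members rather than string literals,
# so a misspelled key fails with an AttributeError instead of a silent miss.
sample = {
    SlideKey.SLIDE_ID: "slide_001",         # made-up identifier
    SlideKey.IMAGE_PATH: "slide_001.tiff",  # made-up path
    SlideKey.LABEL: 1,
}
label = sample[SlideKey.LABEL]
```

The benefit is mostly in tooling: renaming a key becomes a single change, and static checkers or IDEs can find every use.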

Other:

  • Add tests for slide loading, luminance, foreground segmentation, and bounding-box computation. Most of these run with a real .tiff file from the PANDA dataset, added via git-lfs.
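
For context on the bounding-box tests, the sketch below shows one way to compute the bounding box of a boolean foreground mask. The function name and the (top, left, bottom, right) return convention with exclusive upper bounds are assumptions for this example, not the PR's API.

```python
from typing import Tuple

import numpy as np


def get_bounding_box(mask: np.ndarray) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the True region in a 2D mask.

    The bottom and right bounds are exclusive, so mask[top:bottom, left:right]
    is the tightest crop containing all foreground pixels.
    """
    if mask.ndim != 2:
        raise ValueError("Expected a 2D mask")
    if not mask.any():
        raise ValueError("Mask contains no foreground pixels")
    rows = np.flatnonzero(mask.any(axis=1))  # rows with any foreground
    cols = np.flatnonzero(mask.any(axis=0))  # columns with any foreground
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1
```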

@dccastro changed the title from "Enable tiling non-PANDA WSI datasets" to "[WIP] Enable tiling non-PANDA WSI datasets" Dec 14, 2021
@dccastro changed the title from "[WIP] Enable tiling non-PANDA WSI datasets" to "Enable tiling non-PANDA WSI datasets" Dec 14, 2021
@dccastro marked this pull request as ready for review December 14, 2021 19:11
@@ -20,6 +20,7 @@ dependencies:
- azureml-tensorboard==1.36.0
- conda-merge==0.1.5
- cryptography==3.3.2
- cucim==21.10.1; platform_system=="Linux"
Member Author

This spec prevents the Windows builds from failing, as cuCIM is incompatible.
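
The environment marker only controls installation, so code that imports cuCIM may also need a guard on other platforms. A minimal sketch of such a guard follows; the helper name `cucim_supported` is hypothetical and the pattern is an assumption, not something this PR adds.

```python
import platform


def cucim_supported() -> bool:
    """Mirror the `platform_system=="Linux"` marker at runtime."""
    return platform.system() == "Linux"


if cucim_supported():
    try:
        from cucim import CuImage  # WSI reader class provided by cuCIM
    except ImportError:
        CuImage = None  # Linux, but cuCIM is not installed in this environment
else:
    CuImage = None  # cuCIM is skipped at install time on non-Linux platforms
```

Callers can then check `CuImage is not None` before attempting to load a slide.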

main(panda_dir="/tmp/datasets/PANDA",
root_output_dir="/datadrive",
level=1,
from InnerEye.ML.Histopathology.datasets.tcga_prad_dataset import TcgaPradDataset
Contributor

If TcgaPrad is removed, this block should be removed too. Is it a problem that we don't actually have a single dataset implementation that is compatible with this script?

Member Author

Following your separate suggestion, I've decided to keep TCGA-PRAD as an example, and added a clarifying comment here.


image_path = sample[dataset.IMAGE_COLUMN]
assert isinstance(image_path, str)
assert os.path.isfile(image_path)
Contributor

So as not to leave things completely untested, do you think we could have a SlidesDataset test? Obviously we can't test the length or the number of positives, but we can test that the dataset contains the expected keys and that the contents of the dict have the expected types. Looking at the dataset definition, if the path exists and we pass a dataset.csv, we can run these tests without mounting any real data. What do you think?

Member Author

I've now added a test_slides_dataset.csv and some basic tests in test_slides_dataset.py.
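
For readers without access to the test file, the kind of check discussed above could look like the sketch below. `ToySlidesDataset`, its column names other than `IMAGE_COLUMN`, and the CSV contents are hypothetical stand-ins, not the classes or files added in this PR.

```python
import csv
import os
import tempfile
from typing import Any, Dict


class ToySlidesDataset:
    """Hypothetical CSV-backed dataset, used only to illustrate the checks."""
    SLIDE_ID_COLUMN = "slide_id"
    IMAGE_COLUMN = "image_path"
    LABEL_COLUMN = "label"

    def __init__(self, root: str, csv_name: str = "dataset.csv") -> None:
        self.root = root
        with open(os.path.join(root, csv_name), newline="") as f:
            self.rows = list(csv.DictReader(f))

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        row = self.rows[index]
        return {
            self.SLIDE_ID_COLUMN: row[self.SLIDE_ID_COLUMN],
            self.IMAGE_COLUMN: os.path.join(self.root, row[self.IMAGE_COLUMN]),
            self.LABEL_COLUMN: int(row[self.LABEL_COLUMN]),
        }


def write_toy_csv(root: str) -> None:
    """Write a two-row dataset.csv with made-up slide IDs and paths."""
    with open(os.path.join(root, "dataset.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["slide_id", "image_path", "label"])
        writer.writerow(["slide_a", "slide_a.tiff", "0"])
        writer.writerow(["slide_b", "slide_b.tiff", "1"])


def check_slides_dataset(dataset: ToySlidesDataset) -> None:
    """Check that every sample has the expected keys and value types."""
    expected = {
        dataset.SLIDE_ID_COLUMN: str,
        dataset.IMAGE_COLUMN: str,
        dataset.LABEL_COLUMN: int,
    }
    for index in range(len(dataset)):
        sample = dataset[index]
        assert set(sample) == set(expected)
        for key, expected_type in expected.items():
            assert isinstance(sample[key], expected_type)


with tempfile.TemporaryDirectory() as tmp_root:
    write_toy_csv(tmp_root)
    toy_dataset = ToySlidesDataset(tmp_root)
    check_slides_dataset(toy_dataset)
    num_slides = len(toy_dataset)
```

No image files are read here, so the check runs without mounting real data; it only validates the structure of each sample.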

maxilse previously approved these changes Dec 16, 2021
@dccastro merged commit 6a4d334 into main Dec 16, 2021
@dccastro deleted the dacoelh/tiling branch December 16, 2021 16:11