Parameterize dtype for h5path with `SlideData` constructor #335

tddough98 · 2022-10-14T19:23:54Z

Currently, PathML stores all images with float16, forcing all image inputs to be upcast or downcast to this data type, which increases storage size or loses information. There already is a dtype parameter in the SlideData constructor, but it's only used to assist the BioFormatsBackend in loading images correctly. This repurposes that parameter to control what dtype h5py uses when writing image data.

I also changed masks to stored as ENUM and use the strongest compression setting as boolean masks are highly compressible and easily compressed. The compression made a huge difference in file size, and using (HDFView)[https://www.hdfgroup.org/downloads/hdfview/] showed a compression ratio of 100-200x for masks. The ENUM data type is stored as an 8-bit integer (https://docs.h5py.org/en/stable/special.html#enumerated-types) but at least this is less than using float16.

ryanccarelli · 2022-10-15T18:55:17Z

@jacob-rosenthal Do I remember you saying that certain transforms expect float16? If you remember which we should tag and document these

jacob-rosenthal · 2022-10-16T03:54:02Z

Thanks @tddough98 , this looks great! The reason we had to add the dtype parameter in the first place was as a workaround because some of our collaborators were running into problems with images where the dtype from the metadata didn't match the dtype of the image data itself. So the parameter was added so that people could read images in e.g. float16 even if the image metadata said int8. From what I can tell looking at the code, this PR will still keep that behavior?
(Side note, we should add an explicit test for this using one of the misi images to be sure that they are always supported, maybe @MohamedOmar2020 you can help provide a small misi image we can add to the test suite?)

And yes, @ryanccarelli I think we did run into problems with some of the image transforms being picky about dtypes, and found that float16 worked well usually. I think since float16 remains the default, this should still be fine, since things would only break if people manually change dtype themselves

So from what I can tell, this PR seems like strictly an improvement, keeping existing behavior but making it cleaner & more flexible

tddough98 · 2022-10-25T19:08:25Z

I reverted storing all masks as boolean. I was focused on masks made by TissueDetectionHE, which are always boolean, but now realize that instance segmentation masks from mesmer or hovernet need to use different values for each instance. Now, masks are saved with the same datatype as the numpy array for the mask.

jacob-rosenthal · 2022-10-25T19:14:10Z

Ah yes that is a good catch. I think that should have been caught by the tests? But looks like the tests haven't run for this PR yet, I guess because it is a PR that isn't merging into dev or master (the only branches that trigger tests).

tddough98 added 4 commits October 4, 2022 15:35

Store masks as bools

cab0409

Use dtype from SlideData constructor

6e6f802

Merge branch 'merge-master' into change-dtypes

6dbf069

Update h5path documentation

be5ed26

tddough98 requested a review from jacob-rosenthal October 14, 2022 19:23

tddough98 changed the title ~~Parameterize dtype for h5path with Slide constructor~~ Parameterize dtype for h5path with SlideData constructor Oct 14, 2022

tddough98 mentioned this pull request Oct 20, 2022

Allow SlideData to use existing h5path files #337

Open

Revert making all masks boolean

8c78593

tddough98 added 3 commits October 25, 2022 16:59

Merge branch 'merge-master' into change-dtypes

8dfafde

Merge branch 'merge-master' into change-dtypes

08d8327

Fix parentheses

5353ac2

Base automatically changed from merge-master to dev October 26, 2022 19:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parameterize dtype for h5path with `SlideData` constructor #335

Parameterize dtype for h5path with `SlideData` constructor #335

tddough98 commented Oct 14, 2022 •

edited

ryanccarelli commented Oct 15, 2022

jacob-rosenthal commented Oct 16, 2022

tddough98 commented Oct 25, 2022

jacob-rosenthal commented Oct 25, 2022

Parameterize dtype for h5path with SlideData constructor #335

Are you sure you want to change the base?

Parameterize dtype for h5path with SlideData constructor #335

Conversation

tddough98 commented Oct 14, 2022 • edited

ryanccarelli commented Oct 15, 2022

jacob-rosenthal commented Oct 16, 2022

tddough98 commented Oct 25, 2022

jacob-rosenthal commented Oct 25, 2022

Parameterize dtype for h5path with `SlideData` constructor #335

Parameterize dtype for h5path with `SlideData` constructor #335

tddough98 commented Oct 14, 2022 •

edited