Add postprocessing step and tests for PanNuke #49

Merged
merged 3 commits into from
Jan 4, 2021

jacob-rosenthal
Collaborator

When the PanNuke dataset is downloaded, it comes in 3 folds. Each fold consists of three .npy files: one for the images, one for the masks, and one for the tissue type labels. There are ~2500 images per fold, so the .npy files for the images and masks are pretty big.

The current implementation calls np.load() on the big .npy files and then slices the resulting arrays to get the right images/masks. Loading these arrays can be somewhat slow, though, and they also take up memory.

I added a postprocessing step to write each individual image to .png and each individual mask to .npy. I also modified the PanNukeDataset as needed. Tissue type and fold labels are in the filenames.
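
A minimal sketch of what such a postprocessing step could look like, not the code from this PR. The function names, the `images/` and `masks/` subfolders, and the `fold{fold}_{index}_{tissue}` filename scheme are all assumptions made for illustration. The PNG writer is passed in as a parameter so the sketch does not depend on a particular imaging library; the default assumes Pillow is installed.

```python
from pathlib import Path

import numpy as np


def save_png_with_pillow(path, image):
    # Assumption: Pillow is installed. Any PNG writer would work here.
    from PIL import Image

    Image.fromarray(image).save(path)


def split_fold(images, masks, types, fold, out_dir, save_png=save_png_with_pillow):
    """Write each image of one fold to its own .png and each mask to its own .npy.

    images, masks, and types are the three arrays loaded from one fold.
    Returns the list of filename stems that were written.
    """
    out_dir = Path(out_dir)
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    (out_dir / "masks").mkdir(parents=True, exist_ok=True)

    stems = []
    for i, (image, mask, tissue) in enumerate(zip(images, masks, types)):
        # Hypothetical naming scheme: fold number, index within fold, tissue type.
        stem = f"fold{fold}_{i}_{tissue}"
        # PNG stores integer pixels, so cast to uint8 before writing.
        save_png(out_dir / "images" / f"{stem}.png", np.asarray(image).astype(np.uint8))
        np.save(out_dir / "masks" / f"{stem}.npy", mask)
        stems.append(stem)
    return stems
```

The big arrays are loaded once per fold when this step runs, and never again during training.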

This setup is more intuitive and hopefully will be faster since we avoid loading the big .npy arrays during training.
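
To illustrate the idea of loading one sample at a time, here is a rough sketch of a map-style dataset that keeps only file paths in memory. It is not the actual `PanNukeDataset`: the class name, the directory layout (`images/*.png`, `masks/*.npy`), and the `fold{fold}_{index}_{tissue}` filename scheme are assumptions, and the PNG reader is injectable (the default assumes Pillow is installed).

```python
from pathlib import Path

import numpy as np


def load_png_with_pillow(path):
    # Assumption: Pillow is installed. Any PNG reader would work here.
    from PIL import Image

    return np.asarray(Image.open(path))


class PerFileDataset:
    """Reads a single image/mask pair from disk on each __getitem__ call."""

    def __init__(self, data_dir, load_png=load_png_with_pillow):
        self.data_dir = Path(data_dir)
        self.load_png = load_png
        # Only the paths are held in memory, not the pixel data.
        self.image_paths = sorted((self.data_dir / "images").glob("*.png"))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        image_path = self.image_paths[index]
        stem = image_path.stem
        # Hypothetical naming scheme: "fold{fold}_{index}_{tissue}".
        fold_part, _, tissue = stem.split("_", 2)
        image = self.load_png(image_path)
        mask = np.load(self.data_dir / "masks" / f"{stem}.npy")
        return image, mask, tissue, int(fold_part[len("fold"):])
```

Because it implements `__len__` and `__getitem__`, a class like this could be wrapped in or adapted to a framework-specific dataset type.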

I also added tests.

… load one at a time instead of loading entire fold into single np array. Also add some tests.
Contributor

@ryanccarelli ryanccarelli left a comment

Nice change, good to free up memory

@jacob-rosenthal jacob-rosenthal merged commit dfc1ea7 into dev Jan 4, 2021
@jacob-rosenthal jacob-rosenthal deleted the pannuke branch January 4, 2021 20:24