-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor slidedata #72
Conversation
pathml/core/slide_data.py
Outdated
assert isinstance(slide, Slide), f"slide is of type {type(slide)} but must be a subclass of pathml.core.slide.Slide" | ||
self.slide = slide | ||
self._slidetype = type(slide) | ||
self.name = None if slide is None else slide.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't every SlideData have a name?
ALso how would we get a SlideData object with no slide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We need to decide if we will allow SlideData with only tiles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. My first thought is no - if there is a collection of tiles, it could just be a Dataset class?
if self.slide.backend == 'openslide': | ||
if level == None: | ||
# TODO: is this the right default for openslide? | ||
level = 1 | ||
j, i = self.slide.level_dimensions[level] | ||
|
||
if stride is None: | ||
stride_i = shape[0] | ||
stride_j = shape[1] | ||
|
||
n_chunk_i = (i-shape[0])// stride_i +1 | ||
n_chunk_j = (j-shape[1])// stride_i +1 | ||
|
||
if pad: | ||
n_chunk_i = i // stride_i +1 | ||
n_chunk_j = j // stride_j +1 | ||
|
||
for ix_i in range(n_chunk_i): | ||
for ix_j in range(n_chunk_j): | ||
|
||
region = self.slide.read_region( | ||
location = (int(ix_j * stride_j), int(ix_i * stride_i)), | ||
level = level, size = (shape[0], shape[1]) | ||
) | ||
region_rgb = pil_to_rgb(region) | ||
coords = (ix_i, ix_j) | ||
if self.masks is not None: | ||
# TODO: test this line | ||
masks_chunk = self.masks.slice([int(ix_j*stride_j):int(ix_j*stride_j)+size,int(ix_i*stride_i):int(ix_i*stride_i)+size, ...]) | ||
yield Tile(region_rgb, masks_chunk, coords) | ||
|
||
elif self.slide.backend == 'bioformats': | ||
# TODO: this is complicated because need to handle both chunking, allocating different 2GB java arrays, and managing java heap | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the exact implementation will be different depending on the Slide backend, would it be cleaner to handle that logic in the Slide class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move the whole generate_tiles
method to the Slide class? I think that would be logical. Pseudocode of what I'm thinking would happen when you run a pipeline:
def run(self, pipeline):
for tile in self.slide.generate_tiles(...):
tile_processed = pipeline(tile)
self.tiles.add(key, tile_processed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A few design-level choices we have to make but the code is good
closes #71
TODO: