refactor slidedata #72

ryanccarelli · 2021-01-16T22:11:13Z

closes #71

TODO:

chunking for multiparametric
plot
save

pathml/core/slide_data.py

jacob-rosenthal · 2021-01-17T03:55:55Z

pathml/core/slide_data.py

+ assert isinstance(slide, Slide), f"slide is of type {type(slide)} but must be a subclass of pathml.core.slide.Slide"
+ self.slide = slide
+ self._slidetype = type(slide)
+ self.name = None if slide is None else slide.name


Shouldn't every SlideData have a name?
Also, how would we get a SlideData object with no slide?

Agreed. We need to decide whether we will allow a SlideData with only tiles.

Hmm. My first thought is no - if there is a collection of tiles, it could just be a Dataset class?
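For illustration, here is a minimal sketch of the stricter constructor this thread points toward, assuming a SlideData always wraps a Slide and takes its name from it. The `Slide` class below is a hypothetical stand-in for `pathml.core.slide.Slide`, and raising `TypeError` instead of using `assert` is an editorial choice, not something decided in this PR.

```python
class Slide:
    """Hypothetical stand-in for pathml.core.slide.Slide."""

    def __init__(self, name):
        self.name = name


class SlideData:
    """Sketch: a SlideData always wraps a Slide, so it always has a name."""

    def __init__(self, slide):
        # Raise instead of assert so the check is not stripped by `python -O`.
        if not isinstance(slide, Slide):
            raise TypeError(
                f"slide is of type {type(slide)} but must be a subclass of Slide"
            )
        self.slide = slide
        self._slidetype = type(slide)
        # No None branch: the isinstance check above already rejects None.
        self.name = slide.name
```

Under this assumption, a bare collection of tiles with no slide would live in a separate Dataset-style class, as suggested above.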

pathml/core/slide_data.py

jacob-rosenthal · 2021-01-20T17:22:34Z

pathml/core/slide_data.py

+    if self.slide.backend == 'openslide':
+        if level == None:
+            # TODO: is this the right default for openslide?
+            level = 1
+        j, i = self.slide.level_dimensions[level]
+
+        if stride is None:
+            stride_i = shape[0]
+            stride_j = shape[1]
+
+        n_chunk_i = (i-shape[0])// stride_i +1
+        n_chunk_j = (j-shape[1])// stride_i +1
+
+        if pad:
+            n_chunk_i = i // stride_i +1
+            n_chunk_j = j // stride_j +1
+
+        for ix_i in range(n_chunk_i):
+            for ix_j in range(n_chunk_j):
+
+                region = self.slide.read_region(
+                    location = (int(ix_j * stride_j), int(ix_i * stride_i)),
+                    level = level, size = (shape[0], shape[1])
+                )
+                region_rgb = pil_to_rgb(region)
+                coords = (ix_i, ix_j)
+                if self.masks is not None:
+                    # TODO: test this line
+                    masks_chunk = self.masks.slice([int(ix_j*stride_j):int(ix_j*stride_j)+size,int(ix_i*stride_i):int(ix_i*stride_i)+size, ...])
+                yield Tile(region_rgb, masks_chunk, coords)
+
+    elif self.slide.backend == 'bioformats':
+        # TODO: this is complicated because need to handle both chunking, allocating different 2GB java arrays, and managing java heap
+        pass
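
The chunk-count arithmetic in the hunk above can be checked in isolation. The following is a sketch only, not the code in this PR: `tile_grid` is a hypothetical helper, it uses `stride_j` for the column count (the hunk uses `stride_i` in both lines), it assumes a single integer `stride` applies to both axes, and for `pad=True` it assumes ceiling division is wanted so that no fully empty tile is produced when the dimension is an exact multiple of the stride.

```python
def tile_grid(dims, shape, stride=None, pad=False):
    """Return (n_rows, n_cols) of tiles for an image of size dims = (rows, cols).

    Hypothetical helper; names and padding behaviour are assumptions.
    """
    rows, cols = dims
    if stride is None:
        # Default to non-overlapping tiles.
        stride_i, stride_j = shape
    else:
        stride_i = stride_j = stride

    if pad:
        # Ceiling division: count every tile whose origin lies inside the image.
        n_i = -(-rows // stride_i)
        n_j = -(-cols // stride_j)
    else:
        # Count only tiles that fit entirely inside the image.
        n_i = max((rows - shape[0]) // stride_i + 1, 0)
        n_j = max((cols - shape[1]) // stride_j + 1, 0)
    return n_i, n_j
```

For a 1000 x 500 image and 256 x 256 tiles this gives a 3 x 1 grid without padding and a 4 x 2 grid with padding.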


Since the exact implementation will be different depending on the Slide backend, would it be cleaner to handle that logic in the Slide class?

Should we move the whole generate_tiles method to the Slide class? I think that would be logical. Pseudocode of what I'm thinking would happen when you run a pipeline:

def run(self, pipeline):
    for tile in self.slide.generate_tiles(...):
        tile_processed = pipeline(tile)
        self.tiles.add(key, tile_processed)
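
A runnable sketch of that division of labour, under stated assumptions: `ArraySlide` is a toy backend over a nested list standing in for the openslide and bioformats backends, `generate_tiles` yields `(key, tile)` pairs so that `run` has a key to store under, and `self.tiles` is a plain dict instead of whatever container `self.tiles.add` implies. None of these names are confirmed by the PR.

```python
class Slide:
    """Hypothetical base class; each backend implements its own tiling."""

    def generate_tiles(self, shape):
        raise NotImplementedError


class ArraySlide(Slide):
    """Toy backend over a nested-list image (assumption, for illustration)."""

    def __init__(self, image):
        self.image = image

    def generate_tiles(self, shape):
        h, w = shape
        rows, cols = len(self.image), len(self.image[0])
        # Non-overlapping tiles that fit entirely inside the image.
        for i in range(0, rows - h + 1, h):
            for j in range(0, cols - w + 1, w):
                tile = [row[j:j + w] for row in self.image[i:i + h]]
                yield (i // h, j // w), tile


class SlideData:
    """Backend-agnostic: delegates tiling to the Slide it wraps."""

    def __init__(self, slide):
        self.slide = slide
        self.tiles = {}

    def run(self, pipeline, shape):
        for key, tile in self.slide.generate_tiles(shape):
            self.tiles[key] = pipeline(tile)
```

With this split, SlideData never inspects `slide.backend`; adding a new backend means adding a new Slide subclass.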

jacob-rosenthal

Looks good! We still have a few design-level choices to make, but the code is good.

ryanccarelli added 4 commits January 16, 2021 12:25

slide_data docstring

6718575

slide_data init and repr

ce6339a

init slide_data run, chunks

7fc566e

chunks for heslide

bd6a8d3

jacob-rosenthal reviewed Jan 17, 2021
