
Blosc Decompression Fails when chunk size is greater than the shape #228

Closed
sameeul opened this issue Jun 4, 2024 · 2 comments

@sameeul
Contributor

sameeul commented Jun 4, 2024

I have a dataset where the chunk size is [1,1,1,1024,1024] but the actual shape of the data is [1,1,1,256,256]. I read the zarr spec and could not find anything that makes this an invalid setup. My .zarray file looks like the following:

{
    "chunks": [
        1,
        1,
        1,
        1024,
        1024
    ],
    "compressor": {
        "blocksize": 0,
        "clevel": 1,
        "cname": "zstd",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<f8",
    "fill_value": 0.0,
    "filters": null,
    "order": "C",
    "shape": [
        1,
        1,
        1,
        256,
        256
    ],
    "zarr_format": 2

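
For completeness, here is a minimal sketch of how such a store can be produced with zarr-python (v2) and read back with z5py. The path data.zarr and the dataset name data are placeholders, and it is assumed that z5py infers the zarr format from the .zarr extension; this is an illustration, not my exact script.

import numpy as np
import zarr
from numcodecs import Blosc

# create an array whose declared chunk shape exceeds the array shape
root = zarr.open_group("data.zarr", mode="w")
root.create_dataset(
    "data",
    shape=(1, 1, 1, 256, 256),
    chunks=(1, 1, 1, 1024, 1024),
    dtype="<f8",
    compressor=Blosc(cname="zstd", clevel=1, shuffle=Blosc.SHUFFLE),
    fill_value=0.0,
)
root["data"][:] = np.random.rand(1, 1, 1, 256, 256)

# reading the same dataset back through z5py then fails with
# "RuntimeError: Blosc decompression failed"
import z5py
f = z5py.File("data.zarr")
ds = f["data"]
print(np.sum(ds[:]))
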
When trying to read this dataset with the z5py package, the following error happens:

Traceback (most recent call last):
  File "/mnt/hdd2/axle/dev/z5/test.py", line 8, in <module>
    print(np.sum(ds[:]))
  File "/home/samee/bin/miniforge3/envs/z5test/lib/python3.10/site-packages/z5py/dataset.py", line 363, in __getitem__
    _z5py.read_subarray(self._impl,
RuntimeError: Blosc decompression failed

I debugged the code and found that when we call blosc_decompress_ctx in z5, it returns -1. Further investigation inside blosc.c tells me that it fails due to this check:

  /* Check that we have enough space to decompress */
  if (context->sourcesize > (int32_t)destsize) {
    return -1;
  }

It looks like z5 passes a destsize of 256*256*8 bytes, while blosc expects at least 1024*1024*8 bytes, since the compressed chunk was written at the full chunk shape, which is larger than the array shape.

At this point, I am not sure if there is a way to get blosc to accept this and do the decompression.
Just to add some context: zarr-python and tensorstore seem to handle this without the error and decompress correctly.

I am not sure if there is an easy solution to this, but I wanted to report this behavior.
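
For illustration, a rough sketch of the general approach that makes this work (a guess, not zarr-python's or tensorstore's actual code): decompress into the full declared chunk size, then crop to the region that lies inside the array shape. It assumes the python-blosc package, that the chunk was written padded to the full chunk shape (as zarr-python does for v2 stores), and that the chunk file lives at the placeholder path below with the default "." dimension separator.

import numpy as np
import blosc  # python-blosc

chunk_shape = (1, 1, 1, 1024, 1024)
array_shape = (1, 1, 1, 256, 256)

# read the raw compressed chunk; the key "0.0.0.0.0" is an assumption
with open("data.zarr/data/0.0.0.0.0", "rb") as fh:
    raw = fh.read()

# blosc records the uncompressed size (1024*1024*8 bytes) in its own header,
# so the destination buffer has to hold the full chunk, not the cropped shape
decompressed = blosc.decompress(raw)
chunk = np.frombuffer(decompressed, dtype="<f8").reshape(chunk_shape)

# crop to the part of the chunk that actually lies inside the array
valid = chunk[tuple(slice(0, s) for s in array_shape)]
print(valid.shape)  # (1, 1, 1, 256, 256)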

@constantinpape
Owner

Thanks for reporting. Unfortunately I don't think there is much we can do about this, since the error is raised in blosc.
The only thing I can think of is changing the sizeOut parameter here, so that it matches the actual size of the chunk.
Then the decompression could work. But I am not sure if you have enough information to find this size programmatically.
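
If the declared chunk shape from the metadata can be trusted, the decompressed size should be derivable from the metadata alone: the chunk shape times the dtype item size, independent of the array shape. An untested sketch, with an assumed path to the metadata:

import json
import numpy as np

# path to the .zarray metadata is a placeholder
with open("data.zarr/data/.zarray") as fh:
    meta = json.load(fh)

size_out = int(np.prod(meta["chunks"])) * np.dtype(meta["dtype"]).itemsize
print(size_out)  # 1 * 1 * 1 * 1024 * 1024 * 8 = 8388608 bytes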

@sameeul
Contributor Author

sameeul commented Sep 13, 2024

I have looked into it in more detail and this particular code block seems to be the issue:

            // if the chunk-shape is bigger than the shape in any dimension, we set it to the shape
            for(unsigned d = 0; d < shape.size(); ++d) {
                if(chunkShape[d] > shape[d]) {
                    chunkShape[d] = shape[d];
                }
            }

Is there any reason for this particular modification of the chunkShape?
I commented out this code and it seems to resolve my aforementioned issue.
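
For reference, here is how the destination size works out with and without that clamping, given the metadata above (illustrative sketch only):

import numpy as np

shape = (1, 1, 1, 256, 256)
chunk_shape = (1, 1, 1, 1024, 1024)
itemsize = np.dtype("<f8").itemsize

clamped = tuple(min(c, s) for c, s in zip(chunk_shape, shape))
size_with_clamp = int(np.prod(clamped)) * itemsize         # 524288 bytes -> blosc check fails
size_without_clamp = int(np.prod(chunk_shape)) * itemsize  # 8388608 bytes -> matches blosc's sourcesize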
