
Blosc Decompression Fails when chunk size is greater than the shape #228

Closed
sameeul opened this issue Jun 4, 2024 · 2 comments

@sameeul
Contributor

sameeul commented Jun 4, 2024

I have a dataset where the chunk size is [1,1,1,1024,1024] but the actual shape of the data is [1,1,1,256,256]. I read the zarr spec and could not find anything that makes this an invalid setup. My .zarray file looks like the following:

{
    "chunks": [
        1,
        1,
        1,
        1024,
        1024
    ],
    "compressor": {
        "blocksize": 0,
        "clevel": 1,
        "cname": "zstd",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<f8",
    "fill_value": 0.0,
    "filters": null,
    "order": "C",
    "shape": [
        1,
        1,
        1,
        256,
        256
    ],
    "zarr_format": 2

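
For completeness, here is a minimal sketch of how such a store can be produced with zarr-python (v2) and read back with z5py. The path data.zarr and the dataset name data are placeholders, and it is assumed that z5py infers the zarr format from the .zarr extension; this is an illustration, not my exact script.

import numpy as np
import zarr
from numcodecs import Blosc

# create an array whose declared chunk shape exceeds the array shape
root = zarr.open_group("data.zarr", mode="w")
root.create_dataset(
    "data",
    shape=(1, 1, 1, 256, 256),
    chunks=(1, 1, 1, 1024, 1024),
    dtype="<f8",
    compressor=Blosc(cname="zstd", clevel=1, shuffle=Blosc.SHUFFLE),
    fill_value=0.0,
)
root["data"][:] = np.random.rand(1, 1, 1, 256, 256)

# reading the same dataset back through z5py then fails with
# "RuntimeError: Blosc decompression failed"
import z5py
f = z5py.File("data.zarr")
ds = f["data"]
print(np.sum(ds[:]))
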
When trying to read this dataset with the z5py package, the following error happens:

Traceback (most recent call last):
  File "/mnt/hdd2/axle/dev/z5/test.py", line 8, in <module>
    print(np.sum(ds[:]))
  File "/home/samee/bin/miniforge3/envs/z5test/lib/python3.10/site-packages/z5py/dataset.py", line 363, in __getitem__
    _z5py.read_subarray(self._impl,
RuntimeError: Blosc decompression failed

I debugged the code and found that when we call blosc_decompress_ctx in z5, it returns -1. Further investigation inside blosc.c tells me that it fails due to this check:

  /* Check that we have enough space to decompress */
  if (context->sourcesize > (int32_t)destsize) {
    return -1;
  }

It looks like z5 passes a destsize of 256*256*8 bytes, while blosc expects at least 1024*1024*8 bytes, since the compressed chunk was written at the full chunk shape, which is larger than the array shape.

At this point, I am not sure if there is a way to get blosc to accept this and do the decompression.
Just to add some context: zarr-python and tensorstore seem to handle this without the error and decompress correctly.

I am not sure if there is an easy solution to this, but I wanted to report this behavior.
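
For illustration, a rough sketch of the general approach that makes this work (a guess, not zarr-python's or tensorstore's actual code): decompress into the full declared chunk size, then crop to the region that lies inside the array shape. It assumes the python-blosc package, that the chunk was written padded to the full chunk shape (as zarr-python does for v2 stores), and that the chunk file lives at the placeholder path below with the default "." dimension separator.

import numpy as np
import blosc  # python-blosc

chunk_shape = (1, 1, 1, 1024, 1024)
array_shape = (1, 1, 1, 256, 256)

# read the raw compressed chunk; the key "0.0.0.0.0" is an assumption
with open("data.zarr/data/0.0.0.0.0", "rb") as fh:
    raw = fh.read()

# blosc records the uncompressed size (1024*1024*8 bytes) in its own header,
# so the destination buffer has to hold the full chunk, not the cropped shape
decompressed = blosc.decompress(raw)
chunk = np.frombuffer(decompressed, dtype="<f8").reshape(chunk_shape)

# crop to the part of the chunk that actually lies inside the array
valid = chunk[tuple(slice(0, s) for s in array_shape)]
print(valid.shape)  # (1, 1, 1, 256, 256)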

@constantinpape
Owner

Thanks for reporting. Unfortunately I don't think there is much we can do about this, since the error is raised in blosc.
The only thing I can think of is changing the sizeOut parameter here, so that it matches the actual size of the chunk.
Then the decompression could work. But I am not sure if you have enough information to find this size programmatically.
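
If the declared chunk shape from the metadata can be trusted, the decompressed size should be derivable from the metadata alone: the chunk shape times the dtype item size, independent of the array shape. An untested sketch, with an assumed path to the metadata:

import json
import numpy as np

# path to the .zarray metadata is a placeholder
with open("data.zarr/data/.zarray") as fh:
    meta = json.load(fh)

size_out = int(np.prod(meta["chunks"])) * np.dtype(meta["dtype"]).itemsize
print(size_out)  # 1 * 1 * 1 * 1024 * 1024 * 8 = 8388608 bytes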

@sameeul
Contributor Author

sameeul commented Sep 13, 2024

I have looked into it in more detail and this particular code block seems to be the issue:

            // if the chunk-shape is bigger than the shape in any dimension, we set it to the shape
            for(unsigned d = 0; d < shape.size(); ++d) {
                if(chunkShape[d] > shape[d]) {
                    chunkShape[d] = shape[d];
                }
            }

Is there any reason for this particular modification of the chunkShape?
I commented out this code and it seems to resolve my aforementioned issue.
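
For reference, here is how the destination size works out with and without that clamping, given the metadata above (illustrative sketch only):

import numpy as np

shape = (1, 1, 1, 256, 256)
chunk_shape = (1, 1, 1, 1024, 1024)
itemsize = np.dtype("<f8").itemsize

clamped = tuple(min(c, s) for c, s in zip(chunk_shape, shape))
size_with_clamp = int(np.prod(clamped)) * itemsize         # 524288 bytes -> blosc check fails
size_without_clamp = int(np.prod(chunk_shape)) * itemsize  # 8388608 bytes -> matches blosc's sourcesize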
