
xr.open_zarr is 3x slower than zarr.open, even at scale #9111

Closed

max-sixty opened this issue Jun 12, 2024 · 7 comments

Labels: topic-performance, topic-zarr

Comments

@max-sixty
Collaborator

max-sixty commented Jun 12, 2024

What is your issue?

I'm doing some benchmarks on Xarray + Zarr vs. some other formats, and I get quite a surprising result — in a very simple array, xarray is adding a lot of overhead to reading a Zarr array.

Here's a quick script — super simple, just a single chunk. It's 800MB of data, so not some tiny array where reading a metadata JSON file or allocating an index is going to throw off the results.

import numpy as np
import zarr
import xarray as xr
import dask
print(zarr.__version__, xr.__version__, dask.__version__)

(
    xr.DataArray(np.random.rand(10000, 10000), name="foo")
    .to_dataset()
    .chunk(None)
    .to_zarr("test.zarr", mode="w")
)

%timeit xr.open_zarr("test.zarr").compute()
%timeit zarr.open("test.zarr")["foo"][:]
2.17.2 2024.5.1.dev37+gce196d56 2024.5.2
551 ms ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
183 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So:

  • 551ms for xarray
  • 183ms for zarr

Having a quick look with py-spy suggests there might be some thread contention, but I'm not sure how much is real contention vs. idle threads just waiting.
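(A sketch of that profiling step, not from the thread: assuming py-spy is installed, its --idle flag samples idle threads too, which helps separate real contention from threads that are just waiting.)

!py-spy record --idle -o profile.svg -- python -c "import xarray as xr; xr.open_zarr('test.zarr').compute()"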


Making the array 10x bigger (with 10 chunks) reduces the relative difference, but it's still fairly large:

2.17.2 2024.5.1.dev37+gce196d56 2024.5.2
6.88 s ± 353 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.15 s ± 264 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Any thoughts on what might be happening? Is the benchmark at least correct?

@max-sixty added the needs triage, topic-performance, and topic-zarr labels on Jun 12, 2024
@jhamman
Member

jhamman commented Jun 12, 2024

Thanks for opening this issue @max-sixty. Would you mind splitting the timing into the following steps:

%timeit ds = xr.open_zarr("test.zarr")["foo"]
%timeit ds.compute()
%timeit arr = zarr.open("test.zarr")["foo"]
%timeit arr[:]

I would like to know if the difference is in the metadata loading step or in the fetching/decoding of chunks. I may also suggest avoiding dask chunks in the xarray case just to narrow things down.
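For instance, something like this (a sketch; chunks=None returns a lazy backend array with no dask graph at all):

%timeit xr.open_zarr("test.zarr", chunks=None)["foo"].compute()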

@max-sixty removed the needs triage label on Jun 13, 2024
@max-sixty
Collaborator Author

max-sixty commented Jun 13, 2024

OK, no great updates with those unfortunately.


Slightly adjusting to:

da = xr.open_zarr("test.zarr")["foo"]
arr = zarr.open("test.zarr")["foo"]

%timeit xr.open_zarr("test.zarr")["foo"]
%timeit da.compute()
%timeit zarr.open("test.zarr")["foo"]
%timeit arr[:]

...shows there's basically nothing in the metadata step — the array is hopefully big enough to amortize that:

2.17.2 2024.5.1.dev37+gce196d56 2024.5.2
402 µs ± 9.98 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
575 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
89.5 µs ± 1.72 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
191 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I may also suggest avoiding dask chunks in the xarray case just to narrow things down.

If we adjust to avoid dask chunks, we get the same result (assuming this is what you meant, let me know if not; probably this isn't the simplest way?):

da = xr.DataArray(np.random.rand(10000, 10000), name="foo")
da.encoding = dict(
    chunks=(10000, 10000), preferred_chunks={"dim_0": 10000, "dim_1": 10000}
)
da.to_dataset().to_zarr("test.zarr", mode="w")

%timeit xr.open_zarr("test.zarr")["foo"].compute()
%timeit zarr.open("test.zarr")["foo"][:]
2.17.2 2024.5.1.dev37+gce196d56 2024.5.2
583 ms ± 23.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
198 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@jhamman
Member

jhamman commented Jun 13, 2024

Curious, I'm seeing quite similar (comparable between xarray and zarr) results but with slightly different versions. I'll update this post when I've used your exact set of versions. UPDATE: I've run this with a bunch of xarray/zarr/dask versions and I'm not seeing the differences you were seeing above 🤷 .

da = xr.open_zarr("test.zarr", chunks=None)["foo"]  # no dask, just a lazy backend array
arr = zarr.open("test.zarr")["foo"]

%timeit xr.open_zarr("test.zarr")["foo"]
%timeit da.compute()
%timeit zarr.open("test.zarr")["foo"]
%timeit arr[:]
2.18.0 2024.5.0 2024.5.1
1.01 ms ± 7.07 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
704 ms ± 8.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
129 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
705 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2.17.2 2024.5.0 2024.5.1
1.01 ms ± 18.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
754 ms ± 117 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
130 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
702 ms ± 5.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2.17.2 2024.5.1.dev37+gce196d56 2024.5.1
971 µs ± 9.56 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
698 ms ± 4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
129 µs ± 894 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
700 ms ± 7.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2.17.2 2024.5.1.dev37+gce196d56 2024.5.2
971 µs ± 8.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
700 ms ± 8.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
136 µs ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
704 ms ± 6.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@max-sixty
Collaborator Author

max-sixty commented Jun 13, 2024

OK so it depends on how we do the chunks:

import numpy as np
import zarr
import xarray as xr
import dask

print(zarr.__version__, xr.__version__, dask.__version__)

da = xr.DataArray(np.random.rand(10000, 10000), name="foo")
da.to_dataset().to_zarr("default-chunks.zarr", mode="w")
da.to_dataset().chunk(None).to_zarr("no-chunks.zarr", mode="w")

for chunk_type in ["no-chunks", "default-chunks"]:
    print(f"\n# {chunk_type}")

    print("zarr read")
    %timeit zarr.open(f"{chunk_type}.zarr")["foo"][:]

    print("xarray default read")
    %timeit xr.open_zarr(f"{chunk_type}.zarr")["foo"].compute()

    print("xarray chunk=None read")
    %timeit xr.open_zarr(f"{chunk_type}.zarr", chunks=None)["foo"].compute()
2.18.2 2024.5.1.dev37+gce196d56 2024.5.2

# no-chunks
zarr read
225 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray default read
568 ms ± 19.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray chunks=None read
183 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# default-chunks
zarr read
729 ms ± 39.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray default read
357 ms ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray chunks=None read
769 ms ± 34.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Conclusion:

  • If we write without chunks and read with chunks=None, xarray is fast — that's the thing to do in my benchmark
  • If we write without chunks and read with the default, xarray is much slower — that's the big anomaly here: 568ms vs. 225ms

@jhamman
Member

jhamman commented Jun 13, 2024

This seems like part of the explanation then: When you don't specify chunks w/ zarr, you get default chunking (512 fairly small chunks in this case).

[screenshots: zarr array info for the two stores, showing zarr's default chunking]
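(A quick way to check the chunking without screenshots, as a sketch: .chunks and .nchunks are zarr-python v2 array attributes.)

print(zarr.open("no-chunks.zarr")["foo"].chunks, zarr.open("no-chunks.zarr")["foo"].nchunks)
print(zarr.open("default-chunks.zarr")["foo"].chunks, zarr.open("default-chunks.zarr")["foo"].nchunks)  # many small chunks (512 here)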

Is it possible that your xarray default read for no-chunks is paying an overhead for using Dask but getting no parallelism to benefit because there is only one chunk to process?
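One way to test that hypothesis (a sketch, not run in the thread): pin dask to the synchronous scheduler, so the thread pool drops out and whatever overhead remains is per-chunk graph construction and decoding rather than contention:

import dask
dask.config.set(scheduler="synchronous")  # no thread pool; isolates graph overhead from contention
%timeit xr.open_zarr("test.zarr").compute()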

@max-sixty
Collaborator Author

max-sixty commented Jun 13, 2024

Is it possible that your xarray default read for no-chunks is paying an overhead for using Dask but getting no parallelism to benefit because there is only one chunk to process?

Yes, this is very likely. But it seems like a lot of overhead — ~2x even on a huge array. Here's an array 10x the size, split into 10 chunks; it's 6.25s vs. 3.12s, and with 10 chunks it should get some credit for the parallelization.

import numpy as np
import zarr
import xarray as xr
import dask
print(zarr.__version__, xr.__version__, dask.__version__)

(
    xr.DataArray(np.random.rand(100000, 10000), name="foo")
    .to_dataset()
    .chunk(10000)
    .to_zarr("test.zarr", mode="w")
)

%timeit -n 1 -r 1 xr.open_zarr("test.zarr", chunks={}).compute()
%timeit -n 1 -r 1 xr.open_zarr("test.zarr", chunks=None).compute()
%timeit -n 1 -r 1 xr.open_zarr("test.zarr").compute()
%timeit -n 1 -r 1 zarr.open("test.zarr")["foo"][:]
2.18.2 2024.5.1.dev37+gce196d56 2024.5.2
7.72 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.36 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
6.25 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
3.12 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

...that said, possibly this is a dask issue?
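(One hypothetical way to narrow that down: bypass xarray and time dask reading the same store directly via dask.array.from_zarr; if this matches the xr.open_zarr timing, the overhead lives in dask rather than in xarray's decoding.)

import dask.array as dsa
darr = dsa.from_zarr("test.zarr", component="foo")  # dask-only read of the same store
%timeit -n 1 -r 1 darr.compute()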

max-sixty added a commit to max-sixty/xarray that referenced this issue Jun 19, 2024
Makes them more structured and consistent. I think removes a mistake re the default chunks arg in `open_zarr` (it's not `None`, it's `auto`).

Adds a comment re performance with `chunks=None`, closing pydata#9111
max-sixty added a commit that referenced this issue Jun 22, 2024
* Improve zarr chunks docs

Makes them more structured and consistent. I think removes a mistake re the default chunks arg in `open_zarr` (it's not `None`, it's `auto`).

Adds a comment re performance with `chunks=None`, closing #9111
@max-sixty
Collaborator Author

Updated docs, so will close.
