package xarray and xarray-core in conda-forge #9149

Open
dcherian opened this issue Jun 20, 2024 · 5 comments
Labels: dependencies, enhancement

dcherian (Contributor) commented Jun 20, 2024

What is your issue?

The current set of Xarray dependencies is very minimal.

xarray/pyproject.toml, lines 25 to 29 in 3fd162e:

```toml
dependencies = [
  "numpy>=1.23",
  "packaging>=23.1",
  "pandas>=2.0",
]
```

This is pretty unfriendly to new users and not a great out-of-the-box experience. You can't read most file formats (except npz, CSV, and Parquet, I guess), you can't access any tutorial datasets, you can't make plots, and you miss out on a bunch of effectively free performance optimizations.

I think the current set of minimal dependencies is more appropriate for an xarray-core package.
For comparison, here are our optional dependencies:

xarray/pyproject.toml, lines 31 to 48 in 3fd162e:

```toml
[project.optional-dependencies]
accel = ["scipy", "bottleneck", "numbagg", "flox", "opt_einsum"]
complete = ["xarray[accel,io,parallel,viz,dev]"]
dev = [
  "hypothesis",
  "mypy",
  "pre-commit",
  "pytest",
  "pytest-cov",
  "pytest-env",
  "pytest-xdist",
  "pytest-timeout",
  "ruff",
  "xarray[complete]",
]
io = ["netCDF4", "h5netcdf", "scipy", 'pydap; python_version<"3.10"', "zarr", "fsspec", "cftime", "pooch"]
parallel = ["dask[complete]"]
viz = ["matplotlib", "seaborn", "nc-time-axis"]
```
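The extras reference each other (complete pulls in dev, and dev pulls in xarray[complete]), so it is not obvious at a glance what installing the full set actually brings in. A minimal sketch that flattens the optional-dependency table quoted above, assuming the nested xarray[...] entries are the only self-references. expand_extra is a hypothetical helper, not part of xarray or pip, and it treats environment markers such as python_version<"3.10" as plain strings.

```python
import re

# Optional-dependency table copied from the pyproject.toml excerpt (3fd162e).
OPTIONAL_DEPS = {
    "accel": ["scipy", "bottleneck", "numbagg", "flox", "opt_einsum"],
    "complete": ["xarray[accel,io,parallel,viz,dev]"],
    "dev": [
        "hypothesis", "mypy", "pre-commit", "pytest", "pytest-cov",
        "pytest-env", "pytest-xdist", "pytest-timeout", "ruff",
        "xarray[complete]",
    ],
    "io": [
        "netCDF4", "h5netcdf", "scipy", 'pydap; python_version<"3.10"',
        "zarr", "fsspec", "cftime", "pooch",
    ],
    "parallel": ["dask[complete]"],
    "viz": ["matplotlib", "seaborn", "nc-time-axis"],
}

# Matches self-references like "xarray[accel,io]" (but not "dask[complete]").
SELF_REF = re.compile(r"^xarray\[([^\]]+)\]$")


def expand_extra(name, table=OPTIONAL_DEPS, _seen=None):
    """Hypothetical helper: requirements pulled in by xarray[name], flattened."""
    seen = set() if _seen is None else _seen
    if name in seen:
        return set()  # already expanded; this breaks the complete <-> dev cycle
    seen.add(name)
    reqs = set()
    for req in table[name]:
        match = SELF_REF.match(req)
        if match:
            # Recurse into each extra named inside xarray[...].
            for inner in match.group(1).split(","):
                reqs |= expand_extra(inner.strip(), table, seen)
        else:
            reqs.add(req)
    return reqs
```

For this table, expand_extra("complete") yields 25 distinct requirement strings. dask[complete] is kept verbatim because it is an extra of a different package.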

Proposal

I suggest that we migrate to separate xarray-core and xarray packages on conda-forge:

  1. xarray-core will have the current set of minimal dependencies.
  2. For xarray I propose the following dependencies:
    1. flox, opt_einsum, numbagg for accelerated computations
    2. fsspec, netcdf, zarr for reading common datasets & "cloud"
    3. matplotlib for plotting.
    4. pooch to read tutorial datasets
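To see what a minimal install lacks compared with the proposed xarray package, one can check which of the listed packages are importable. This is a minimal sketch using the standard-library importlib.util.find_spec. The group names and the missing_extras helper are illustrative assumptions, and "netcdf" is assumed to mean the netCDF4 import name.

```python
from importlib.util import find_spec

# Import names for the packages proposed for the full "xarray" package.
# Assumption: "netcdf" in the proposal refers to the netCDF4 library.
PROPOSED = {
    "acceleration": ["flox", "opt_einsum", "numbagg"],
    "io": ["fsspec", "netCDF4", "zarr"],
    "plotting": ["matplotlib"],
    "tutorial": ["pooch"],
}


def missing_extras(groups=PROPOSED):
    """Hypothetical helper: map each group to its packages that are not importable."""
    return {
        group: [name for name in names if find_spec(name) is None]
        for group, names in groups.items()
    }


if __name__ == "__main__":
    for group, missing in missing_extras().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

Under the proposal, a user who installed xarray-core would presumably see every group reported as missing, while an xarray install would report all groups as ok.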

Related: dask is split into dask and dask-core on conda-forge, and I think matplotlib is similarly split into matplotlib and matplotlib-base.

dcherian added the enhancement and dependencies labels Jun 20, 2024

dcherian (Contributor, Author)

Note that many user-survey comments ask for performance improvements:

  • "whatever can speed up computations would be welcomed"
  • Optimizations, especially for "resample" and "rolling".
  • faster computation
  • Faster/smaller dask graph parallelizations?
  • faster, less overhead
  • "also, can we fix the fact that groupby() on a dimension with only one chunk returns something with a chunk size of one on that dimension? It produces huge graph sizes."

And then there's this counter-example 🤷🏾‍♂️: "lightweight version without heavy dependencies"

max-sixty (Collaborator)

(Yes, I thought I'd asked something similar about pooch a while ago, but I can't find it.)

Ideally we would allow a Cargo-style {name="xarray", default-features=false} for the minority of users who want the slim version. But IIUC Python has no notion of "default but not required" dependencies.

  • So +1 on xarray-core vs xarray in that case
  • Another option would be to encourage xarray[standard], but that doesn't seem like a common pattern in Python either

dcherian (Contributor, Author)

Looking at the 👍🏾 reactions on the first post, we seem to have general agreement.

How shall we proceed here? We could:

  1. Begin with an announcement
  2. Start distributing xarray-core
  3. Change the recipe for xarray in two months?

@shoyer it'd be good to get your vote here.

shoyer (Member) commented Jul 11, 2024 via email

dcherian (Contributor, Author)

@pydata/xarray can someone volunteer to take this on please?
