Add some warnings about rechunking to the docs #6569

Merged 6 commits on May 10, 2022
Changes from 1 commit
Back to one liners
fmaussion committed May 9, 2022
commit 5fca7b0149753b801ee2e5668e0aa791e7cd67c8
19 changes: 7 additions & 12 deletions doc/user-guide/dask.rst
@@ -555,25 +555,20 @@ larger chunksizes.
Optimization Tips
-----------------

With analysis pipelines involving both spatial subsetting and temporal resampling, Dask performance can become very slow or memory hungry in certain cases. Here are some optimization tips we have found through experience:

1. Do your spatial and temporal indexing (e.g. ``.sel()`` or ``.isel()``) early in the pipeline, especially before calling ``resample()`` or ``groupby()``. Grouping and resampling trigger some computation on all the blocks, which in theory should commute with indexing, but this optimization hasn't been implemented in Dask yet (see `Dask issue #746 <https://github.com/dask/dask/issues/746>`_). More generally, ``groupby()`` is a costly operation and does not (yet) perform well on datasets split across multiple files (see :pull:`5734` and linked discussions there). A sketch of this tip follows the list.

2. Save intermediate results to disk as netCDF files (using ``to_netcdf()``) and then load them again with ``open_dataset()`` for further computations. For example, if subtracting the temporal mean from a dataset, save the temporal mean to disk before subtracting. Again, in theory, Dask should be able to do the computation in a streaming fashion, but in practice this is a failure case for the Dask scheduler, because it tries to keep every chunk of an array that it computes in memory (see `Dask issue #874 <https://github.com/dask/dask/issues/874>`_). A sketch of this pattern follows the list.

3. Specify smaller chunks across space when using :py:meth:`~xarray.open_mfdataset` (e.g., ``chunks={'latitude': 10, 'longitude': 10}``). This makes spatial subsetting easier, because there's no risk you will load subsets of data which span multiple chunks. On individual files, prefer to subset before chunking (suggestion 1).

4. Chunk as early as possible, and avoid rechunking as much as possible. Always pass the ``chunks={}`` argument to :py:func:`~xarray.open_mfdataset` to avoid redundant file reads.

5. Using the h5netcdf package by passing ``engine='h5netcdf'`` to :py:meth:`~xarray.open_mfdataset` can be quicker than the default ``engine='netcdf4'`` that uses the netCDF4 package. A combined sketch of tips 3-5 follows the list.

6. Some dask-specific tips may be found `here <https://docs.dask.org/en/latest/array-best-practices.html>`_.

7. The dask `diagnostics <https://docs.dask.org/en/latest/understanding-performance.html>`_ can be useful in identifying performance bottlenecks (see the sketch after this list).
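
The following is a minimal sketch of tip 1. The file pattern, variable, and coordinate names (``lat``, ``lon``, ``time``) are hypothetical placeholders; adapt them to your dataset.

.. code-block:: python

    import xarray as xr

    # Open the files lazily with dask (hypothetical file pattern).
    ds = xr.open_mfdataset("/path/to/files/*.nc", chunks={})

    # Subset in space and time *before* resampling or grouping
    # (the slice bounds assume ascending coordinates) ...
    subset = ds.sel(lat=slice(20, 50), lon=slice(-130, -60)).isel(time=slice(0, 365))

    # ... so that the resample only has to touch the blocks you kept.
    monthly = subset.resample(time="1MS").mean()
    monthly.compute()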
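
A sketch of tip 2, assuming hypothetical input and output paths: the temporal mean is written to disk first and read back, instead of being carried through one large task graph.

.. code-block:: python

    import xarray as xr

    ds = xr.open_mfdataset("/path/to/files/*.nc", chunks={})  # hypothetical path

    # Compute the temporal mean and persist it to disk first ...
    ds.mean("time").to_netcdf("temporal_mean.nc")

    # ... then reload it and subtract it from the original data.
    mean = xr.open_dataset("temporal_mean.nc")
    anomalies = ds - mean
    anomalies.to_netcdf("anomalies.nc")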
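
Tips 3-5 all come down to how :py:meth:`~xarray.open_mfdataset` is called. A sketch, assuming files with ``latitude`` and ``longitude`` coordinates:

.. code-block:: python

    import xarray as xr

    # Tip 4: always pass ``chunks`` so the files are opened lazily with dask
    # instead of being read eagerly (and possibly more than once).
    ds = xr.open_mfdataset("/path/to/files/*.nc", chunks={})

    # Tips 3 and 5: smaller chunks across space, and the h5netcdf engine,
    # which can be quicker than the default netCDF4-based engine.
    ds = xr.open_mfdataset(
        "/path/to/files/*.nc",
        chunks={"latitude": 10, "longitude": 10},
        engine="h5netcdf",
    )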
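
A sketch of tip 7 using the single-machine diagnostics from ``dask.diagnostics`` (these work with the default local schedulers; the distributed scheduler has its own dashboard). The dataset path is hypothetical.

.. code-block:: python

    import xarray as xr
    from dask.diagnostics import ProgressBar, Profiler, ResourceProfiler

    ds = xr.open_mfdataset("/path/to/files/*.nc", chunks={})

    # A progress bar already shows where the time is going.
    with ProgressBar():
        climatology = ds.groupby("time.month").mean("time").compute()

    # Task and resource profilers record task timings and CPU/memory usage,
    # which can later be plotted with dask.diagnostics.visualize (needs bokeh).
    with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
        climatology = ds.groupby("time.month").mean("time").compute()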