
[Doc]: Calculating Daily Climatology and Departures from Monthly Time Series needs to use daily input #303

Closed · msahn opened this issue Aug 9, 2022 · 14 comments · Fixed by #350
Labels: type: docs (Updates to documentation)


msahn commented Aug 9, 2022

Describe your documentation update

https://xcdat.readthedocs.io/en/v0.3.0/examples/climatology-and-departures.html#2.4-Daily-Cycle

I think the example calculation of the daily climatology does not work properly because the input data is monthly. The output should have 365 or 366 time steps, but the example output has 12.
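
A minimal sketch of the expected behavior, assuming a daily input file (the path below is hypothetical):

import xcdat

# With daily input, a freq="day" climatology should have 365 or 366 time
# steps; monthly input can only yield 12.
ds = xcdat.open_dataset("tas_day_example.nc")  # hypothetical daily file
daily_climo = ds.temporal.climatology("tas", freq="day", weighted=True)
daily_climo.tas.shape[0]  # expect 365 or 366, not 12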

msahn added the type: docs (Updates to documentation) label Aug 9, 2022

msahn commented Aug 9, 2022

https://xcdat.readthedocs.io/en/v0.3.0/examples/climatology-and-departures.html#3.4.-Daily-Cycle

I think this part should also be revised to use daily input data.


lee1043 commented Aug 9, 2022

@msahn Nice catch! Agreed.

chengzhuzhang commented

Thank you for testing and catching this problem, @msahn. @lee1043, I'm not sure the docs on temporal averaging have been reviewed thoroughly; would you please help fix the docs here?

tomvothecoder commented

Thank you for reporting this issue @msahn.

> Thank you for testing and catching this problem, @msahn. @lee1043, I'm not sure the docs on temporal averaging have been reviewed thoroughly; would you please help fix the docs here?

Yeah, we only did a general review of these notebooks, so a more thorough review would be helpful. If @lee1043 can take a look, that would be great.


lee1043 commented Aug 10, 2022

@tomvothecoder @chengzhuzhang sure, I will take a look at this.

lee1043 self-assigned this Aug 10, 2022

lee1043 commented Aug 17, 2022

I am experimenting with loading a daily dataset but am getting the error below. @tomvothecoder, does this look familiar to you?

Loading the daily netCDF file:

filepath2 = "http:https://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/day/tas/gn/v20191115/tas_day_ACCESS-ESM1-5_historical_r1i1p1f1_gn_20000101-20141231.nc"
ds2 = xcdat.open_dataset(filepath2)

# Unit adjust (-273.15, K to C)
ds2["tas"] = ds2.tas - 273.15

ds2

Error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [7], in <cell line: 2>()
      1 # Unit adjust (-273.15, K to C)
----> 2 ds2["tas"] = ds2.tas - 273.15
      4 ds2

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/_typed_ops.py:209, in DataArrayOpsMixin.__sub__(self, other)
    208 def __sub__(self, other):
--> 209     return self._binary_op(other, operator.sub)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/dataarray.py:3098, in DataArray._binary_op(self, other, f, reflexive)
   3094 other_variable = getattr(other, "variable", other)
   3095 other_coords = getattr(other, "coords", None)
   3097 variable = (
-> 3098     f(self.variable, other_variable)
   3099     if not reflexive
   3100     else f(other_variable, self.variable)
   3101 )
   3102 coords, indexes = self.coords._merge_raw(other_coords, reflexive)
   3103 name = self._result_name(other)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/_typed_ops.py:399, in VariableOpsMixin.__sub__(self, other)
    398 def __sub__(self, other):
--> 399     return self._binary_op(other, operator.sub)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:2462, in Variable._binary_op(self, other, f, reflexive)
   2460     other_data, self_data, dims = _broadcast_compat_data(other, self)
   2461 else:
-> 2462     self_data, other_data, dims = _broadcast_compat_data(self, other)
   2463 keep_attrs = _get_keep_attrs(default=False)
   2464 attrs = self._attrs if keep_attrs else None

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:2941, in _broadcast_compat_data(self, other)
   2938     dims = new_self.dims
   2939 else:
   2940     # rely on numpy broadcasting rules
-> 2941     self_data = self.data
   2942     other_data = other
   2943     dims = self.dims

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:339, in Variable.data(self)
    337     return self._data
    338 else:
--> 339     return self.values

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:512, in Variable.values(self)
    509 @property
    510 def values(self):
    511     """The variable's data as a numpy.ndarray"""
--> 512     return _as_array_or_item(self._data)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:252, in _as_array_or_item(data)
    238 def _as_array_or_item(data):
    239     """Return the given values as a numpy array, or as an individual item if
    240     it's a 0d datetime64 or timedelta64 array.
    241 
   (...)
    250     TODO: remove this (replace with np.asarray) once these issues are fixed
    251     """
--> 252     data = np.asarray(data)
    253     if data.ndim == 0:
    254         if data.dtype.kind == "M":

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:552, in MemoryCachedArray.__array__(self, dtype)
    551 def __array__(self, dtype=None):
--> 552     self._ensure_cached()
    553     return np.asarray(self.array, dtype=dtype)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:549, in MemoryCachedArray._ensure_cached(self)
    547 def _ensure_cached(self):
    548     if not isinstance(self.array, NumpyIndexingAdapter):
--> 549         self.array = NumpyIndexingAdapter(np.asarray(self.array))

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:522, in CopyOnWriteArray.__array__(self, dtype)
    521 def __array__(self, dtype=None):
--> 522     return np.asarray(self.array, dtype=dtype)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:423, in LazilyIndexedArray.__array__(self, dtype)
    421 def __array__(self, dtype=None):
    422     array = as_indexable(self.array)
--> 423     return np.asarray(array[self.key], dtype=None)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/coding/variables.py:70, in _ElementwiseFunctionArray.__array__(self, dtype)
     69 def __array__(self, dtype=None):
---> 70     return self.func(self.array)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/coding/variables.py:137, in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype)
    133 def _apply_mask(
    134     data: np.ndarray, encoded_fill_values: list, decoded_fill_value: Any, dtype: Any
    135 ) -> np.ndarray:
    136     """Mask all matching values in a NumPy arrays."""
--> 137     data = np.asarray(data, dtype=dtype)
    138     condition = False
    139     for fv in encoded_fill_values:

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:423, in LazilyIndexedArray.__array__(self, dtype)
    421 def __array__(self, dtype=None):
    422     array = as_indexable(self.array)
--> 423     return np.asarray(array[self.key], dtype=None)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:93, in NetCDF4ArrayWrapper.__getitem__(self, key)
     92 def __getitem__(self, key):
---> 93     return indexing.explicit_indexing_adapter(
     94         key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
     95     )

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:712, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    690 """Support explicit indexing by delegating to a raw indexing method.
    691 
    692 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
    709 Indexing result, in the form of a duck numpy-array.
    710 """
    711 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 712 result = raw_indexing_method(raw_key.tuple)
    713 if numpy_indices.tuple:
    714     # index the loaded np.ndarray
    715     result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices]

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:106, in NetCDF4ArrayWrapper._getitem(self, key)
    104     with self.datastore.lock:
    105         original_array = self.get_array(needs_lock=False)
--> 106         array = getitem(original_array, key)
    107 except IndexError:
    108     # Catch IndexError in netCDF4 and return a more informative
    109     # error message.  This is most often called when an unsorted
    110     # indexer is used before the data is loaded from disk.
    111     msg = (
    112         "The indexing operation you are attempting to perform "
    113         "is not valid on netCDF4.Variable object. Try loading "
    114         "your data into memory first by calling .load()."
    115     )

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/backends/common.py:64, in robust_getitem(array, key, catch, max_retries, initial_delay)
     62 for n in range(max_retries + 1):
     63     try:
---> 64         return array[key]
     65     except catch:
     66         if n == max_retries:

File src/netCDF4/_netCDF4.pyx:4406, in netCDF4._netCDF4.Variable.__getitem__()

File src/netCDF4/_netCDF4.pyx:5350, in netCDF4._netCDF4.Variable._get()

File src/netCDF4/_netCDF4.pyx:1927, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Authorization failure

pochedls commented

I don't think this issue is related to the unit conversion or xCDAT: if you open the dataset with xarray (i.e., ds = xr.open_dataset(filepath2).load()), you get the same problem. You may need a password to open this dataset.
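
A minimal sketch of that check, reusing the filepath2 defined above (the failure only surfaces once values are actually fetched):

import xarray as xr

# Plain xarray hits the same "NetCDF: Authorization failure" when the data
# is read, which points to the OPeNDAP request itself rather than xCDAT.
ds_check = xr.open_dataset(filepath2)
ds_check.load()  # the RuntimeError is raised here, when values are fetched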

tomvothecoder commented

> I am experimenting with loading a daily dataset but am getting the error below. @tomvothecoder, does this look familiar to you?

> I don't think this issue is related to the unit conversion or xCDAT: if you open the dataset with xarray (i.e., ds = xr.open_dataset(filepath2).load()), you get the same problem. You may need a password to open this dataset.

When a file is larger than the default OPeNDAP request limit (usually 500 MB), you have to chunk the request to avoid exceeding it.

# The size of this file is approximately 1.45 GB, so we will be chunking our
# request using Dask to avoid hitting the OPeNDAP file size request limit for
# this ESGF node.
ds2 = xcdat.open_dataset(
    "http:https://esgf-data3.diasjp.net/thredds/dodsC/esg_dataroot/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/day/tas/gn/v20191115/tas_day_ACCESS-ESM1-5_historical_r1i1p1f1_gn_18500101-18991231.nc",
    chunks={"time": "auto"},
)

# Unit adjust (-273.15, K to C)
ds2["tas"] = ds2.tas - 273.15

ds2

Here's an explanation in another xCDAT notebook (scroll down to 3.3 Monthly Averages because the headers are broken): https://xcdat.readthedocs.io/en/latest/examples/temporal-average.html#3.3-Monthly-Averages-(freq=

tomvothecoder commented

#277 should be explored as a way to replace the referenced datasets in the demos, since OPeNDAP introduces overhead by requiring chunking for larger datasets.

lee1043 changed the title from "[Doc]: Calculating Climatology and Departures from Monthly Time Series" to "[Doc]: Calculating Daily Climatology and Departures from Monthly Time Series needs to use daily input" Aug 17, 2022

lee1043 commented Aug 17, 2022

@tomvothecoder @pochedls thanks for your comments. @tomvothecoder's solution above of adding chunks to open_dataset resolved the error.

After that step, I am getting another error that I suspect comes from leap-year handling (see the last line of the error message: invalid day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)).

@msahn could you share with us (or point us to) your PMP code for how you handle leap years when computing the daily climatology in your precipitation metrics work?

>> daily_climo = ds2.temporal.climatology("tas", freq="day", weighted=True)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 daily_climo = ds2.temporal.climatology("tas", freq="day", weighted=True)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:477, in TemporalAccessor.climatology(self, data_var, freq, weighted, keep_weights, season_config)
    348 def climatology(
    349     self,
    350     data_var: str,
   (...)
    354     season_config: SeasonConfigInput = DEFAULT_SEASON_CONFIG,
    355 ):
    356     """Returns a Dataset with the climatology of a data variable.
    357 
    358     Parameters
   (...)
    475     }
    476     """
--> 477     return self._averager(
    478         data_var, "climatology", freq, weighted, keep_weights, season_config
    479     )

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:697, in TemporalAccessor._averager(self, data_var, mode, freq, weighted, keep_weights, season_config)
    695     dv = self._average(dv)
    696 elif self._mode in ["group_average", "climatology", "departures"]:
--> 697     dv = self._group_average(dv)
    699 # The original time dimension is dropped from the dataset because
    700 # it becomes obsolete after the data variable is averaged. When the
    701 # averaged data variable is added to the dataset, the new time dimension
    702 # and its associated coordinates are also added.
    703 ds = ds.drop_dims(self._dim)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:921, in TemporalAccessor._group_average(self, data_var)
    918 dv = data_var.copy()
    920 if self._weighted:
--> 921     self._weights = self._get_weights()
    922     dv *= self._weights
    923     dv = self._group_data(dv).sum()  # type: ignore

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:990, in TemporalAccessor._get_weights(self)
    987     time_lengths.load()
    988 time_lengths = time_lengths.astype(np.float64)
--> 990 grouped_time_lengths = self._group_data(time_lengths)
    991 weights: xr.DataArray = grouped_time_lengths / grouped_time_lengths.sum()  # type: ignore
    992 weights.name = f"{self._dim}_wts"

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1022, in TemporalAccessor._group_data(self, data_var)
   1020     dv_gb = dv.groupby(f"{self._dim}.{self._freq}")
   1021 else:
-> 1022     self._labeled_time = self._label_time_coords(dv[self._dim])
   1023     dv.coords[self._labeled_time.name] = self._labeled_time
   1024     dv_gb = dv.groupby(self._labeled_time.name)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1073, in TemporalAccessor._label_time_coords(self, time_coords)
   1029 """Labels time coordinates with a group for grouping.
   1030 
   1031 This methods labels time coordinates for grouping by first extracting
   (...)
   1070 >>> * time     (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00
   1071 """
   1072 df_dt_components: pd.DataFrame = self._get_df_dt_components(time_coords)
-> 1073 dt_objects = self._convert_df_to_dt(df_dt_components)
   1075 time_grouped = xr.DataArray(
   1076     name="_".join(df_dt_components.columns),
   1077     data=dt_objects,
   (...)
   1080     attrs=time_coords[self._dim].attrs,
   1081 )
   1082 time_grouped.encoding = time_coords[self._dim].encoding

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1360, in TemporalAccessor._convert_df_to_dt(self, df)
   1357     if component not in df_new.columns:
   1358         df_new[component] = default_val
-> 1360 dates = [
   1361     self.date_type(year, month, day, hour)
   1362     for year, month, day, hour in zip(
   1363         df_new.year, df_new.month, df_new.day, df_new.hour
   1364     )
   1365 ]
   1367 return np.array(dates)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1361, in <listcomp>(.0)
   1357     if component not in df_new.columns:
   1358         df_new[component] = default_val
   1360 dates = [
-> 1361     self.date_type(year, month, day, hour)
   1362     for year, month, day, hour in zip(
   1363         df_new.year, df_new.month, df_new.day, df_new.hour
   1364     )
   1365 ]
   1367 return np.array(dates)

File src/cftime/_cftime.pyx:2016, in cftime._cftime.DatetimeProlepticGregorian.__init__()

File src/cftime/_cftime.pyx:1145, in cftime._cftime.datetime.__init__()

File src/cftime/_cftime.pyx:1647, in cftime._cftime.assert_valid_date()

ValueError: invalid day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)

Follow-up check:

>> import cftime
>> cftime.DatetimeProlepticGregorian(0, 2, 29, 0, 0, 0, 0, has_year_zero=True)  # okay (year 0 is a leap year)
cftime.DatetimeProlepticGregorian(0, 2, 29, 0, 0, 0, 0, has_year_zero=True)
>> cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)  # error (year 1 is not a leap year)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)

File src/cftime/_cftime.pyx:2016, in cftime._cftime.DatetimeProlepticGregorian.__init__()

File src/cftime/_cftime.pyx:1145, in cftime._cftime.datetime.__init__()

File src/cftime/_cftime.pyx:1647, in cftime._cftime.assert_valid_date()

ValueError: invalid day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)


lee1043 commented Aug 17, 2022

@msahn I think I found your code in PMP that computes the daily climatology. I think you are basically stretching non-leap years to 366 days and inserting a missing value at 2/29 so those entries can be excluded when calculating the daily climatology. Is that correct?


lee1043 commented Aug 18, 2022

Following this, it looks like removing 2/29 from leap years prior to the daily climatology calculation might be an easier approach with xCDAT:

# Remove 2/29 prior to computing the daily climatology
ds3 = ds2.sel(time=~((ds2.time.dt.month == 2) & (ds2.time.dt.day == 29)))
daily_climo = ds3.temporal.climatology("tas", freq="day", weighted=True)
daily_climo.tas.shape
# (365, 145, 192)

I am not sure whether this step should be built into the climatology function, or whether noting it in the documentation is enough.


msahn commented Aug 18, 2022

> @msahn I think I found your code in PMP that computes the daily climatology. I think you are basically stretching non-leap years to 366 days and inserting a missing value at 2/29 so those entries can be excluded when calculating the daily climatology. Is that correct?

Yes, that is correct. That way I could calculate a 366-day climatology that includes 2/29.
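
A minimal sketch of an equivalent calculation in plain xarray (not the actual PMP code, and unweighted, unlike ds.temporal.climatology(..., weighted=True)). Grouping by an "MM-DD" label means 2/29 is averaged over leap years only, which matches padding non-leap years with missing values and skipping them:

# Assumes ds2 (daily tas) from earlier in this thread; "monthday" is an
# illustrative label name, not an xCDAT feature.
monthday = ds2.time.dt.strftime("%m-%d").rename("monthday")
daily_climo_366 = ds2.tas.groupby(monthday).mean("time", skipna=True)
daily_climo_366.shape
# (366, 145, 192)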


lee1043 commented Aug 31, 2022

Opened a discussion at #332.
