Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregate crashes when fields contain the actual_range attribute #764

Closed
Mark-D-Smith opened this issue Apr 22, 2024 · 6 comments · Fixed by #765
Closed

Aggregate crashes when fields contain the actual_range attribute #764

Mark-D-Smith opened this issue Apr 22, 2024 · 6 comments · Fixed by #765
Labels
aggregation Rerlating to metadata-based field and domain aggregation bug Something isn't working
Milestone

Comments

@Mark-D-Smith
Copy link

I have been trying to load some monthly files containing air temperature data (from chess-met https://catalogue.ceh.ac.uk/documents/835a50df-e74f-4bfb-b593-804fd61d5eab; although the data themselves are not important for this error) using cf.load, with the intention of aggregating along the time axis. I get a crash from cf.aggregate when the files contain the actual_range attribute AND there are at least three files to aggregate.

This occurs because unlike some other attributes actual_range is not handled specially by cf.aggregate. The default behviour is to convert both to strings and append one to the other:

cf-python/cf/aggregate.py

Lines 4871 to 4881 in 7d78eac

if concatenate:
if value1 is not None:
if value0 is not None:
parent0.set_property(
prop, f"{value0} :AGGREGATED: {value1}"
)
else:
parent0.set_property(prop, f" :AGGREGATED: {value1}")
else:
if value0 is not None:
parent0.del_property(prop)

When the third file is appended, the string comparison is performed between a two-element np.ndarray from the new file and a string from the previous aggregation

cf-python/cf/aggregate.py

Lines 4864 to 4869 in 7d78eac

# Still here?
if isinstance(value0, str) or isinstance(value1, str):
if value0 == value1:
continue
elif parent0._equals(value0, value1):
continue
. This raises an exception as an if statement requires a scalar logical (i.e. any or all).

Since the valid_range attribute is defined in the CF-standard (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch02s05.html.) this should be handled. A simple correction would be to check if the property being updated is actual_range, and if so to set the parent's corresponding attribute to the minima and maxima of the two aggregated datasets. For instance:

        if prop in ("actual_range"):
            parent0.set_property(prop,[min(value0[0],value1[0]),max(value0[1],value1[1])])
            continue 

Alternatively it could be fixed by ignoring this attribute, as is currently done for valid_range & similar attributes.

@Mark-D-Smith Mark-D-Smith added the bug Something isn't working label Apr 22, 2024
@Mark-D-Smith
Copy link
Author

The result of cf.environment:

Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.17
HDF5 library: 1.14.3
netcdf library: 4.9.2
udunits2 library: /home/users/marrho/miniconda3/envs/py3_unifhy/lib/libudunits2.so.0
esmpy/ESMF: 8.6.0
Python: 3.12.2
dask: 2024.4.0
netCDF4: 1.6.5
psutil: 5.9.8
packaging: 24.0
numpy: 1.26.4
scipy: 1.12.0
matplotlib: not available
cftime: 1.6.2
cfunits: 3.3.6
cfplot: not available
cfdm: 1.11.1.0
cf: 3.16.1

@davidhassell
Copy link
Collaborator

Hi Mark,

Thanks for the report. I'm having a look, but it'd be useful to know if you provided any keywords to configure the aggregation process (such as respect_valid, or any others)?

When you say "actual_range", do you perhaps mean "valid_range", or is there something else going on?

I'll run some tests ...

@davidhassell
Copy link
Collaborator

Hi Mark,

I agree that

cf-python/cf/aggregate.py

Lines 4864 to 4869 in 7d78eac

# Still here?
if isinstance(value0, str) or isinstance(value1, str):
if value0 == value1:
continue
elif parent0._equals(value0, value1):
continue
needs fixing up for the case that the property values are numpy arrays - thanks for spotting this.

I see that you did mean actual_range (sorry - I (mis)read a little hastily!). Your suggestion sounds reasonable, and I'll put together a PR.

@Mark-D-Smith
Copy link
Author

Thanks David, just in case its still useful - I didn't set any other keywords, so default behaviour was to ignore the valid_range, valid_min and valid_max keywords. Its just actual range that threw the error, but as you rightly observe this could happen for any hypothetical keyword that is a numpy array so a more general fix might be preferable.

@davidhassell davidhassell added the aggregation Rerlating to metadata-based field and domain aggregation label Apr 23, 2024
@davidhassell davidhassell added this to the 3.16.2 milestone Apr 23, 2024
@davidhassell
Copy link
Collaborator

Hi Mark - PR #765 created. With a following wind it should make the public release we're planning for Thursday/Friday this week.

@davidhassell
Copy link
Collaborator

Hi Mark,

This change will go into v3.16.2, which will be publicly released tomorrow (2024-04-26) - many thanks for your input.

David

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aggregation Rerlating to metadata-based field and domain aggregation bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants