Bounds of derived variable are not read correctly #3678

Open
schlunma opened this issue Mar 5, 2020 · 9 comments

@schlunma
Contributor

schlunma commented Mar 5, 2020

Hi guys,

in ESMValTool we found the following issue concerning files with derived variables (in particular atmosphere_hybrid_sigma_pressure_coordinate, see ESMValGroup/ESMValCore#543):

The bounds of the derived variable of model output that is consistent with the CF conventions cannot be read correctly with iris. Here is a minimal example using the nc file that is used on the CF convention webpage (without the optional attributes and the units of A replaced by 1):

netcdf a_new_file {
dimensions:
        eta = 1 ;
        lat = 1 ;
        lon = 1 ;
        bnds = 2 ;
variables:
        double eta(eta) ;
                eta:long_name = "eta at full levels" ;
                eta:positive = "down" ;
                eta:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate" ;
                eta:formula_terms = "a: A b: B ps: PS p0: P0" ;
                eta:bounds = "eta_bnds" ;
        double eta_bnds(eta, bnds) ;
                eta_bnds:formula_terms = "a: A_bnds b: B_bnds ps: PS p0: P0" ;
        double A(eta) ;
                A:long_name = "a coefficient for vertical coordinate at full levels" ;
                A:units = "1" ;
        double A_bnds(eta, bnds) ;
        double B(eta) ;
                B:long_name = "b coefficient for vertical coordinate at full levels" ;
                B:units = "1" ;
        double B_bnds(eta, bnds) ;
        double PS(lat, lon) ;
                PS:units = "Pa" ;
        double P0 ;
                P0:units = "Pa" ;
        float temp(eta, lat, lon) ;
                temp:standard_name = "air_temperature" ;
                temp:units = "K" ;

Reading this file with iris

import os

import iris
print(iris.__version__)
print("")

path = os.path.expanduser('~/a_new_file.nc')
cubes = iris.load(path)
print(cubes)
print("")

cube = cubes.extract_strict(iris.Constraint('air_temperature'))
air_pressure_coord = cube.coord('air_pressure')
print(air_pressure_coord)

gives

2.4.0

/miniconda3/envs/test/lib/python3.7/site-packages/iris/fileformats/cf.py:1074: UserWarning: Ignoring formula terms variable 'PS' referenced by data variable 'A_bnds' via variable 'eta': Dimensions ('lat', 'lon') do not span ('eta', 'bnds')
  warnings.warn(msg)
/miniconda3/envs/test/lib/python3.7/site-packages/iris/fileformats/cf.py:1074: UserWarning: Ignoring formula terms variable 'PS' referenced by data variable 'B_bnds' via variable 'eta': Dimensions ('lat', 'lon') do not span ('eta', 'bnds')
  warnings.warn(msg)
/miniconda3/envs/test/lib/python3.7/site-packages/iris/fileformats/netcdf.py:601: UserWarning: Unable to find coordinate for variable 'PS'
  '{!r}'.format(name))
/miniconda3/envs/test/lib/python3.7/site-packages/iris/fileformats/netcdf.py:601: UserWarning: Unable to find coordinate for variable 'PS'
  '{!r}'.format(name))
0: A_bnds / (1)                        (atmosphere_hybrid_sigma_pressure_coordinate: 1; -- : 2)
1: B_bnds / (1)                        (atmosphere_hybrid_sigma_pressure_coordinate: 1; -- : 2)
2: PS / (Pa)                           (-- : 1; -- : 1)
3: air_temperature / (K)               (atmosphere_hybrid_sigma_pressure_coordinate: 1; -- : 1; -- : 1)

AuxCoord(masked_array(data=[[[3004000.]]],
             mask=False,
       fill_value=1e+20), standard_name='air_pressure', units=Unit('Pa'))

As you can see, the air_pressure coordinate does not have bounds. When the optional attributes A:bounds = "A_bnds" and B:bounds = "B_bnds" are added to the file, iris is able to read the bounds correctly.
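
For files you can modify, the workaround described above (adding the optional `bounds` attributes to `A` and `B`) could be scripted roughly as follows. This is only a sketch: `add_bounds_attributes` and `patch_file` are hypothetical helper names, and it assumes the netCDF4 Python package and the variable names of the example file.

```python
def add_bounds_attributes(dataset, pairs):
    """Set a `bounds` attribute on each term variable listed in `pairs`.

    `dataset` is any object exposing a `variables` mapping whose values have a
    `setncattr(name, value)` method, e.g. a netCDF4.Dataset opened in "r+" mode.
    `pairs` maps a variable name to the name of its bounds variable.
    """
    for name, bounds_name in pairs.items():
        if bounds_name not in dataset.variables:
            raise KeyError(f"bounds variable {bounds_name!r} not found")
        dataset.variables[name].setncattr("bounds", bounds_name)


def patch_file(path):
    """Add A:bounds and B:bounds to the file at `path`, modifying it in place."""
    # Imported here so the helper above stays usable without netCDF4 installed.
    import netCDF4

    with netCDF4.Dataset(path, "r+") as ds:
        add_bounds_attributes(ds, {"A": "A_bnds", "B": "B_bnds"})
```

Calling `patch_file` on a copy of `a_new_file.nc` before `iris.load` should reproduce the situation in which the bounds are read. Note that this edits the file itself, which is often not an option for archived model output.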

@pp-mo
Member

pp-mo commented Aug 25, 2020

Just been asked to look at this.
But I'm afraid I don't see how we can possibly fix this in Iris.

The original example in the CF conventions does include the bounds attributes. I believe they are only "optional" in the sense that the file isn't actually invalid (by CF rules) without them.

If I put the file into the NERC online CF checker, it gives ...

File name:      a_new_file.nc

Output of CF-Checker follows...

CHECKING NetCDF FILE: /tmp/13135.nc
=====================
WARN: Cannot determine CF version from the Conventions attribute; checking against latest CF version: CF-1.7
Using CF Checker Version 3.1.1
Checking against CF Version CF-1.7
Using Standard Name Table Version 74 (2020-08-04T14:43:55Z)
Using Area Type Table Version 10 (23 June 2020)
Using Standardized Region Name Table Version 4 (18 December 2018)

WARN: (2.6.1): No 'Conventions' attribute present

------------------
Checking variable: eta
------------------

------------------
Checking variable: eta_bnds
------------------
INFO: attribute formula_terms is being used in a non-standard way
ERROR: (4.3.3): formula_terms attribute only allowed on coordinate variables
ERROR: (4.3.3): Cannot get formula definition as no standard_name

------------------
Checking variable: A
------------------

------------------
Checking variable: A_bnds
------------------
WARN: (3): No standard_name or long_name attribute specified
INFO: (3.1): No units attribute set.  Please consider adding a units attribute for completeness.

------------------
Checking variable: B
------------------

------------------
Checking variable: B_bnds
------------------
WARN: (3): No standard_name or long_name attribute specified
INFO: (3.1): No units attribute set.  Please consider adding a units attribute for completeness.

------------------
Checking variable: PS
------------------
WARN: (3): No standard_name or long_name attribute specified

------------------
Checking variable: P0
------------------
WARN: (3): No standard_name or long_name attribute specified

------------------
Checking variable: temp
------------------

ERRORS detected: 2
WARNINGS given: 6
INFORMATION messages: 3

So, I read that as attempting to interpret all the "_bnds" variables as primary data variables, and then complaining that they have no standard_name or units.

Without the linking bounds attributes, I think we cannot safely associate "A_bnds" with "A" (i.e. automatically in code): The only way to do it is to rely on a naming convention which, apart from being a rather weak and fragile approach, is definitely not in the CF conventions. So we would really not be keen on adding that to Iris.

I must say you're not alone in encountering files like this, even in standard model output or even archive data. I was shown some files in the CMIP archive which do not pass the current CF checker, though they probably did pass the earlier version that was in place when they were submitted. (I think it may have been invalid units in that case?)

@zklaus
Contributor

zklaus commented Aug 31, 2020

Hi @pp-mo, thanks for taking a look at this!

Unfortunately, I am afraid it is a bit more than the usual data problems and really is the intended way of encoding things according to CF, not just leniency.

There are extensive discussions on the topic at the ESMValTool repo, the CMOR repo, the old CF trac and probably other places.

The long and short of it is that the formula term variables don't have bounds attributes because they are not auxiliary coordinates and thus the attribute would not have a standardized meaning.

Instead, the connection comes from the formula_terms attribute of the bounds variable of the parametric coordinate variable. Thus no reliance on naming conventions is necessary.
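
The link described above can be sketched in plain Python by comparing the two `formula_terms` strings from the example file. The function names `parse_formula_terms` and `term_bounds` are hypothetical, and this is an illustration of the idea, not how Iris parses files.

```python
def parse_formula_terms(text):
    """Parse a CF formula_terms string such as "a: A b: B" into {"a": "A", "b": "B"}."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"malformed formula_terms: {text!r}")
    terms = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'term:' but got {key!r}")
        terms[key[:-1]] = value
    return terms


def term_bounds(coord_terms, bounds_terms):
    """Map each term variable to its bounds variable.

    A term that names the same variable in both attributes has no separate
    bounds variable and maps to None.
    """
    points = parse_formula_terms(coord_terms)
    bounds = parse_formula_terms(bounds_terms)
    return {
        var: (bounds[term] if bounds.get(term, var) != var else None)
        for term, var in points.items()
    }


# formula_terms of eta and of eta_bnds, copied from the example file.
links = term_bounds(
    "a: A b: B ps: PS p0: P0",
    "a: A_bnds b: B_bnds ps: PS p0: P0",
)
print(links)  # {'A': 'A_bnds', 'B': 'B_bnds', 'PS': None, 'P0': None}
```

The pairing of `A` with `A_bnds` comes from both being the `a` term, not from the `_bnds` suffix.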

Does this make sense?

@pp-mo
Member

pp-mo commented Sep 8, 2020

Thanks for explaining @schlunma
Apologies for long silence -- busy elsewhere !

Anyway, I just got a chance to look at this again + I can see you are 100% right 💐
So, this really is a missing feature + does need sorting out.
But unfortunately, in that case it is not a particularly quick thing to resolve, so I think won't make it into Iris 3.0.
( Iris 3 release is now pretty imminent : we are working on it the next week or 2 )

I've re-categorised this to release "Iris 3.1" and label "CF 1.6/1.7" for now (though that label could do with a revamp).
I'm hoping we will soon find time to address a number of the outstanding CF issues -- that label hints at a list of them.

@pp-mo pp-mo removed this from Backlog in Iris v3.0.0 Sep 8, 2020
@pp-mo pp-mo modified the milestones: v3.0.0, v3.1.0 Sep 8, 2020
@bjlittle bjlittle modified the milestones: v3.1.0, v3.3.0 Nov 1, 2021
@bjlittle
Member

@zklaus and @schlunma is this still a blocker for you guys?

Any sense of priority on this issue from your side?

@schlunma
Contributor Author

schlunma commented Oct 4, 2022

Hi @bjlittle, this is still relevant for us, but only with a medium priority (we, the ESMValTool devs, assigned priorities to each issue with Feature: ESMValTool here: ESMValGroup/ESMValCore#1738).

At the moment we have a (lengthy) custom solution for this that does the trick, but it would of course be much nicer and cleaner if this were handled by iris 🚀 Thanks!!

@pp-mo
Member

pp-mo commented Oct 4, 2022

> Hi @bjlittle, this is still relevant for us, but only with a medium priority ... it would be of course much nicer and cleaner if this would be handled by iris 🚀 Thanks!!

Thanks @schlunma.
I think we have this correctly flagged for importance, but it's rather tricky to do + we don't think it can make it into the forthcoming Iris 3.4, early November

@schlunma
Contributor Author

schlunma commented Oct 4, 2022

No worries at all @pp-mo! Thanks for all your support on ESMValTool-related issues, we really appreciate it 👍

@stephenworsley
Contributor

@zklaus , @schlunma how important would it be to preserve the form in which the bounds are attached to the variable when round tripping load/save?
It seems like a solution to this problem would involve Iris being able to recognise two ways to connect bounds to a variable:

  1. With a formula on the derived variable and bounds attached to each of the variables it derives from.
  2. With bounds attached to the derived variable and a formula on those bounds which refers to the bounds it derives from.

However, within Iris, there is just one way in which the bounds are attached (which structurally mirrors 1. at the moment). In order to solve this as simply as possible, I would expect that a cube loaded from a file of form 2 would save the same as a cube loaded from a file of form 1 currently does. It may be a separate issue to make sure that this saved form aligns as closely as possible with the best interpretation of CF, but would this be acceptable as a solution for the time being?
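
Either form supplies the same inputs once the term variables are paired with their bounds, so the derived bounds follow from the same formula as the points. A minimal sketch, using the CF definition of `atmosphere_hybrid_sigma_pressure_coordinate`, p = a*p0 + b*ps, for a single level and grid point: the function names and all numeric values are made up for illustration, and this is not Iris code.

```python
def hybrid_pressure(a, b, ps, p0):
    """CF atmosphere_hybrid_sigma_pressure_coordinate: p = a*p0 + b*ps."""
    return a * p0 + b * ps


def level_pressure_with_bounds(a, a_bnds, b, b_bnds, ps, p0):
    """Return (point, (lower, upper)) pressure for one level at one grid point.

    ps and p0 carry no bounds of their own, so the same values are used for
    the point and for both bounds.
    """
    point = hybrid_pressure(a, b, ps, p0)
    bounds = tuple(
        hybrid_pressure(a_k, b_k, ps, p0) for a_k, b_k in zip(a_bnds, b_bnds)
    )
    return point, bounds


# Hypothetical values, not taken from the example file.
point, bounds = level_pressure_with_bounds(
    a=0.125, a_bnds=(0.25, 0.0),
    b=0.5, b_bnds=(0.25, 0.75),
    ps=100000.0, p0=100000.0,
)
print(point, bounds)  # 62500.0 (50000.0, 75000.0)
```

Whether the pairs `(A, A_bnds)` and `(B, B_bnds)` were found through `bounds` attributes (form 1) or through the `formula_terms` of `eta_bnds` (form 2) makes no difference to this calculation.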

@schlunma
Contributor Author

Hi @stephenworsley, thanks for working on this, much appreciated!

In my opinion, the priority for ESMValTool would be to be able to read files of form 2 without additional code. So yes, I think your proposed solution would definitely work for us! 👍

It would of course be nice if a CF-compliant file of form 2 were actually saved in the same form, but this only has a secondary priority for us.
