Loading not possible with non-standard netcdf variable names #5171

Open
pp-mo opened this issue Feb 22, 2023 · 1 comment

pp-mo commented Feb 22, 2023

A user presented some data from an online repository which has a lot of rather weird variable names (but is otherwise fairly sensible).
It is probably from the PALM-4U atmospheric model.

The question is: should we add some tolerance to allow this?

Iris has been very strict on this since v2.3 (see #3399) and refuses to load the file,
raising ValueError: 'theta(0)' is not a valid NetCDF variable name.
I think the main problem for Iris, which motivated the stricter check in that change, is that we really wouldn't want to save data with these kinds of variable names.
Maybe they could also cause other internal problems in Iris, such as when selecting cubes/coords by name?
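To make the rejection concrete, here is a minimal sketch of this kind of name check. The regular expression is a stand-in modelled on the token pattern in `iris.common.metadata`; treat its exact form as an assumption and check it against your Iris version. The helper name `is_valid_var_name` is hypothetical.

```python
import re

# Stand-in for the token pattern used to validate var_name.
# ASSUMPTION: modelled on iris.common.metadata._TOKEN_PARSE, not copied from it.
TOKEN_PATTERN = re.compile(r"^[a-zA-Z0-9][\w.+\-@]*$")


def is_valid_var_name(name: str) -> bool:
    """Return True if `name` matches the stand-in token pattern."""
    return TOKEN_PATTERN.match(name) is not None


# A few names taken from the example file in this issue.
for name in ["zi_theta", "theta(0)", "E*", 'w"theta"0']:
    status = "ok" if is_valid_var_name(name) else "rejected"
    print(f"{name!r}: {status}")
```

Under this pattern, only names built from letters, digits, underscores and a few punctuation characters pass, so `theta(0)` fails on the parentheses.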

Resolution:

I personally think that is a poor reason for raising an error on load: a tolerant adjustment would be better, IMHO.
One user commented:

"I agree with the general principle that we shouldn’t be creating files containing such characters in variable names ... we are not always trying to load file which we have had control over the generation of ... Hence having no way of reading the file becomes quite a big restriction."

It seems a bit odd to refuse to load this on the grounds that it is not "good netcdf", when there is nothing really wrong with the file. So there is a difference between what netCDF says is generally valid and what you can actually have in an HDF5-based netCDF4 file.
Needless to say, xarray has no problem with this data!
I am adding this issue to the CF compliance discussion.
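One possible shape for a "tolerant adjustment" on load is sketched below: map each non-conforming name to a safe replacement while leaving valid names alone. This is only an illustration, not an existing Iris feature. The function `sanitise_var_names` and both regular expressions are hypothetical, and the validity rule is the same assumed stand-in pattern rather than the exact one Iris uses.

```python
import re

# ASSUMPTION: stand-in validity rules, not taken verbatim from Iris.
_VALID_NAME = re.compile(r"^[a-zA-Z0-9][\w.+\-@]*$")
_INVALID_CHAR = re.compile(r"[^\w.+\-@]")


def sanitise_var_names(names):
    """Map each original name to a unique replacement that passes the check.

    Names that are already valid are kept unchanged and reserved first,
    so a sanitised name can never displace a genuine one.
    """
    names = list(names)
    mapping = {name: name for name in names if _VALID_NAME.match(name)}
    used = set(mapping)
    for name in names:
        if name in mapping:
            continue
        # Replace each disallowed character with an underscore.
        safe = _INVALID_CHAR.sub("_", name)
        # Ensure the first character is a letter or digit.
        if not re.match(r"[a-zA-Z0-9]", safe):
            safe = "var_" + safe
        # Resolve clashes by appending a numeric suffix.
        candidate, n = safe, 1
        while candidate in used:
            candidate = f"{safe}_{n}"
            n += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


print(sanitise_var_names(["E", "E*", "theta(0)", 'w"theta"', "wtheta"]))
```

A loader taking this approach would presumably also need to record the original name somewhere (for example in an attribute), so that the information is not lost; where to keep it is an open design question.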

Some Details...

Example file dump (shortened):

netcdf scalars_100_PALM_LES_IMUK_v2 {
dimensions:
	time = UNLIMITED ; // (1140 currently)
variables:
	double time(time) ;
		time:units = "seconds" ;
		time:long_name = "time" ;
		time:standard_name = "time" ;
		time:axis = "T" ;
	float E(time) ;
		E:units = "m2/s2" ;
		E:long_name = "E" ;
	float E\*(time) ;
		E\*:units = "m2/s2" ;
		E\*:long_name = "E*" ;
	float dt(time) ;
		dt:units = "s" ;
		dt:long_name = "dt" ;
	float us\*(time) ;
		us\*:units = "m/s" ;
		us\*:long_name = "us*" ;
	float th\*(time) ;
		th\*:units = "K" ;
		th\*:long_name = "th*" ;
  . . .

// global attributes:
		:title = "PALM 6.0  Rev: 4531  run: lanfex_iop1_stage1_low_15m.00  host: atosb_rrtmg  2021-03-30 00:08:02" ;
		:Conventions = "CF-1.7" ;
 . . .
}

List of variable names:

>>> import netCDF4 as nc
>>> ds = nc.Dataset('sample_data/scalars_100_PALM_LES_IMUK_v2.nc')
>>> print(ds.variables.keys())
dict_keys(['time', 'E', 'E*', 'dt', 'us*', 'th*', 'umax', 'vmax', 'wmax', 'div_new', 'div_old', 'zi_wtheta', 'zi_theta', 'w*', 'w"theta"0', 'w"theta"', 'wtheta', 'theta(0)', 'theta(z_mo)', 'w"u"0', 'w"v"0', 'w"q"0', 'ol', 'q*', 'w"s"', 's*', 'ghf', 'qsws_liq', 'qsws_soil', 'qsws_veg', 'r_a', 'r_s', 'rad_net', 'rad_lw_in', 'rad_lw_out', 'rad_sw_in', 'rad_sw_out', 'rrtm_aldif', 'rrtm_aldir', 'rrtm_asdif', 'rrtm_asdir', 'time_s', 'lwdn', 'lwup', 'swdn', 'swup', 'tstar', 'shf', 'lhf', 'ustar', 'blh', 'zct', 'lwp', 'tca', 'cldsed', 'rainsed', 'tscrn', 'rhscrn', 'vis', 'u10m', 'v10m', 'zcb', 'blh2', 'smax', 'smax2m', 'lwp_min', 'lwp_max'])

and all the "invalid" ones:

>>> from iris.common.metadata import _TOKEN_PARSE as tp 
>>> print([k for k in ds.variables.keys() if not tp.match(k)])
['E*', 'us*', 'th*', 'w*', 'w"theta"0', 'w"theta"', 'theta(0)', 'theta(z_mo)', 'w"u"0', 'w"v"0', 'w"q"0', 'q*', 'w"s"', 's*']
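As a quick follow-up to the list of "invalid" names, the sketch below finds which individual characters are responsible. It needs neither the data file nor Iris: the names are copied from the output above, and the set of allowed characters is an assumed stand-in for the Iris token rule.

```python
import re

# Names reported as invalid in the session above.
invalid_names = [
    'E*', 'us*', 'th*', 'w*', 'w"theta"0', 'w"theta"', 'theta(0)',
    'theta(z_mo)', 'w"u"0', 'w"v"0', 'w"q"0', 'q*', 'w"s"', 's*',
]

# ASSUMPTION: characters accepted inside a name (stand-in rule).
ALLOWED_CHAR = re.compile(r"[\w.+\-@]")

# Collect every character that the stand-in rule does not accept.
offending = sorted(
    {ch for name in invalid_names for ch in name if not ALLOWED_CHAR.match(ch)}
)
print(offending)
```

For this file the offending characters are just the asterisk, the double quote and the two parentheses, so any tolerant handling only has to deal with a small set of symbols.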
pp-mo changed the title from "Consider supporting non-standard netcdf variable names" to "Loading not possible with non-standard netcdf variable names" on Feb 22, 2023
github-actions bot commented Jul 7, 2024

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions bot added the Stale label (a stale issue/pull-request) on Jul 7, 2024