use _Unsigned attribute #133

Open
visr opened this issue Jun 27, 2021 · 2 comments

visr commented Jun 27, 2021

I read this StackOverflow question: https://stackoverflow.com/q/68135528, and through it found out that there are apparently netCDF files with variables of type short that carry an attribute _Unsigned with value "true". In that case the data is supposed to be interpreted as unsigned short (a type that netCDF-4 also supports). I read some background in Unidata/netcdf4-python#656, and it seems this is a bit of a heritage from netCDF-3.
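To make the interpretation concrete, here is a minimal Python sketch of what treating a stored short as unsigned means, assuming the usual 16-bit two's complement layout. The function name `signed_to_unsigned_short` is hypothetical and is not part of any netCDF library.

```python
def signed_to_unsigned_short(value: int) -> int:
    """Reinterpret a signed 16-bit value as unsigned 16-bit.

    Assumes two's complement storage: negative values map to
    value + 2**16, non-negative values are unchanged.
    """
    if not -32768 <= value <= 32767:
        raise ValueError(f"{value} does not fit in a signed 16-bit short")
    # Masking with 0xFFFF keeps the low 16 bits, which is the same bit
    # pattern read as an unsigned number.
    return value & 0xFFFF


# A reader that ignores _Unsigned would report -1 where 65535 was meant.
print(signed_to_unsigned_short(-1))
print(signed_to_unsigned_short(-32768))
```

Values below 32768 look identical under both interpretations, so a misreading only shows up for data in the upper half of the unsigned range.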

Since readers in other languages seem to support this, I guess perhaps we should too?

EDIT: see also my SO answer for an example file with some code.

Alexander-Barth commented

Thanks @visr, for helping the user on SO!
The _Unsigned attribute does not seem to be part of the CF convention.

Many links to the NetCDF best practices are broken, but I found them here:

  1. To be completely safe with unknown readers, widen the data type, or use floating point.
  2. You can use the corresponding signed types to store unsigned data only if all client programs know how to interpret this correctly.
  3. A new proposed convention is to create a variable attribute _Unsigned = "true" to indicate that integer data should be treated as unsigned.
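Point 1 can be illustrated with a small Python sketch using only the standard library's `array` module. It assumes the writer has unsigned 16-bit values and picks a wider signed type for storage; the helper name `widen_unsigned_shorts` is hypothetical.

```python
from array import array


def widen_unsigned_shorts(values):
    """Store unsigned 16-bit values in a wider signed integer type.

    Type code 'l' is a signed integer of at least 32 bits, so every
    value in 0..65535 is representable and no reader needs to know
    about an _Unsigned attribute.
    """
    for v in values:
        if not 0 <= v <= 0xFFFF:
            raise ValueError(f"{v} is not an unsigned 16-bit value")
    return array("l", values)


wide = widen_unsigned_shorts([0, 40000, 65535])
print(list(wide), wide.itemsize)

# By contrast, the same values do not fit in a signed short:
# array("h", [40000]) raises OverflowError.
```

The cost of this approach is storage size, which is presumably why some writers chose the signed type plus an attribute instead.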

I think that point 2 is also interesting. It could be read as saying that such files (using the corresponding signed types to store unsigned data) should not be used for public distribution, where you do not control the client program.

Does anybody know whether work on the "new proposed convention" is still ongoing? It also seems that this is specific to files written by old versions of NetCDF-Java. Does anybody know since which version NetCDF-Java has used native unsigned types?


visr commented Jun 28, 2021

Indeed, this is not a CF convention, only an old "proposed convention", which in practice still seems to be used (the example is new data), even though it shouldn't be. It seems that NetCDF-Java has had unsigned capabilities for a while; users are just not taking advantage of them when updating their old data model.

The most commonly used clients seem to have implemented support for this. In a way that is quite unfortunate, since it leads to potentially misinterpreted data, as in the SO post. I'm not sure how difficult it would be to add support for _Unsigned. Potentially we can use reinterpret, as in my SO answer, and avoid copying the data. If we decide not to support it, then perhaps we should throw an error when we encounter it.
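As a rough illustration of the no-copy idea, here is a Python sketch using only the standard library. It is not this package's API: the function name `view_as_unsigned` and the plain `attributes` dict are assumptions made for the example, and `memoryview.cast` stands in for `reinterpret`.

```python
from array import array


def view_as_unsigned(data, attributes):
    """Return a view of signed 16-bit data, unsigned if _Unsigned is "true".

    `data` is an array('h'); `attributes` is a plain dict standing in
    for the variable's netCDF attributes (an assumption of this sketch).
    """
    view = memoryview(data)
    if str(attributes.get("_Unsigned", "false")).lower() != "true":
        return view
    if view.format != "h":
        raise TypeError("this sketch only handles signed 16-bit data")
    # Casting through bytes reinterprets the same buffer as unsigned
    # 16-bit values; no element is copied.
    return view.cast("B").cast("H")


raw = array("h", [-1, 0, 1, -32768])
unsigned = view_as_unsigned(raw, {"_Unsigned": "true"})
print(list(unsigned))

# The view shares memory with the original array.
raw[0] = -2
print(unsigned[0])
```

The alternative mentioned above, refusing such files, would amount to raising an error in the branch where `_Unsigned` is `"true"` instead of returning the reinterpreted view.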
