This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

NetCDF-java enum data type problems #901

Open
joaquinrgu opened this issue Jul 28, 2017 · 12 comments

Comments

@joaquinrgu

joaquinrgu commented Jul 28, 2017

The netCDF-java library does not properly write netCDF4 variables of data type enum from an NcML template. The following Java code illustrates the problem:

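   import ucar.nc2.FileWriter2;
   import ucar.nc2.NetcdfFile;
   import ucar.nc2.NetcdfFileWriter;
   import ucar.nc2.dataset.NetcdfDataset;
   import ucar.nc2.ncml.NcMLReader;

   public class EnumWriteTest {  // class name is arbitrary
      public static void main(String[] args) throws Exception {
         // Read the NcML template and copy it into a new netCDF-4 file.
         NetcdfDataset ncfileIn = NcMLReader.readNcML("file:C:\\NetCDF\\test.ncml", null);
         FileWriter2 writer = new FileWriter2(ncfileIn, "test.nc", NetcdfFileWriter.Version.netcdf4, null);
         NetcdfFile ncfileOut = writer.write(null);
         ncfileOut.close();
         ncfileIn.close();
      }
   }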

test.ncml contains a single variable, of data type enum:

<?xml version="1.0" encoding="UTF-8"?>
<ncml:netcdf xmlns:ncml="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" >
  <ncml:enumTypedef name="boolean" type="enum1">
    <ncml:enum key="0">false</ncml:enum>
    <ncml:enum key="1">true</ncml:enum>
  </ncml:enumTypedef>
  <ncml:dimension name="len" length="4" />
  <ncml:variable name="timeliness_non_nominal" shape="len" type="enum1" typedef="boolean">
    <ncml:attribute name="long_name" value="Timeliness non-nominal warning flag" />
  </ncml:variable>
</ncml:netcdf>

The netCDF4 dataset is successfully generated; however, the variable type is incorrect according to:

A) NetCDF-java NCdumpW: no enumTypeDef shown.
!$ java -Xmx1g -classpath toolsUI-4.6.jar ucar.nc2.NCdumpW test.nc -ncml

<?xml version='1.0' encoding='UTF-8'?>
<netcdf xmlns='http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2'
    location='file:test.nc' >
  <dimension name='len' length='4' />
  <variable name='timeliness_non_nominal' type='enum1' shape='len' >
    <attribute name='long_name' value='Timeliness non-nominal warning flag' />
  </variable>
</netcdf>

B) ToolsUI (NcML tab): enumTypeDef boolean not referenced in variable

<?xml  version="1.0" encoding="UTF-8"?>
<ncml:netcdf xmlns:ncml="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="H:/Development/workspace/NetCDFGenerator/test.nc">
  <ncml:enumTypedef name="boolean" type="enum1">
    <ncml:enum key="0">false</ncml:enum>
    <ncml:enum key="1">true</ncml:enum>
  </ncml:enumTypedef>
  <ncml:dimension name="len" length="4" />
  <ncml:variable name="timeliness_non_nominal" shape="len" type="enum1" typedef="timeliness_non_nominal">
    <ncml:attribute name="long_name" value="Timeliness non-nominal warning flag" />
  </ncml:variable>
</ncml:netcdf>
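
Both symptoms above can be detected mechanically in NcML output: an enum-typed variable either has no typedef attribute at all (case A) or names a typedef that is never declared (case B). The following is a minimal sketch using only the JDK's XML parser. The class name NcmlTypedefCheck, the method danglingTypedefs and the inlined SAMPLE string are hypothetical names made up for illustration; nothing here is part of netCDF-Java.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of netCDF-Java.
public class NcmlTypedefCheck {
    public static final String NCML_NS =
            "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2";

    // Abbreviated copy of the ToolsUI output shown above (case B).
    public static final String SAMPLE =
            "<netcdf xmlns='" + NCML_NS + "'>"
            + "<enumTypedef name='boolean' type='enum1'>"
            + "<enum key='0'>false</enum><enum key='1'>true</enum>"
            + "</enumTypedef>"
            + "<dimension name='len' length='4'/>"
            + "<variable name='timeliness_non_nominal' shape='len' type='enum1'"
            + " typedef='timeliness_non_nominal'/>"
            + "</netcdf>";

    /** Returns the names of enum-typed variables whose typedef is missing or undeclared. */
    public static List<String> danglingTypedefs(String ncml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(ncml.getBytes(StandardCharsets.UTF_8)));

        // Collect the names of all declared enum typedefs.
        Set<String> declared = new HashSet<>();
        NodeList typedefs = doc.getElementsByTagNameNS(NCML_NS, "enumTypedef");
        for (int i = 0; i < typedefs.getLength(); i++) {
            declared.add(((Element) typedefs.item(i)).getAttribute("name"));
        }

        // Flag enum variables that do not point at one of them.
        List<String> dangling = new ArrayList<>();
        NodeList variables = doc.getElementsByTagNameNS(NCML_NS, "variable");
        for (int i = 0; i < variables.getLength(); i++) {
            Element v = (Element) variables.item(i);
            boolean isEnum = v.getAttribute("type").startsWith("enum");
            if (isEnum && !declared.contains(v.getAttribute("typedef"))) {
                dangling.add(v.getAttribute("name"));
            }
        }
        return dangling;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(danglingTypedefs(SAMPLE)); // prints [timeliness_non_nominal]
    }
}
```

Changing the variable's typedef attribute to "boolean", as in the original template, makes the returned list empty.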

THREDDS also fails to provide OPeNDAP access to the file, returning the following error:

Error {
    code = 403;
    message = "NcDDS Variable data type = enum1";
};

cwardgar edit: formatting

@joaquinrgu
Author

joaquinrgu commented Jul 28, 2017

The error might be in the netCDF-java reader rather than in the writer, as netCDF-C's ncdump returns the expected output:

!$ ncdump -h test.nc

netcdf test {
types:
  byte enum boolean {false = 0, true = 1} ;
dimensions:
        len = 4 ;
variables:
        boolean timeliness_non_nominal(len) ;
                timeliness_non_nominal:long_name = "Timeliness non-nominal warning flag" ;
}

While netCDF-java's ncdump returns the following CDL (notice the data type of the variable is the same as the variable name):

java -Xmx1g -classpath toolsUI-4.6.jar ucar.nc2.NCdumpW test.nc

 test.nc {
  types:
    byte enum boolean { 'false' = 0, 'true' = 1};
  dimensions:
    len = 4;
  variables:
    enum timeliness_non_nominal timeliness_non_nominal(len=4);
      :long_name = "Timeliness non-nominal warning flag";
}

Edit cwardgar: formatting

@DennisHeimbigner
Contributor

Which version of netcdf-java are you using?

@cwardgar
Contributor

This doesn't appear to be a problem with NetCDF-Java v5.0.0. Can you try using that?

http://artifacts.unidata.ucar.edu/content/repositories/unidata-snapshots/edu/ucar/toolsUI/5.0.0-SNAPSHOT/

@DennisHeimbigner
Contributor

I made some changes re enums (and I think a couple of other things) in 5.0,
but I am pretty sure I did not backport them to 4.x.

@joaquinrgu
Author

joaquinrgu commented Jul 31, 2017

Unfortunately it keeps happening to me with the latest ToolsUI (toolsUI-5.0.0-20170721.123717-202.jar):
!$ java -Xmx1g -classpath toolsUI-5.0.0-20170721.123717-202.jar ucar.nc2.NCdumpW test.nc -ncml

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="file:test.nc">
  <enumTypedef name="boolean" type="enum1">
    <enum key="0">false</enum>
    <enum key="1">true</enum>
  </enumTypedef>
  <dimension name="len" length="4" />
  <variable name="timeliness_non_nominal" shape="len" type="enum1" typedef="timeliness_non_nominal">
    <attribute name="long_name" value="Timeliness non-nominal warning flag" />
  </variable>
</netcdf>

I used netCDF-java-4.6.6.jar with netCDF-C 4.4.0 for writing the file (on Windows) and the latest THREDDS (4.6.9) and ToolsUI for reading. As explained, ncdump from netCDF-C 4.2.1.1 does read the dataset correctly, so the writing seems to be OK.

I have uploaded the netCDF dataset to an FTP server (ftp://ftp.eumetsat.int/pub/EUM/out/USC/JRG/test.nc). It will be available for 30 days. Please let me know if I can run any other tests. Thanks!

@DennisHeimbigner
Contributor

This is all beginning to ring some bells. For a long time, there was a problem
in Nc4Iosp in that it did not properly handle enum names: much like what is being
seen here. The actual problem was the way that the netcdf metadata was converted
internally to "equivalent" CDM metadata.

@cwardgar
Contributor

cwardgar commented Aug 2, 2017

Okay, I was able to reproduce the error. The problem seems to be buried in H5header somewhere. As far as I can tell, the timeliness_non_nominal DataObject knows it has an enum dataType, but it has no actual reference to the boolean DataObject. So, a default name is assigned to timeliness_non_nominal's type: the name of the variable itself.

I assume that the association between timeliness_non_nominal and boolean IS present in the HDF5 file, but H5header is missing it. Also, I do not know if the enum feature is otherwise functioning correctly and just the name is wrong.

@joaquinrgu As you mentioned, NetCDF-C does not suffer this problem. It may interest you to know that NetCDF-Java can be configured to use libnetcdf (if installed) to read datasets, instead of its own classes. Maybe you could use that as a workaround until we get this fixed.

See this page about runtime configuration. As an example, I have the following config at ~/.unidata/nj22Config.xml, and it allows me to use libnetcdf for reading in ToolsUI:

<nj22Config>
  <Netcdf4Clibrary>
     <libraryPath>/opt/netcdf-4.4.1/lib</libraryPath>
     <libraryName>netcdf</libraryName>
     <useForReading>true</useForReading>
  </Netcdf4Clibrary>
</nj22Config>

@cwardgar
Contributor

cwardgar commented Aug 2, 2017

Also, see this page for general instructions on how to load libnetcdf in NetCDF-Java.

@joaquinrgu
Author

I didn't know about this option! Thanks a lot, I will test it.

@DennisHeimbigner
Contributor

DennisHeimbigner commented Aug 2, 2017

@cwardgar wrote: "I assume that the association between timeliness_non_nominal and boolean IS
present in the HDF5 file, but H5header is missing it."

One way to see what is happening is to run the h5dump command on the .nc file.
You will notice that the enum is duplicated: once free-standing and once associated
with the variable itself. This is because HDF5 has a limited notion of type declaration.
The netcdf code recognizes this and internally matches the duplicate enum declarations.
Since the HDF5 Iosp is technically reading an HDF5 file, and supposedly knows nothing
about netcdf-4, it can create duplicate enum types in the translated CDM, and the variable
will point to the one created from the enum declaration associated with the variable.
Hence what you see.
Bottom line: technically, H5Iosp/H5header read HDF5 files, not netcdf-4 files. There are some
concessions in the HDF5 code, but it can lose netcdf-4 related information. You would
also see this for compound types, but CDM does not have a separately declared compound
type (one of the many places where the CDM and netcdf-4 models differ).
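
The matching of duplicate enum declarations described above can be illustrated with a small structural comparison. This is only a sketch under stated assumptions: EnumTypedefMatcher, EnumType and resolveTypeName are hypothetical names, and the code does not reproduce the actual logic in netCDF-C, Nc4Iosp or H5header. It assumes that the enum attached to a variable should adopt the name of a free-standing typedef with identical members, and that the variable name is used as a fallback when no such typedef exists, which is the behavior reported in this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the netCDF-Java implementation.
public class EnumTypedefMatcher {

    /** A named enum type: a name plus its value-to-label map. */
    public static final class EnumType {
        public final String name;
        public final Map<Integer, String> members;

        public EnumType(String name, Map<Integer, String> members) {
            this.name = name;
            this.members = members;
        }
    }

    /**
     * Resolves the type name for the enum definition attached to a variable.
     * If a free-standing typedef has an identical member map, its name is reused;
     * otherwise the variable name is returned as a fallback.
     */
    public static String resolveTypeName(String variableName,
                                         Map<Integer, String> inlineMembers,
                                         List<EnumType> namedTypes) {
        for (EnumType candidate : namedTypes) {
            if (candidate.members.equals(inlineMembers)) {
                return candidate.name; // first structural match wins
            }
        }
        return variableName;
    }

    public static void main(String[] args) {
        Map<Integer, String> members = new LinkedHashMap<>();
        members.put(0, "false");
        members.put(1, "true");

        List<EnumType> namedTypes = new ArrayList<>();
        namedTypes.add(new EnumType("boolean", members));

        // The variable carries its own copy of the same enum definition.
        Map<Integer, String> inline = new LinkedHashMap<>(members);

        // With matching: prints "boolean".
        System.out.println(resolveTypeName("timeliness_non_nominal", inline, namedTypes));
        // Without any named typedef to match: prints "timeliness_non_nominal".
        System.out.println(resolveTypeName("timeliness_non_nominal", inline, new ArrayList<>()));
    }
}
```

One limitation of purely structural matching is ambiguity: if two differently named typedefs have identical members, this sketch simply picks the first one.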

@DennisHeimbigner
Contributor

I have to amend my comment. The appearance of, e.g., an enum twice may be an artifact of the h5dump program. It is unclear whether the enum is actually duplicated inside the HDF5 file.

@DennisHeimbigner
Contributor

I retract my retraction. Netcdf files do duplicate the type definitions. I hypothesize that this
is a historical leftover from when HDF5 did not support named types (long, long ago). At some point
this changed, so types became first-class objects. We could make use of this in a backward-compatible
fashion, except that I suspect it might break the Java HDF5 reader.
