Enum Types #143

Open
timhultberg opened this issue Sep 21, 2021 · 3 comments

@timhultberg

I would like to use NCDatasets to create files using enumerated types. As far as I can see from the documentation and the code, this is not supported. Would it be possible to add it?

@Alexander-Barth
Owner

You are right, enum types are currently not supported. It is certainly within the scope and doable. It just takes some time to write and test the code. Here is a start at exposing the low-level functions (f82c24a).

I am wondering what the return type of the higher-level function should be. Maybe a Julia array of Symbols, or a CategoricalArray/PooledArrays/IndirectArrays... I am not so familiar with these array types.

@timhultberg
Author

Cool, thanks. For now I need to write rather than read the enum type, but this is still very helpful.

"I am wondering what the return type of the higher-level function should be. Maybe a Julia array of Symbols, or a CategoricalArray/PooledArrays/IndirectArrays... I am not so familiar with these array types."
Not sure, I have never used them, but I guess it should be possible to use Julia's @enum types.

@Alexander-Barth
Owner

Alexander-Barth commented Sep 29, 2021

In NetCDF, an identifier (Clear in the example below) can appear in different enum types:

netcdf enum2 {
types:
  byte enum cloud_t {Clear = 0, Cumulonimbus = 1, Stratus = 2,
      Stratocumulus = 3, Cumulus = 4, Altostratus = 5, Nimbostratus = 6,
      Altocumulus = 7, Missing = 127} ;
  byte enum cloud2_t {Clear = 10, Cumulonimbus = 11} ;
dimensions:
        time = UNLIMITED ; // (5 currently)
variables:
        cloud_t primary_cloud(time) ;
                cloud_t primary_cloud:_FillValue = Missing ;
}

However, Julia does not let me do that, because the members of an @enum are defined as constants in the enclosing scope:

julia> @enum cloud_t Clear=0
julia> @enum cloud_t2 Clear=10
ERROR: invalid redefinition of constant Clear
Stacktrace:
 [1] top-level scope
   @ Enums.jl:198
 [2] top-level scope
   @ REPL[5]:1

Julia keywords can also be a problem:

julia> @enum cloud_t3 end=10
ERROR: syntax: extra token "end" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

While Julia's @enum seems natural (after all, it has the same name as NetCDF enums ;-) ), I am not sure it is the best (or a safe) choice here.

I just checked with Python's netCDF4, and it simply returns the numbers:

In [2]: import netCDF4
In [3]: ds = netCDF4.Dataset("enum.nc")

In [6]: ds["primary_cloud"][:]
Out[6]:
masked_array(data=[0, 2, 4, --, 1],
             mask=[False, False, False,  True, False],
       fill_value=127,
            dtype=int8)

In [7]: data = ds["primary_cloud"][:]

In [9]: data[0]
Out[9]: 0

In [10]: data[1]
Out[10]: 2

The same is true for Python's xarray.
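If names are wanted instead of numbers, the codes can be mapped back through the enum definition of the variable's own type. Below is a minimal sketch in plain Python, with the two enum definitions copied by hand from the CDL above; `decode` is a hypothetical helper, not part of netCDF4 or xarray. In netCDF4-python the same dictionaries should be obtainable from `ds.enumtypes["cloud_t"].enum_dict`, but I have not checked that against this particular file.

```python
# Enum definitions copied by hand from the CDL above (assumption: in
# netCDF4-python they would come from ds.enumtypes[...].enum_dict).
cloud_t = {"Clear": 0, "Cumulonimbus": 1, "Stratus": 2, "Stratocumulus": 3,
           "Cumulus": 4, "Altostratus": 5, "Nimbostratus": 6,
           "Altocumulus": 7, "Missing": 127}
cloud2_t = {"Clear": 10, "Cumulonimbus": 11}


def decode(codes, enum_dict, fill_value=None):
    """Hypothetical helper: map integer codes to identifier names.

    Codes equal to fill_value are returned as None.
    """
    # Invert the name -> value mapping of this one enum type.
    names = {value: name for name, value in enum_dict.items()}
    return [None if code == fill_value else names[code] for code in codes]


# Raw values of primary_cloud as stored in the file (127 is the _FillValue).
raw = [0, 2, 4, 127, 1]
labels = decode(raw, cloud_t, fill_value=cloud_t["Missing"])
print(labels)  # ['Clear', 'Stratus', 'Cumulus', None, 'Cumulonimbus']

# The identifier Clear is looked up per type, so the two definitions
# do not clash.
print(cloud_t["Clear"], cloud2_t["Clear"])  # 0 10
```

Because every lookup goes through the dictionary of one specific enum type, an identifier such as Clear that appears in both cloud_t and cloud2_t causes no conflict, and identifiers that happen to be language keywords are just strings.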

(For your information, I updated test_enum.jl.)
