This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

Odd new variable appearing in this joinExisting aggregation #451

Open
rsignell-usgs opened this issue Feb 25, 2016 · 19 comments

Comments

@rsignell-usgs
Contributor

rsignell-usgs commented Feb 25, 2016

We have a bunch of netcdf granules here:
http://geoport-dev.whoi.edu/thredds/catalog/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/catalog.html

that we are aggregating with a very simple NcML that joins along the time dimension t:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="t" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
  </aggregation>
</netcdf>

The resulting aggregation dataset here:
http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml.html
seems to work fine, but we noticed that the aggregation has acquired an odd new variable t that didn't exist before.

This new variable t has some rather strange values:
http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml.ascii?t[0:1:47]

Is this because the time coordinate variable datetime has a different name than the time dimension t?

Is this expected behavior?

@rsignell-usgs
Contributor Author

Yikes, this is even stranger. There are two different dimensions, both called t:

http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml.dds

gives

Dataset {
    Float64 Longitude[lon = 39];
    Float64 Latitude[lat = 36];
    Int32 t[t = 48];
    Float64 datetime[t = 4416];
    Float64 East_vel[t = 4416][lat = 36][lon = 39];
    Float64 North_vel[t = 4416][lat = 36][lon = 39];
    Float64 East_err[t = 4416][lat = 36][lon = 39];
    Float64 North_err[t = 4416][lat = 36][lon = 39];
    Float64 err_cov[t = 4416][lat = 36][lon = 39];
    Float64 total_err[t = 4416][lat = 36][lon = 39];
} usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml;
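
For orientation, the DDS above implies a per-granule layout roughly like the NcML sketch below. The lon and lat lengths and the variable types are taken from the DDS; the per-granule length of t is not stated anywhere in this thread, so the 48 used here is only a placeholder. This is a structural sketch, not a file meant to be loaded.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="lon" length="39"/>
  <dimension name="lat" length="36"/>
  <!-- Placeholder length: the thread does not give the per-granule size of t. -->
  <dimension name="t" length="48"/>

  <variable name="Longitude" shape="lon" type="double"/>
  <variable name="Latitude" shape="lat" type="double"/>
  <!-- The time values live in datetime, which is defined on dimension t.
       There is no variable named t in the granule itself. -->
  <variable name="datetime" shape="t" type="double"/>
  <variable name="East_vel" shape="t lat lon" type="double"/>
  <variable name="North_vel" shape="t lat lon" type="double"/>
</netcdf>
```

The point of interest is that the dimension is named t while the variable holding its coordinate values is named datetime.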

@rsignell-usgs
Contributor Author

If I try renaming the dimension

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="datetime" orgName="t"/>
  <aggregation dimName="t" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
  </aggregation>
</netcdf>

then the DDS looks better, but I still have that strange t variable with its own t dimension:
http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg2.ncml.dds

Dataset {
    Float64 Longitude[lon = 39];
    Float64 Latitude[lat = 36];
    Int32 t[t = 48];
    Float64 datetime[datetime = 4416];
    Float64 East_vel[datetime = 4416][lat = 36][lon = 39];
    Float64 North_vel[datetime = 4416][lat = 36][lon = 39];
    Float64 East_err[datetime = 4416][lat = 36][lon = 39];
    Float64 North_err[datetime = 4416][lat = 36][lon = 39];
    Float64 err_cov[datetime = 4416][lat = 36][lon = 39];
    Float64 total_err[datetime = 4416][lat = 36][lon = 39];
} usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg2.ncml;

@rsignell-usgs
Contributor Author

@cwardgar, should I send e-mail to thredds support referencing this ticket? Not sure of the protocol anymore...

@dopplershift
Member

No. Check your files. I just dumped them all via opendap and one actually HAS a variable called 't'. (I'll get filename in a second.)

@dopplershift
Member

Might have spoken too soon... (stupid ncml files also get opened by opendap...)

@lesserwhirls
Collaborator

@rsignell-usgs - I think GitHub works best for potential bugs like this. Can you try renaming the dimension inside the aggregation? That worked for me using a few of the files from the server.

@lesserwhirls
Collaborator

According to the ncml agg docs:

https://www.unidata.ucar.edu/software/thredds/v4.6/netcdf-java/ncml/Aggregation.html

"Variables of the same name (in different files) are connected along their existing, outer dimension, called the aggregation dimension. A coordinate variable must exist for the dimension."

So, in your example above that renames the dimension, the coordinate variable t is created for each file first, and only then is the dimension renamed in the aggregated result. If you rename the dimension inside the aggregation, then the variable datetime is recognized as the coordinate variable and no new variable t is created.

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="datetime" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
    <dimension name="datetime" orgName="t" />
  </aggregation>
</netcdf>

@lesserwhirls
Collaborator

Now here is a fun one...if I tell the joinExisting to use dimName="datetime" instead of dimName="t" and change nothing else, like so:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="datetime" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
  </aggregation>
</netcdf>

then things work as well. Since the dimension datetime does not exist but the variable does, the ncml agg creates the new dimension...I don't think it should be doing that!

@lesserwhirls
Collaborator

In short, I think this is a bug.

Here is what I think might be going on: even though the variable datetime is a coordinate variable, the NcML aggregation code does not pick it up as the coordinate variable corresponding to the dimension t, and as such creates a new variable t to match the name of the dimension t.

@JohnLCaron - any of this ringing a bell, or bringing back memories of NcML aggregation nightmares?

@JohnLCaron
Collaborator

get rid of
it's screwing things up.

doesn't need to have same name,

:coordinates = "Longitude Latitude datetime";

works fine

@JohnLCaron
Collaborator

not sure what this "variable t that didn't exist before" is yet.
so i may be wrong, we may be assuming existence of a coordinate variable.

@JohnLCaron
Collaborator

if so, try

<variable name="t" orgName="datetime" />

not

<dimension name="datetime" orgName="t" />
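
Put into a full aggregation file, this suggestion might look like the sketch below. It assumes the variable rename can sit inside the aggregation element, the same way the dimension rename does in the earlier comment; that placement is an assumption and was not confirmed in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="t" type="joinExisting">
    <!-- Assumed placement: rename the coordinate variable in each granule so
         that its name matches the join dimension t. -->
    <variable name="t" orgName="datetime"/>
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
  </aggregation>
</netcdf>
```

With the variable renamed to t, the join dimension t has a coordinate variable of the same name, so the aggregation should have no reason to create one of its own. The trade-off compared with renaming the dimension is that clients then see the time coordinate as t rather than datetime.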

@rsignell-usgs
Contributor Author

@lesserwhirls, awesome! I didn't know I could rename the dimension inside the aggregation tag! And I agree that creating a time coordinate variable with the same name as the dimension is a bug, since one already exists (it just isn't named the same as the dimension).

@rsignell-usgs
Contributor Author

@lesserwhirls should we leave this open until the bug is fixed, or do you want to open another issue that describes the actual bug more closely?

@lesserwhirls
Collaborator

I think we should just leave this open, and I will try to summarize things. However, it looks like @JohnLCaron had something slightly different in mind (rather than renaming the dimension), but I'm not sure if there is a difference between renaming the dimension or renaming the variable.

So @JohnLCaron, here is what I understand the situation is:

Each netCDF file has a dimension t and an associated coordinate variable datetime, which is correctly picked up by the CoordSys tab in ToolsUI as a coordinate variable. When you do a joinExisting NcML agg, the aggregation creates a new variable t, with what appears to be a default value set for all values in the array. I assume this is done to match the dimension t, even though the (dimension <---> coordinate variable) pair is t and datetime. Note that the docs for the joinExisting NcML agg state that we assume a coordinate variable for the joinExisting dimension exists.

I'm thinking that the NcML agg does not pick up on the fact that the (dimension <---> coordinate variable) pair is t and datetime, and thus that it does not need to create a new variable t. To me, this indicates a bug: the joinExisting agg is actually requiring that a variable with the same name as the join dimension exists, rather than that a corresponding coordinate variable exists for the join dimension (as stated in the docs). If we rename the dimension t to datetime, or rename the variable datetime to t, things work as expected.

@rsignell-usgs
Contributor Author

@lesserwhirls this is exactly how I understand the situation as well. 😸

@JohnLCaron
Collaborator

agree


@lesserwhirls lesserwhirls changed the title Odd new variable appearing in this joinExisting aggregation Odd new variable appearing in this joinExisting aggregation Sep 8, 2016