Sorting for time values before aggregation #906
Comments
Dear @kthyng, how are you aggregating the files? What is your NcML file? Regards |
Roping in @skbaum since I'm a user but he set it up! |
The filenames are of the form: roms_his_201611.nc and the NcML is:
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="ocean_time" type="joinExisting" recheckEvery="6 hour">
    <scan location="/atch/raid2/dj/oof_latest/oof/oof/outputs/ncfiles/archives/" regExp="roms_his.*\.nc"/>
  </aggregation>
</netcdf>
Upon reading the aggregation page at: https://www.unidata.ucar.edu/software/thredds/current/netcdf-java/ncml/Aggregation.html I find the following: "By default, the files are ordered by sorting on the filename." This makes me think that what happened shouldn't have happened, and that the time stamps shouldn't have had to be modified. Perhaps it's a subtle bug. I also realize that the order can be forced by listing each filename explicitly within the NcML, but that would require editing the catalog.xml file every time a file is added. Steve |
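As a rough illustration of the two points above (the `regExp` filter and the documented ordering by filename), the following Python sketch applies the same pattern to a few filenames and sorts the matches. It is not the netCDF-Java scanning code. It assumes the pattern is tested against the bare filename, which may not be how the library applies it, and every filename other than roms_his_201611.nc is made up for the example.

```python
import re

# Same pattern as the regExp attribute in the NcML above.
PATTERN = re.compile(r"roms_his.*\.nc")


def select_and_sort(filenames):
    """Keep names matching the pattern, then sort them lexicographically."""
    matched = [name for name in filenames if PATTERN.fullmatch(name)]
    return sorted(matched)


# Hypothetical directory listing; "grid.nc" stands in for the other
# *.nc files that may live in the same directory.
names = [
    "roms_his_201701.nc",
    "grid.nc",
    "roms_his_201611.nc",
    "roms_his_201612.nc",
]
ordered = select_and_sort(names)
print(ordered)
```

Because the names end in a zero-padded YYYYMM stamp, a plain filename sort is also a chronological sort, which is why the out-of-order result looks surprising.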
I wonder if this is an issue with the use of the |
Do you mean listing out the files? If so, that would work, but then it would have to be continually updated since this is an operational system that updates in time. So, that would not be ideal. |
For example, would it be possible to do this:
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="ocean_time" type="joinExisting" recheckEvery="6 hour">
    <scan location="/atch/raid2/dj/oof_latest/oof/oof/outputs/ncfiles/archives/" suffix=".nc" />
  </aggregation>
</netcdf>
That is, is the |
Oh I see. Yes, there are other *.nc files in the directory. |
Ah, ok. Another question - how many time steps are in each file? |
Every hour for the month, so about 30*24=720 depending on how long the month is. |
So having dug into some of our aggregation code, I can see that touching the files on disk caused a rescan of the collection (the code looks at the last modified time on disk to determine if a file was changed), which is probably why it caused things to work. But, the code is pretty complicated under the hood, unfortunately. Just so I can understand a bit better here, it looks like you store data in daily netCDF files, and those files are rechecked every 6 hours.
Sorry for all the questions. The code that the standard Java runtime library uses to check the last modified time is OS dependent, and I've seen reports where certain combinations end up returning the wrong last modified date. Also, depending on how files are being updated (if they are being updated throughout the day), the last modified time may not actually be updated (for example, if the file is held in an open state as data are added). |
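One way to check whether last-modified times on disk disagree with the filename order is sketched below in Python. This is a diagnostic sketch only, not part of THREDDS or netCDF-Java. The function names are hypothetical, and the default pattern is assumed to match bare filenames.

```python
import os
import re


def names_sorted_two_ways(directory, pattern=r"roms_his.*\.nc"):
    """Return (names sorted by filename, names sorted by last-modified time)."""
    regex = re.compile(pattern)
    with os.scandir(directory) as it:
        entries = [
            (entry.name, entry.stat().st_mtime)
            for entry in it
            if entry.is_file() and regex.fullmatch(entry.name)
        ]
    by_name = [name for name, _ in sorted(entries)]
    # Ties in modification time are broken by filename.
    by_mtime = [
        name for name, _ in sorted(entries, key=lambda item: (item[1], item[0]))
    ]
    return by_name, by_mtime


def mtimes_follow_filenames(directory, pattern=r"roms_his.*\.nc"):
    """True when sorting by mtime gives the same order as sorting by name."""
    by_name, by_mtime = names_sorted_two_ways(directory, pattern)
    return by_name == by_mtime
```

If `mtimes_follow_filenames` returns `False` for the archive directory, the two orderings differ, which would be consistent with the symptoms reported in this thread.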
Thanks for the detailed response. I'll ping @skbaum again for help on this. |
Hi all. I just figured out something that had been plaguing me for a week. We have ROMS model output files aggregated by thredds here, for example: http://barataria.tamu.edu:8080/thredds/dodsC/NcML/oof_archive_agg.
The output was coming out all jumbled and weird with some time indices working and others not working. It turns out that the model output files had time stamps that were out of order. "Touch"ing each file in the correct chronological order fixed the problem.
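The "touch in chronological order" workaround could be scripted along the following lines. This is a minimal sketch and not the commands actually used. It assumes that filename order equals chronological order (as with roms_his_YYYYMM.nc), the function name and the `prefix`, `suffix`, and `spacing_seconds` parameters are hypothetical, and whether the server then rescans depends on its own configuration.

```python
import os
import time


def touch_in_filename_order(directory, prefix="roms_his_", suffix=".nc",
                            spacing_seconds=1.0):
    """Give matching files strictly increasing mtimes, in filename order."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.startswith(prefix) and name.endswith(suffix)
    )
    # Start far enough in the past that the last file is stamped
    # one spacing step before "now".
    base = time.time() - spacing_seconds * len(names)
    for index, name in enumerate(names):
        stamp = base + index * spacing_seconds
        # Sets both access and modification time; file contents are untouched.
        os.utime(os.path.join(directory, name), (stamp, stamp))
    return names
```

Note that this rewrites the timestamps of every matching file, so it is best tried on a copy of the directory first.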
So my question is: would it be possible to have a "sort" step over the time dimension before the aggregation step?
Thanks.