NetCDF Java library fails intermittently when reading certain types of NCML aggregations #276
A tar.gz containing the aggregation referenced above and the three .nc files that it aggregates: https://docs.google.com/uc?id=0B_bVl3gTeT9RS0QyRWFXUUIyVVE&export=download
<aggregation type="joinNew" dimName="runtime">
  <netcdf coordValue="0" location="ncom-relo-mayport_u_miw-t000.nc"/>
  <netcdf coordValue="24">
    <aggregation type="joinExisting" dimName="time">
      <netcdf location="ncom-relo-mayport_26_u_miw-t001.nc"/>
      <netcdf location="ncom-relo-mayport_26_u_miw-t000.nc"/>
    </aggregation>
  </netcdf>
</aggregation>
ncom-relo-mayport_u_miw-t000.nc has only 1 time coordinate, but the inner aggregation has 2, so these are not homogeneous in the sense that NcML aggregation requires.
Could you explain more about what you are trying to do?
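For reference, the homogeneity requirement means every direct child of a joinNew aggregation must present the same shape along the existing dimensions, including time. A minimal sketch of a valid joinNew over these runs, assuming each child resolves to a single time step (file names are taken from the attachment; the grouping is hypothetical):

```xml
<aggregation type="joinNew" dimName="runtime">
  <!-- each child contributes an identical time dimension (here: length 1) -->
  <netcdf coordValue="0"  location="ncom-relo-mayport_u_miw-t000.nc"/>
  <netcdf coordValue="24" location="ncom-relo-mayport_26_u_miw-t000.nc"/>
</aggregation>
```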
The data I attached is a test case for a scenario I am trying to handle. I have several thousand NetCDF files (some CF-compliant, some not), most of which belong to a logical dataset that has been broken up along a time or Z axis into groups of 30-50 files, and which I must aggregate back into a single 'logical' dataset (I believe this is a fairly common use case). These files are updated daily, but due to the amount of data involved, as well as other environmental factors, the updates arrive sporadically over a span of about 24 hours.

So what I am trying to do is this: as the files of an aggregated dataset are slowly replaced with newer versions, add those new versions to the aggregated datasets they belong to, while ensuring that the new data can be differentiated within the aggregation by its data creation time (be it a model run time, a production time, or whatever). This is where joining files along the joinNew dimension comes in (in this example, 'runtime'), because the data creation time does not exist in the datasets as a coordinate variable, and in some cases is not even indicated in the global attributes. Ultimately, once all of the files for an aggregated dataset have been updated, the aggregation contains files that all share the same data creation or run time, until the next update starts.

You seem to be indicating that I cannot perform a 'joinNew' aggregation between datasets whose coordinate variables have different sizes? If that is the case, and I missed it in the documentation somewhere, then what about aggregating the files with joinNew first, and then aggregating those aggregations with joinExisting along the time/Z axis? There is still the issue, though, of the random behavior (an exception on some reads, an array of values on others), which suggests a concurrency problem.
If the read worked consistently, instead of only half of the time, that would still be useful to me, as my code could easily determine which values in the returned array were valid. At any rate, thanks for responding so quickly.
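The restructuring proposed above (joinNew within each time chunk, then joinExisting across the chunks) could be sketched as follows. This is only an illustration under the stated assumptions: each inner joinNew must itself be homogeneous, and the grouping of file names per time chunk is hypothetical:

```xml
<aggregation type="joinExisting" dimName="time">
  <netcdf>
    <!-- one time chunk: join the runs along a new 'runtime' dimension -->
    <aggregation type="joinNew" dimName="runtime">
      <netcdf coordValue="0"  location="ncom-relo-mayport_u_miw-t000.nc"/>
      <netcdf coordValue="24" location="ncom-relo-mayport_26_u_miw-t000.nc"/>
    </aggregation>
  </netcdf>
  <!-- further time chunks, each wrapped the same way, would follow -->
</aggregation>
```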
NetCDF Java library fails intermittently when reading certain types of NCML aggregations. As an example, consider the following NCML:
Given this aggregation, the Java code below will fail ... sometimes ...
When it fails, the following stack trace is produced:
If the code does not fail, the following output is produced:
This, I believe, is incorrect, as there should be only 3 available times based on the NcML above.
If the unit test above is modified to run the open-read process multiple times in rapid succession:
then I see a roughly 50% failure rate (the stack trace above), with the remaining reads succeeding but apparently producing incorrect results. This problem is also present in ToolsUI, and I have tried netCDF-Java versions 4.5.5 through 4.6.x.
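The rapid-succession check described above can be sketched generically. Here `readTimes()` is a hypothetical, deterministic stand-in for the real open-read step (which needs the netCDF-Java library and the attached data); the sketch only shows the pattern of repeating the read and comparing results across iterations to detect inconsistency:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class RepeatReadCheck {

    // Hypothetical stand-in for the real step, which in the report opens
    // the NcML aggregation with netCDF-Java and reads the time variable.
    static double[] readTimes() {
        return new double[] { 0.0, 12.0, 24.0 };
    }

    // Run the open-read step n times and collect the distinct results;
    // more than one distinct result means the reads are inconsistent.
    static Set<String> distinctResults(int n) {
        Set<String> distinct = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            distinct.add(Arrays.toString(readTimes()));
        }
        return distinct;
    }

    public static void main(String[] args) {
        Set<String> results = distinctResults(20);
        if (results.size() == 1) {
            System.out.println("consistent: " + results.iterator().next());
        } else {
            System.out.println("INCONSISTENT: " + results);
        }
    }
}
```

With the deterministic stub this reports a single consistent result; swapping in the real open-read call would surface the ~50% divergence described above.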
I will attach the data I am using to reproduce these results.