-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632
Comments
Basically, I am setting the 3-d dimension to chunk = 0. I think the order of the nc4_chunksize array is such that the first item is z, then y and x (or x and y) [hence that memcpy line bypassing the first element]. So if we allow three numbers in the default settings we need to understand what the order is first. Otherwise it is a simple fix to deal with. |
The file that I sent you in a separate mail has dimensions [15][35][41] that corresponds to 15 layers, 35 rows and 41 columns. It was written by GDAL and my HDF viewer says the chunking is [1][1][41]. That file has 86100 bytes of single precision cube data (that is without headers). The file size is 139,264 bytes, so almost the double. Same file written by GMT
has a size of 74,705 bytes and a chunking of [15][35][41]. This file came from a mat file that has 3 of these: U, V, W plus 3 others of the same dimensions with the x,y,z coordinates in the stupid meshgrid wasteful storage. But these 6 cubes make a file that in total have only 132,177 bytes. The chunking in this case is [15][13][35] but the cube's dimensions are [15][41][35]. So Matlab writes it by column ([15][41][35]) and GMT/GDAL do it by rows ([15][35][41]). I didn't know there was this liberty to write by rows or by columns in the netCDF/HDF world. Matlab also gets a much higher compression ratio, and even more considering that it's storing in double precision whilst I doing it in single. |
It is not just chunking at work. There is IO_NC4_DEFLATION_LEVEL too - how do those differ? |
They do not. All level=3 |
As for letting IO_NC4_CHUNK_SIZE, since we have been doing y,x chunking syntax for years, should we do this: If 2 values, then assume y,x Or should we instead say that if three then it is y,x[,z]? |
The benefit of the latter is that we dont have to change the setting when moving from 2-D to 3-D entities. |
Not sure I understood the benefit. Saying that if 3 then z,y,x seems clear, but maybe to me only because I'm looking at the output of HDFexplorer that prints it that way. |
Yes, I guess it is OK. If we find three it is z,y,x and if someone writes a grid we use y,x only. So should be safe and the docs can be updated to explain it is [z,]y,x |
Update: Not sure if chunking in z will be possible. Right now, GMT writes the cube as a series of layers. Each layer have padding so we using the gmtnc_io_nc_grid function to write each layer to the cube file. Hard to see how z-chunking enters when pads need to be skipped. This would all have to be rewritten so that the memmove that undoes a layer padding would have to go through all layers so that the entire cube is moved upward in memory to a place with no pads, then written as a 3-D entity. I am not motivated to start that though. So even though I have a PR (soon) that allows you to set z-chuncking it will not have any effect - so is it worth even committing at this point? |
Ofc, in presence of the pad things can be horribly complicated and clearly doesn't worth the trouble. But are all cubes padded? I thought that I could send cubes via Julia without padding (but would need to confirm that). |
We dont really have any thing that is truly 3-D in GMT so this is a kludge via stacks of grids, hence each grid has a pad, etc. have a look at how grdinterpolate operate or, better, what gmtapi_import_cube looks like and you will see. |
gmt_nc
hasbut
gmt_defaults.h
sets only 2and
gmt_init.c
only accounts for maximum two chunk values, but we may need 3 when dealing with cubes.The text was updated successfully, but these errors were encountered: