Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632

Open
joa-quim opened this issue Apr 26, 2022 · 11 comments
Open

IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632

joa-quim opened this issue Apr 26, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@joa-quim
Copy link
Member

gmt_nc has

	size_t nc4_chunksize[3] = {0, 0, 0};
	gmt_M_memcpy (&nc4_chunksize[cube], GMT->current.setting.io_nc4_chunksize, 2U, size_t);

but gmt_defaults.h sets only 2

size_t io_nc4_chunksize[2];         /* NetCDF chunk size (lat,lon) on output [0] */

and gmt_init.c only accounts for maximum two chunk values, but we may need 3 when dealing with cubes.

@joa-quim joa-quim added the bug Something isn't working label Apr 26, 2022
@PaulWessel
Copy link
Member

Basically, I am setting the 3-d dimension to chunk = 0. I think the order of the nc4_chunksize array is such that the first item is z, then y and x (or x and y) [hence that memcpy line bypassing the first element]. So if we allow three numbers in the default settings we need to understand what the order is first. Otherwise it is a simple fix to deal with.

@joa-quim
Copy link
Member Author

joa-quim commented Apr 27, 2022

The file that I sent you in a separate mail has dimensions [15][35][41] that corresponds to 15 layers, 35 rows and 41 columns. It was written by GDAL and my HDF viewer says the chunking is [1][1][41]. That file has 86100 bytes of single precision cube data (that is without headers). The file size is 139,264 bytes, so almost the double.

Same file written by GMT

grdinterpolate(U, G="U.nc", par=(IO_NC4_CHUNK_SIZE="35,41",));

has a size of 74,705 bytes and a chunking of [15][35][41].

This file came from a mat file that has 3 of these: U, V, W plus 3 others of the same dimensions with the x,y,z coordinates in the stupid meshgrid wasteful storage. But these 6 cubes make a file that in total have only 132,177 bytes. The chunking in this case is [15][13][35] but the cube's dimensions are [15][41][35].

So Matlab writes it by column ([15][41][35]) and GMT/GDAL do it by rows ([15][35][41]). I didn't know there was this liberty to write by rows or by columns in the netCDF/HDF world. Matlab also gets a much higher compression ratio, and even more considering that it's storing in double precision whilst I doing it in single.

@PaulWessel
Copy link
Member

It is not just chunking at work. There is IO_NC4_DEFLATION_LEVEL too - how do those differ?

@joa-quim
Copy link
Member Author

They do not. All level=3

@PaulWessel
Copy link
Member

As for letting IO_NC4_CHUNK_SIZE, since we have been doing y,x chunking syntax for years, should we do this:

If 2 values, then assume y,x
If 3 values, then assume z,y,x

Or should we instead say that if three then it is y,x[,z]?

@PaulWessel
Copy link
Member

The benefit of the latter is that we dont have to change the setting when moving from 2-D to 3-D entities.

@joa-quim
Copy link
Member Author

Not sure I understood the benefit. Saying that if 3 then z,y,x seems clear, but maybe to me only because I'm looking at the output of HDFexplorer that prints it that way.

@PaulWessel
Copy link
Member

Yes, I guess it is OK. If we find three it is z,y,x and if someone writes a grid we use y,x only. So should be safe and the docs can be updated to explain it is [z,]y,x

@PaulWessel
Copy link
Member

Update: Not sure if chunking in z will be possible. Right now, GMT writes the cube as a series of layers. Each layer have padding so we using the gmtnc_io_nc_grid function to write each layer to the cube file. Hard to see how z-chunking enters when pads need to be skipped. This would all have to be rewritten so that the memmove that undoes a layer padding would have to go through all layers so that the entire cube is moved upward in memory to a place with no pads, then written as a 3-D entity. I am not motivated to start that though. So even though I have a PR (soon) that allows you to set z-chuncking it will not have any effect - so is it worth even committing at this point?

@joa-quim
Copy link
Member Author

Ofc, in presence of the pad things can be horribly complicated and clearly doesn't worth the trouble. But are all cubes padded? I thought that I could send cubes via Julia without padding (but would need to confirm that).

@PaulWessel
Copy link
Member

We dont really have any thing that is truly 3-D in GMT so this is a kludge via stacks of grids, hence each grid has a pad, etc. have a look at how grdinterpolate operate or, better, what gmtapi_import_cube looks like and you will see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants