IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632

joa-quim · 2022-04-26T22:41:49Z

gmt_nc has

	size_t nc4_chunksize[3] = {0, 0, 0};
	gmt_M_memcpy (&nc4_chunksize[cube], GMT->current.setting.io_nc4_chunksize, 2U, size_t);

but gmt_defaults.h sets only 2

size_t io_nc4_chunksize[2];         /* NetCDF chunk size (lat,lon) on output [0] */

and gmt_init.c only accounts for maximum two chunk values, but we may need 3 when dealing with cubes.

The text was updated successfully, but these errors were encountered:

PaulWessel · 2022-04-27T07:46:39Z

Basically, I am setting the 3-d dimension to chunk = 0. I think the order of the nc4_chunksize array is such that the first item is z, then y and x (or x and y) [hence that memcpy line bypassing the first element]. So if we allow three numbers in the default settings we need to understand what the order is first. Otherwise it is a simple fix to deal with.

joa-quim · 2022-04-27T13:11:08Z

The file that I sent you in a separate mail has dimensions [15][35][41] that corresponds to 15 layers, 35 rows and 41 columns. It was written by GDAL and my HDF viewer says the chunking is [1][1][41]. That file has 86100 bytes of single precision cube data (that is without headers). The file size is 139,264 bytes, so almost the double.

Same file written by GMT

grdinterpolate(U, G="U.nc", par=(IO_NC4_CHUNK_SIZE="35,41",));

has a size of 74,705 bytes and a chunking of [15][35][41].

This file came from a mat file that has 3 of these: U, V, W plus 3 others of the same dimensions with the x,y,z coordinates in the stupid meshgrid wasteful storage. But these 6 cubes make a file that in total have only 132,177 bytes. The chunking in this case is [15][13][35] but the cube's dimensions are [15][41][35].

So Matlab writes it by column ([15][41][35]) and GMT/GDAL do it by rows ([15][35][41]). I didn't know there was this liberty to write by rows or by columns in the netCDF/HDF world. Matlab also gets a much higher compression ratio, and even more considering that it's storing in double precision whilst I doing it in single.

PaulWessel · 2022-04-27T15:32:19Z

It is not just chunking at work. There is IO_NC4_DEFLATION_LEVEL too - how do those differ?

joa-quim · 2022-04-27T15:38:17Z

They do not. All level=3

PaulWessel · 2022-04-27T15:38:27Z

As for letting IO_NC4_CHUNK_SIZE, since we have been doing y,x chunking syntax for years, should we do this:

If 2 values, then assume y,x
If 3 values, then assume z,y,x

Or should we instead say that if three then it is y,x[,z]?

PaulWessel · 2022-04-27T15:39:07Z

The benefit of the latter is that we dont have to change the setting when moving from 2-D to 3-D entities.

joa-quim · 2022-04-27T15:45:14Z

Not sure I understood the benefit. Saying that if 3 then z,y,x seems clear, but maybe to me only because I'm looking at the output of HDFexplorer that prints it that way.

PaulWessel · 2022-04-27T15:48:49Z

Yes, I guess it is OK. If we find three it is z,y,x and if someone writes a grid we use y,x only. So should be safe and the docs can be updated to explain it is [z,]y,x

PaulWessel · 2022-04-28T08:43:11Z

Update: Not sure if chunking in z will be possible. Right now, GMT writes the cube as a series of layers. Each layer have padding so we using the gmtnc_io_nc_grid function to write each layer to the cube file. Hard to see how z-chunking enters when pads need to be skipped. This would all have to be rewritten so that the memmove that undoes a layer padding would have to go through all layers so that the entire cube is moved upward in memory to a place with no pads, then written as a 3-D entity. I am not motivated to start that though. So even though I have a PR (soon) that allows you to set z-chuncking it will not have any effect - so is it worth even committing at this point?

joa-quim · 2022-04-28T18:50:23Z

Ofc, in presence of the pad things can be horribly complicated and clearly doesn't worth the trouble. But are all cubes padded? I thought that I could send cubes via Julia without padding (but would need to confirm that).

PaulWessel · 2022-04-28T18:57:12Z

We dont really have any thing that is truly 3-D in GMT so this is a kludge via stacks of grids, hence each grid has a pad, etc. have a look at how grdinterpolate operate or, better, what gmtapi_import_cube looks like and you will see.

joa-quim added the bug Something isn't working label Apr 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632

IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632

joa-quim commented Apr 26, 2022

PaulWessel commented Apr 27, 2022

joa-quim commented Apr 27, 2022 •

edited

Loading

PaulWessel commented Apr 27, 2022

joa-quim commented Apr 27, 2022

PaulWessel commented Apr 27, 2022

PaulWessel commented Apr 27, 2022

joa-quim commented Apr 27, 2022

PaulWessel commented Apr 27, 2022

PaulWessel commented Apr 28, 2022

joa-quim commented Apr 28, 2022

PaulWessel commented Apr 28, 2022

IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632

IO_NC4_CHUNK_SIZE must allow 3 values for when dealing with cubes. #6632

Comments

joa-quim commented Apr 26, 2022

PaulWessel commented Apr 27, 2022

joa-quim commented Apr 27, 2022 • edited Loading

PaulWessel commented Apr 27, 2022

joa-quim commented Apr 27, 2022

PaulWessel commented Apr 27, 2022

PaulWessel commented Apr 27, 2022

joa-quim commented Apr 27, 2022

PaulWessel commented Apr 27, 2022

PaulWessel commented Apr 28, 2022

joa-quim commented Apr 28, 2022

PaulWessel commented Apr 28, 2022

joa-quim commented Apr 27, 2022 •

edited

Loading