grdinfo on tiled files may be overkill #3767

Open
joa-quim opened this issue Jul 29, 2020 · 11 comments
Labels
feature request Request a new feature

Comments

@joa-quim
Member

For example, this command already takes a lot of time and uses an extra 1 GB of memory:


gmt grdinfo @earth_relief_01m
GMTAPI@-S-O-G-G-G-Y-000002: Title:
GMTAPI@-S-O-G-G-G-Y-000002: Command: grdblend @earth_relief_01m_p/ -R-180/180/-90/90 -I01m -rp -G@GMTAPI@-S-O-G-G-G-Y-000002 -Co+n

The problem is obviously that it has to assemble all the tiles to compute the info. A possible solution would be to store this info together with the files and make the reader clever enough to consult that file first if no -R is provided.
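
A minimal sketch of that lookup order, written in Python for brevity. The sidecar table, its placeholder values, and the names `GLOBAL_RANGE`, `z_range` and `assemble` are all hypothetical; this is not how the GMT reader is implemented.

```python
from typing import Callable, Dict, Optional, Tuple

Range = Tuple[float, float]
Region = Tuple[float, float, float, float]  # w, e, s, n

# Hypothetical sidecar table: dataset name -> (zmin, zmax) of the full grid.
# The numbers are placeholders, not real statistics for this dataset.
GLOBAL_RANGE: Dict[str, Range] = {
    "earth_relief_01m": (-11000.0, 8000.0),
}

def z_range(name: str, region: Optional[Region],
            assemble: Callable[[str, Optional[Region]], Range]) -> Range:
    """Return (zmin, zmax), blending tiles only when it cannot be avoided."""
    # No -R given and the range is stored: answer without touching the tiles.
    if region is None and name in GLOBAL_RANGE:
        return GLOBAL_RANGE[name]
    # Otherwise fall back to the expensive path (tile assembly plus scan).
    return assemble(name, region)
```

Here `assemble` stands in for whatever currently blends the tiles and scans the result, so the costly step is only reached for a sub-region or an unlisted dataset.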

@PaulWessel
Member

I agree that assembling tiles to get the min/max of the entire dataset is silly. It should only happen with a non-global -R. So we would have to add zmin/zmax to the gmt_data_server.txt file, and that means 6.2, I think.

@PaulWessel
Member

Another thought I had about tiles is this: If the user downloads all the tiles for a resolution, let grdblend make the single earth_relief_xxy.grd file and then we use that file instead (if it exists). This would eliminate lots of repeated grdblend calls, such as happens every time you run a script that makes a map. I think once the large file is there, it is simpler to just pass w/e/s/n in a read like we have always done. Should we pursue that scheme?

@joa-quim
Member Author

joa-quim commented Jul 29, 2020

Yes, it's much better to have a single file than playing with the tiles all the time. For example:

grdimage @earth -R...
grdcontour @earth -R

will launch a grdblend job twice, whilst with a single file it's just netcdf reading sub-regions.
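
The "use the single file if it exists" rule could be sketched roughly as below, in Python. The function name, the cache layout and the `.grd` naming are assumptions made for illustration, not GMT's actual cache structure.

```python
from pathlib import Path
from typing import Tuple, Union

def resolve_source(cache_dir: Union[str, Path], name: str) -> Tuple[str, Path]:
    """Pick the data source for a remote dataset such as 'earth_relief_01m'.

    Returns ("single", path) when a monolithic grid is already present,
    otherwise ("tiles", path) pointing at the (hypothetical) tile directory.
    """
    cache = Path(cache_dir)
    single = cache / f"{name}.grd"
    if single.is_file():
        # Sub-regions can be read straight from this file; no blending needed.
        return ("single", single)
    # No monolithic grid yet: the caller has to blend tiles for the region.
    return ("tiles", cache / name)
```

With this rule, two consecutive modules asking for the same dataset would both hit the single file and neither would need a blend.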

@joa-quim joa-quim changed the title from "gmtinfo on tiled files may be overkill" to "grdinfo on tiled files may be overkill" Jul 29, 2020
@PaulWessel
Member

We can update gmt_data_server.txt with two more columns for zmin and zmax. However, this will break backwards compatibility, so I think we will need to do one of the following (assuming we may make other changes in the future):

  1. Starting with GMT 6.2 we will read a versioned file, say gmt_data_server_6.2.txt and the old gmt_data_server.txt will remain on the server for <= 6.1. This means when 6.3 or 7.1 comes out there will be new files. So many generations of files.
  2. Instead, we have a separate version number for the server stuff. So we let gmt_remote.h have a remote format version number, e.g., GMT_REMOTE_VERSION 2, and then GMT tries to use gmt_data_server_%d.txt via that version. If 6.2.1 needs further changes then we increment to 3 and supply the new file.

I think that is better since we may not make more changes, and then we don't have to add copies of files for 6.2, 6.3, etc. if nothing changed.
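
A rough sketch of option 2, in Python rather than C for brevity. The constant value 2 and the `gmt_data_server_%d.txt` pattern come from the comment above; mapping version 1 to the legacy unversioned file is an assumption added here for illustration.

```python
# Format version of the server table, decoupled from the GMT release number.
GMT_REMOTE_VERSION = 2

def server_table_name(version: int = GMT_REMOTE_VERSION) -> str:
    """Return the server table file name for a given remote format version."""
    if version < 1:
        raise ValueError("remote format version must be >= 1")
    if version == 1:
        # Assumption: the original, unversioned file stays in place for <= 6.1.
        return "gmt_data_server.txt"
    return "gmt_data_server_%d.txt" % version
```

A new file only appears on the server when the format itself changes, so releases that leave the format alone keep reading the same table.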

Other solutions?

@seisman
Member

seisman commented Jul 30, 2020

As for zmin and zmax, do we really need it? It's only useful for gmt grdinfo @earth_relief_01d, but not for any non-global grids.

@PaulWessel
Member

Good point, that is true, and while @joa-quim may have forgotten the range of Earth's relief, most grade school kids know it goes from -11 km to 8 km, ish. So perhaps it is not such helpful information to encode, given the trouble it imposes.

Perhaps the more helpful upgrade would be to reassemble the full global grids when the user gets all the tiles of a kind.

@seisman
Member

seisman commented Jul 30, 2020

> Perhaps the more helpful upgrade would be to reassemble the full global grids when the user gets all the tiles of a kind.

Yes, I think it's a good idea, but it means double the disk usage, and for earth relief 01s, it's a huge file (~40GB) (but will anyone plot a global map using 01s data?).

@PaulWessel
Member

Well, right now there is no global tile set, so even if you got all the SRTM1 tiles you still don't have 360x180 of them. So 1s and 3s will not change. It is more the intermediate resolutions such as 15s, 30s and possibly 1m that are still too large to ship as a single item, but at least the 1m I can easily see being quickly downloaded via tiles and reaching full coverage.

My thought (I have not thought very long on this yet) would be that when gmt_assemble_grid is called on tiles, it could check if we have full coverage, and if we do then we launch a job to build the single global file. Not sure of the details though: Should we then delete the tiles if the user has earth_relief_30s.grd? Doing the blend takes some time for these large files; is the user willing to just wait for that? Or do we simply explain all this better and offer this via a gmt get option or modifier? Perhaps gmt get, which gets all the tiles of the selected resolutions, should simply build the single grid instead?
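
The full-coverage check could look roughly like this Python sketch. It assumes square tiles whose size in whole degrees divides both 360 and 180, and it identifies each tile by its lower-left corner; both the representation and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Corner = Tuple[int, int]  # (lon, lat) of a tile's lower-left corner, in degrees

def expected_tiles(tile_deg: int) -> Set[Corner]:
    """All lower-left corners of tile_deg x tile_deg tiles covering the globe."""
    if 360 % tile_deg or 180 % tile_deg:
        raise ValueError("tile size must divide 360 and 180")
    return {(lon, lat)
            for lon in range(-180, 180, tile_deg)
            for lat in range(-90, 90, tile_deg)}

def has_full_coverage(local_tiles: Iterable[Corner], tile_deg: int) -> bool:
    """True when every tile of the global set is present locally."""
    return expected_tiles(tile_deg) <= set(local_tiles)
```

Only when `has_full_coverage` returns true would it make sense to launch the one-off blend into a single global file.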

@seisman
Member

seisman commented Jul 30, 2020

And also how to update the single global file if the data has a new release.

@joa-quim
Member Author

First let me say that I found this because some Julia function that automatically fishes out the grid extents did a grdinfo @earth_relief_01m. Fortunately I was not using _01s.

Tiles are very convenient for working on small and very variable regions but are otherwise not so good to work with, especially if their number starts to grow. I think the best would be that if, for any reason (including accumulated usage), some, say, 75% of the tiles end up on the local machine, then it is better to download them all, recreate the monolithic file, and DELETE the tiles. gmt get should also have an option to do this.
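
As a sketch, the threshold rule is a one-line test. The function name is hypothetical and the 75% default is just the figure floated above, not a tuned value.

```python
def should_consolidate(n_local: int, n_total: int, threshold: float = 0.75) -> bool:
    """True when enough tiles are cached to justify building the single file."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    # Fraction of the dataset's tiles that are already on the local machine.
    return n_local / n_total >= threshold
```

For a set of 648 tiles, the switch to the monolithic file would be triggered once 486 of them are local.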

When data has a new release, the monolithic files should NOT be updated automatically, but a warning about the new version should be issued, together with the gmt get instruction to do so.
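
A minimal sketch of that "warn, never auto-update" policy in Python. The release strings, the function name and the message wording are hypothetical, and the exact gmt get arguments are deliberately left out.

```python
from typing import Optional

def update_notice(name: str, local_release: str, server_release: str) -> Optional[str]:
    """Return a warning when the server has a different release; never update."""
    if local_release == server_release:
        return None  # local monolithic file is current, nothing to report
    return (f"Warning: {name} is release {local_release} locally but "
            f"{server_release} on the server; use gmt get to refresh it.")
```

The caller only prints the message; replacing the local file stays an explicit user action.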

@PaulWessel
Member

I will wait a bit on this to see if the NASA proposal gets funded, which is where we proposed the tile mechanism that we have now actually started to implement.

@PaulWessel PaulWessel added the feature request Request a new feature label Aug 8, 2020
@seisman seisman added this to the 6.2.0 milestone Aug 8, 2020
@seisman seisman removed this from the 6.2.0 milestone Apr 19, 2021