Add new common -q option to select data rows #2341

PaulWessel · 2019-12-25T09:38:44Z

See issue #2288 for discussion. This PR implements -qi for all table input (ascii, binary, netcdf) in the low-level functions. Thus, modules can get this feature by simply adding q to THIS_MODULE_OPTIONS and add q to the relevant GMT_Option (API, "") strings.

For testing, made this file: gmt math -T0/10/1 T = t.txt

Try

gmt convert -q3:7 t.txt
gmt convert -q0:5,8: t.txt
gmt convert -q0:2:8 t.txt
gmt convert -q~4:6 t.txt

Then make a binary data set: gmt convert t.txt -bo2d > t.bin and run the same tests on t.bin while also passing -bi2d.

Seems to work so far.

Only gmtconvert for now, but can be added to any module.

This handles absolute time data as well as any other type.

PaulWessel · 2019-12-27T08:18:59Z

Gotten the +ccol part to work on input. Need to finalize how we count row numbers and how these are reset for new tables or new segments, which will need +f and +s modifiers, with a default of +a [rows in combined data set].
Test on a 9+ million record data set (256 Mb file] shows my -q implementation does not affect the run time in any noticeable way compared to master with no -q option.

For upcoming -q we need to count data record per data set, table, and segment for both input and output. We did this for input only. This standardizes and renames some of these variables. The new output variables are not used yet in master but will be used in #2341

Current row counter will be reset for a new file or a new segment.

PaulWessel · 2019-12-30T05:33:44Z

Hi @joa-quim and @seisman; I think the -q[i|o][~]rows,...[+ccol][+a|f|s] is (close to?) operational. I have only added it to the parsing in gmtconvert and gmtmath for now.

I have done some basic testing with two input files with two segments each and the row-based checks work for all the cases I tried. Note you can list a series of ranges (e.g., -q2:6,8:12 gets rows 2-6 and 8-12) and you can also request the inverse by prepending ~ (e.g, -q~2:6,8:12 will get all rows except 2-6 and 8-12). The modifiers +a|f|s affect what row counter we are using. By default we count the data rows from 0 to the end of the entire data set. However, +f resets the counter at start of each new file, while +s resets the counter at the start of each segment. Finally, the +c option turns on data range selection instead (for column col). Here, +a|f|s are not valid since we are comparing data values, but ~ can be used as for row ranges. This seems pretty powerful to me so far. I made is robust so that for row-ranges I accept either -, :, or / as the delimiter, and if the start row is missing (e.g., :10) then we initialize start to 0, and if stop row is missing (e.g., 100-) then we initialize stop row to infinity. For data ranges I insist on slash-separated values since that is how our -R and -T options generally work, absolute time often have : and -, and even geographic data may have :, and of course they can be negative. So therefore I only accept /. The multiple data ranges, negation and missing start or stop values works for data ranges as well as for row ranges.

As for speed I ran the following test:

gmt grd2xyz @earth_relief_05m > junk.txt
# First just run via convert with no option so we can compare speed with master
time gmt convert junk.txt > shit1.txt
# Next just run via convert -qi~0 to skip the very first record
time gmt convert junk.txt -qi~0 > shit2.txt
# Next just run via convert -qi~89/+c1 to skip all latitudes > 89
time gmt convert junk.txt -qi~89/+c1 > shit3.txt

I ran these three times and compared the run times and there was only a 2-3% variations in time (all around 14-15 seconds for me). When I ran the first gmtconvert with master (i.e., no -q if-test being checked) it took slightly longer - which I don't understand - see what you get. The input file is ~250 Mb with > 9 million records so the slowdown, if any, seems trivial. An earlier test now lost gave a slightly faster master version compared to the branch that has -q (but not used on the command line); it seems of no big consequence to accept the -q implementation as it is for now.

If you agree with all of this then I would like to add q to the THIS_MODULE_OPTIONS of all the table readers/writers as well as to the explanations; the rest is automatic. Just like -b I will need to have separate explain files for -qi and -qo since many modules can only accept one form (e.g., grd2xyz can only accept -qo, for instance, while xyz2grd only can do -qi).

seisman · 2019-12-30T06:06:15Z

I followed your first comment and ran the commands below:

gmt math -T0/10/1 T = t.txt
gmt convert t.txt -bo2d > t.bin
gmt convert -q3:7 t.bin -bi2d -V

It gave me some warnings and no output:

gmtconvert [WARNING]: File t.bin is empty!
gmtconvert [WARNING]: No data records provided

doc/rst/source/explain_-q.rst_

doc/rst/source/explain_-q_full.rst_

Co-Authored-By: Dongdong Tian <[email protected]>

PaulWessel · 2019-12-30T08:10:34Z

Not sure how this happened as I was running that test multiple times...Now passes for me.

seisman

Looks good to me.

joa-quim · 2019-12-30T20:13:30Z

Sorry, hadn't have time to test it, visitors, but looks good.

PaulWessel · 2019-12-30T23:20:27Z

Hi @joa-quim or @seisman, would you know what I am missing from these messages (if so please fix):

[8/8] Building HTML documentation...
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)

PaulWessel · 2019-12-30T23:24:03Z

Basically, the (...more) links for -qi and qo dont work.

joa-quim · 2019-12-31T00:48:55Z

You need to add them to gmt.rst

PaulWessel · 2019-12-31T00:53:12Z

Thanks, done.

PaulWessel · 2019-12-31T00:54:30Z

OK, all those warnings went away but the docs for xyz2grd -qiflags still have no links like diflags etc.

PaulWessel · 2019-12-31T01:29:26Z

Worked better after wiping the whole build tree. Looks good. I will do some more testing before merging this branch back in.

PaulWessel · 2019-12-31T04:28:15Z

Hi @joa-quim and @seisman. I have added tests for both ascii and binary files and have also made sure it works with netCDF tables (I may add a test for this as well). The only thing I have not yet tested is if it works from the external interfaces. I don't think that should prevent merging this branch back to master though since -q is less likely to be used due to subscripting.

seisman · 2019-12-31T06:09:57Z

Just tried the new -q option with pygmt. It works well:

import pygmt
fig = pygmt.Figure()
fig.plot(projection="X10c", region="0/10/0/10", style="c0.2c", frame=True, data="t.txt", q="2:5")
fig.savefig("map.png")

PaulWessel · 2019-12-31T08:55:32Z

Probably will fail if you pass it via memory instead of a file.

PaulWessel · 2019-12-31T20:48:58Z

I have added a warning that -q is not implemented for external memory objects, so if the code enters those cases under api_export|import_dataset and -q is set then we print that warning. With that I think I am good to merge unless you stop me.

PaulWessel added 2 commits December 24, 2019 13:37

Initial implementation and test in gmtconvert

5c2a604

Seems to work so far.

Full implementation of -qi for ascii, binary and tested in gmtconvert

930760a

Only gmtconvert for now, but can be added to any module.

PaulWessel requested review from joa-quim and seisman December 25, 2019 09:38

PaulWessel mentioned this pull request Dec 26, 2019

Ability to limit input rows and even row time-range #2288

Closed

PaulWessel added 4 commits December 26, 2019 18:30

Implement -q+ccol as well for input

a418fd3

This handles absolute time data as well as any other type.

Add a few examples to the documentation

b3b7908

Working on various record counters

2d8ed28

Fix where we skip rows.

488eb63

PaulWessel added 2 commits December 27, 2019 15:28

move init of rec_no pointer to gmtlib_io_init

16d22ac

Merge branch 'master' into common-q-option

2172bef

PaulWessel mentioned this pull request Dec 28, 2019

Add data record counters for both input and output #2350

Merged

PaulWessel added 11 commits December 28, 2019 08:51

Merge branch 'master' into common-q-option

d5f689f

Update gmt_io.c

48e31f8

Merge branch 'master' into common-q-option

9af0d3a

Merge branch 'master' into common-q-option

cea874c

Add +a|f|s modifier for different row counts

7ff6b74

Current row counter will be reset for a new file or a new segment.

Allow reverse of multiple ranges

a6e1525

Stronger syntax check for -q

47541bc

The -qo works for ascii tables

1b66850

Improve checking for output data range

9415273

Test -q on gmtmath

2ed834f

Strengthen the parsing of [start]:[inc:][stop]

3ef8a95

seisman reviewed Dec 30, 2019

View reviewed changes

doc/rst/source/explain_-q.rst_ Outdated Show resolved Hide resolved

doc/rst/source/explain_-q_full.rst_ Outdated Show resolved Hide resolved

Update doc/rst/source/explain_-q.rst_

d76a5a7

Co-Authored-By: Dongdong Tian <[email protected]>

seisman approved these changes Dec 30, 2019

View reviewed changes

PaulWessel added 2 commits December 30, 2019 12:38

Extend -q to all relevant modules and their documentation

91b8834

Fix typos

a6be685

Better alignment of synopsis after -q was fit in

67d606d

Update gmt.rst

8dfaf49

PaulWessel added 5 commits December 30, 2019 16:30

Clear up data_range parsing and add a test for gmtconvert

4417254

Fix binary rec counting and add binary test script

972a378

Fix nc reading with -q

7432a7b

Explain -q in cookbook

11cae5f

Update options.rst

171fbc4

PaulWessel changed the title ~~WIP Add new common -q option to select rows~~ Add new common -q option to select data rows Dec 31, 2019

Give warning that -q is not implemented for external memory transactions

73b49eb

PaulWessel added 4 commits December 31, 2019 11:05

Merge branch 'master' into common-q-option

75e5365

Fix some typos while at it

26211bc

Merge branch 'master' into common-q-option

83ea992

Merge branch 'master' into common-q-option

d4c848a

PaulWessel merged commit f75bb66 into master Jan 1, 2020

PaulWessel deleted the common-q-option branch January 1, 2020 04:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new common -q option to select data rows #2341

Add new common -q option to select data rows #2341

PaulWessel commented Dec 25, 2019 •

edited

Loading

PaulWessel commented Dec 27, 2019

PaulWessel commented Dec 30, 2019

seisman commented Dec 30, 2019

PaulWessel commented Dec 30, 2019

seisman left a comment

joa-quim commented Dec 30, 2019

PaulWessel commented Dec 30, 2019

PaulWessel commented Dec 30, 2019

joa-quim commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

seisman commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

Add new common -q option to select data rows #2341

Add new common -q option to select data rows #2341

Conversation

PaulWessel commented Dec 25, 2019 • edited Loading

PaulWessel commented Dec 27, 2019

PaulWessel commented Dec 30, 2019

seisman commented Dec 30, 2019

PaulWessel commented Dec 30, 2019

seisman left a comment

Choose a reason for hiding this comment

joa-quim commented Dec 30, 2019

PaulWessel commented Dec 30, 2019

PaulWessel commented Dec 30, 2019

joa-quim commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

seisman commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 31, 2019

PaulWessel commented Dec 25, 2019 •

edited

Loading