Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new common -q option to select data rows #2341

Merged
merged 36 commits into from
Jan 1, 2020
Merged

Conversation

PaulWessel
Copy link
Member

@PaulWessel PaulWessel commented Dec 25, 2019

See issue #2288 for discussion. This PR implements -qi for all table input (ascii, binary, netcdf) in the low-level functions. Thus, modules can get this feature by simply adding q to THIS_MODULE_OPTIONS and add q to the relevant GMT_Option (API, "") strings.

For testing, made this file: gmt math -T0/10/1 T = t.txt

Try

gmt convert -q3:7 t.txt
gmt convert -q0:5,8: t.txt
gmt convert -q0:2:8 t.txt
gmt convert -q~4:6 t.txt

Then make a binary data set: gmt convert t.txt -bo2d > t.bin and run the same tests on t.bin while also passing -bi2d.

@PaulWessel
Copy link
Member Author

Gotten the +ccol part to work on input. Need to finalize how we count row numbers and how these are reset for new tables or new segments, which will need +f and +s modifiers, with a default of +a [rows in combined data set].
Test on a 9+ million record data set (256 Mb file] shows my -q implementation does not affect the run time in any noticeable way compared to master with no -q option.

PaulWessel added a commit that referenced this pull request Dec 28, 2019
For upcoming -q we need to count data record per data set, table, and segment for both input and output.  We did this for input only.  This standardizes and renames some of these variables.  The new output variables are not used yet in master but will be used in #2341
PaulWessel added a commit that referenced this pull request Dec 28, 2019
For upcoming -q we need to count data record per data set, table, and segment for both input and output.  We did this for input only.  This standardizes and renames some of these variables.  The new output variables are not used yet in master but will be used in #2341
@PaulWessel
Copy link
Member Author

Hi @joa-quim and @seisman; I think the -q[i|o][~]rows,...[+ccol][+a|f|s] is (close to?) operational. I have only added it to the parsing in gmtconvert and gmtmath for now.

I have done some basic testing with two input files with two segments each and the row-based checks work for all the cases I tried. Note you can list a series of ranges (e.g., -q2:6,8:12 gets rows 2-6 and 8-12) and you can also request the inverse by prepending ~ (e.g, -q~2:6,8:12 will get all rows except 2-6 and 8-12). The modifiers +a|f|s affect what row counter we are using. By default we count the data rows from 0 to the end of the entire data set. However, +f resets the counter at start of each new file, while +s resets the counter at the start of each segment. Finally, the +c option turns on data range selection instead (for column col). Here, +a|f|s are not valid since we are comparing data values, but ~ can be used as for row ranges. This seems pretty powerful to me so far. I made is robust so that for row-ranges I accept either -, :, or / as the delimiter, and if the start row is missing (e.g., :10) then we initialize start to 0, and if stop row is missing (e.g., 100-) then we initialize stop row to infinity. For data ranges I insist on slash-separated values since that is how our -R and -T options generally work, absolute time often have : and -, and even geographic data may have :, and of course they can be negative. So therefore I only accept /. The multiple data ranges, negation and missing start or stop values works for data ranges as well as for row ranges.

As for speed I ran the following test:

gmt grd2xyz @earth_relief_05m > junk.txt
# First just run via convert with no option so we can compare speed with master
time gmt convert junk.txt > shit1.txt
# Next just run via convert -qi~0 to skip the very first record
time gmt convert junk.txt -qi~0 > shit2.txt
# Next just run via convert -qi~89/+c1 to skip all latitudes > 89
time gmt convert junk.txt -qi~89/+c1 > shit3.txt

I ran these three times and compared the run times and there was only a 2-3% variations in time (all around 14-15 seconds for me). When I ran the first gmtconvert with master (i.e., no -q if-test being checked) it took slightly longer - which I don't understand - see what you get. The input file is ~250 Mb with > 9 million records so the slowdown, if any, seems trivial. An earlier test now lost gave a slightly faster master version compared to the branch that has -q (but not used on the command line); it seems of no big consequence to accept the -q implementation as it is for now.

If you agree with all of this then I would like to add q to the THIS_MODULE_OPTIONS of all the table readers/writers as well as to the explanations; the rest is automatic. Just like -b I will need to have separate explain files for -qi and -qo since many modules can only accept one form (e.g., grd2xyz can only accept -qo, for instance, while xyz2grd only can do -qi).

@seisman
Copy link
Member

seisman commented Dec 30, 2019

I followed your first comment and ran the commands below:

gmt math -T0/10/1 T = t.txt
gmt convert t.txt -bo2d > t.bin
gmt convert -q3:7 t.bin -bi2d -V

It gave me some warnings and no output:

gmtconvert [WARNING]: File t.bin is empty!
gmtconvert [WARNING]: No data records provided

doc/rst/source/explain_-q.rst_ Outdated Show resolved Hide resolved
doc/rst/source/explain_-q_full.rst_ Outdated Show resolved Hide resolved
@PaulWessel
Copy link
Member Author

Not sure how this happened as I was running that test multiple times...Now passes for me.

Copy link
Member

@seisman seisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@joa-quim
Copy link
Member

Sorry, hadn't have time to test it, visitors, but looks good.

@PaulWessel
Copy link
Member Author

Hi @joa-quim or @seisman, would you know what I am missing from these messages (if so please fix):

[8/8] Building HTML documentation...
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/common_SYN_OPTs.rst_:55: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)
source/explain_-qi.rst_:2: WARNING: undefined label: -qi_full (if the link has no caption the label must precede a section header)

@PaulWessel
Copy link
Member Author

Basically, the (...more) links for -qi and qo dont work.

@joa-quim
Copy link
Member

You need to add them to gmt.rst

@PaulWessel
Copy link
Member Author

Thanks, done.

@PaulWessel
Copy link
Member Author

OK, all those warnings went away but the docs for xyz2grd -qiflags still have no links like diflags etc.

@PaulWessel
Copy link
Member Author

Worked better after wiping the whole build tree. Looks good. I will do some more testing before merging this branch back in.

@PaulWessel PaulWessel changed the title WIP Add new common -q option to select rows Add new common -q option to select data rows Dec 31, 2019
@PaulWessel
Copy link
Member Author

Hi @joa-quim and @seisman. I have added tests for both ascii and binary files and have also made sure it works with netCDF tables (I may add a test for this as well). The only thing I have not yet tested is if it works from the external interfaces. I don't think that should prevent merging this branch back to master though since -q is less likely to be used due to subscripting.

@seisman
Copy link
Member

seisman commented Dec 31, 2019

Just tried the new -q option with pygmt. It works well:

import pygmt
fig = pygmt.Figure()
fig.plot(projection="X10c", region="0/10/0/10", style="c0.2c", frame=True, data="t.txt", q="2:5")
fig.savefig("map.png")

@PaulWessel
Copy link
Member Author

Probably will fail if you pass it via memory instead of a file.

@PaulWessel
Copy link
Member Author

I have added a warning that -q is not implemented for external memory objects, so if the code enters those cases under api_export|import_dataset and -q is set then we print that warning. With that I think I am good to merge unless you stop me.

@PaulWessel PaulWessel merged commit f75bb66 into master Jan 1, 2020
@PaulWessel PaulWessel deleted the common-q-option branch January 1, 2020 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants