-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new common -q option to select data rows #2341
Conversation
Seems to work so far.
Only gmtconvert for now, but can be added to any module.
This handles absolute time data as well as any other type.
Gotten the +ccol part to work on input. Need to finalize how we count row numbers and how these are reset for new tables or new segments, which will need +f and +s modifiers, with a default of +a [rows in combined data set]. |
For upcoming -q we need to count data record per data set, table, and segment for both input and output. We did this for input only. This standardizes and renames some of these variables. The new output variables are not used yet in master but will be used in #2341
For upcoming -q we need to count data record per data set, table, and segment for both input and output. We did this for input only. This standardizes and renames some of these variables. The new output variables are not used yet in master but will be used in #2341
Current row counter will be reset for a new file or a new segment.
Hi @joa-quim and @seisman; I think the -q[i|o][~]rows,...[+ccol][+a|f|s] is (close to?) operational. I have only added it to the parsing in gmtconvert and gmtmath for now. I have done some basic testing with two input files with two segments each and the row-based checks work for all the cases I tried. Note you can list a series of ranges (e.g., -q2:6,8:12 gets rows 2-6 and 8-12) and you can also request the inverse by prepending ~ (e.g, -q~2:6,8:12 will get all rows except 2-6 and 8-12). The modifiers +a|f|s affect what row counter we are using. By default we count the data rows from 0 to the end of the entire data set. However, +f resets the counter at start of each new file, while +s resets the counter at the start of each segment. Finally, the +c option turns on data range selection instead (for column col). Here, +a|f|s are not valid since we are comparing data values, but ~ can be used as for row ranges. This seems pretty powerful to me so far. I made is robust so that for row-ranges I accept either -, :, or / as the delimiter, and if the start row is missing (e.g., :10) then we initialize start to 0, and if stop row is missing (e.g., 100-) then we initialize stop row to infinity. For data ranges I insist on slash-separated values since that is how our -R and -T options generally work, absolute time often have : and -, and even geographic data may have :, and of course they can be negative. So therefore I only accept /. The multiple data ranges, negation and missing start or stop values works for data ranges as well as for row ranges. As for speed I ran the following test:
I ran these three times and compared the run times and there was only a 2-3% variations in time (all around 14-15 seconds for me). When I ran the first gmtconvert with master (i.e., no -q if-test being checked) it took slightly longer - which I don't understand - see what you get. The input file is ~250 Mb with > 9 million records so the slowdown, if any, seems trivial. An earlier test now lost gave a slightly faster master version compared to the branch that has -q (but not used on the command line); it seems of no big consequence to accept the -q implementation as it is for now. If you agree with all of this then I would like to add q to the THIS_MODULE_OPTIONS of all the table readers/writers as well as to the explanations; the rest is automatic. Just like -b I will need to have separate explain files for -qi and -qo since many modules can only accept one form (e.g., grd2xyz can only accept -qo, for instance, while xyz2grd only can do -qi). |
I followed your first comment and ran the commands below:
It gave me some warnings and no output:
|
Co-Authored-By: Dongdong Tian <[email protected]>
Not sure how this happened as I was running that test multiple times...Now passes for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Sorry, hadn't have time to test it, visitors, but looks good. |
Hi @joa-quim or @seisman, would you know what I am missing from these messages (if so please fix):
|
Basically, the (...more) links for -qi and qo dont work. |
You need to add them to |
Thanks, done. |
OK, all those warnings went away but the docs for xyz2grd -qiflags still have no links like diflags etc. |
Worked better after wiping the whole build tree. Looks good. I will do some more testing before merging this branch back in. |
Hi @joa-quim and @seisman. I have added tests for both ascii and binary files and have also made sure it works with netCDF tables (I may add a test for this as well). The only thing I have not yet tested is if it works from the external interfaces. I don't think that should prevent merging this branch back to master though since -q is less likely to be used due to subscripting. |
Just tried the new -q option with pygmt. It works well:
|
Probably will fail if you pass it via memory instead of a file. |
I have added a warning that -q is not implemented for external memory objects, so if the code enters those cases under api_export|import_dataset and -q is set then we print that warning. With that I think I am good to merge unless you stop me. |
See issue #2288 for discussion. This PR implements -qi for all table input (ascii, binary, netcdf) in the low-level functions. Thus, modules can get this feature by simply adding q to THIS_MODULE_OPTIONS and add q to the relevant GMT_Option (API, "") strings.
For testing, made this file:
gmt math -T0/10/1 T = t.txt
Try
Then make a binary data set:
gmt convert t.txt -bo2d > t.bin
and run the same tests on t.bin while also passing -bi2d.