Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update GMT_IS_REFERENCE and GMT_IS_DUPLICATE to allow duplicating strings #3718

Merged
merged 21 commits into from
Jul 30, 2020

Conversation

seisman
Copy link
Member

@seisman seisman commented Jul 24, 2020

Description of proposed changes

The C code src/testapi_vector_strings2.c is modified from the test src/testapi_vector_strings.c.

The two C codes are very similar, except that in src/testapi_vector_strings2.c:

  • the string array strings are dynamically allocated using malloc (line 28)
  • the string array are freed after calling GMT_Put_Strings (line 45).

Running the C code gives me:

image

If the string array are not freed (i.e., comment out lines 44-46), I can get the correct result:

image

@PaulWessel Is it the expected behaviour or a GMT bug?

@seisman seisman requested a review from PaulWessel July 24, 2020 04:25
@PaulWessel
Copy link
Member

Hm. Let's see. So in effect, the default is GMT_IS_REFERENCE. If we wanted to have a different behavior (GMT_IS_DUPLICATE) then we would need a way to say so. If we know that the pointer passed in is an array of char* pointers then we can do the duplication inside since we know the number of rows = number of strings. So the options are:

  1. Add a constant to family. Now, GMT_Put_Strings is only used for GMT_IS_MATRIX (5) and GMT_IS_VECTOR (6), so if we added a |GMT_IS_DUPLICATE (3) we would get 8 or 9. But if we add GMT_IS_REFERENCE (4) we get 9 or 10, so we could not tell a matrix reference from a vector duplicate. We would have to change the assignments to these two constants, but I think that messes with Julia which duplicates these settings. So on Julia (I think), GMT_IS_DUPLICATE is basically a hard-wired number even though anyone using the C API and its named constants could care less what those values are.
  2. Introduce a new constant GMT_DUP_STRINGS of some value like 16 and then we can tell. Adding a new constant to gmt_resources.h is not adding things to a structure so it may not be a problem for @joa-quim?

So it is a bit problematic. If this was just a C API that other people used then I could just change the values assigned to GMT_IS_REFERENCE and GMT_IS_DUPLICATE. I need @joa-quim to comment on that. I don't like the idea that Julia would dictate what we can do in the C API. The whole point of using named constants is to avoid having to change a hardwired 4 to 40 down the road since the code only uses the named constant. If I understand how @joa-quim builds Julia he started with a version of gmt_resources.h some time ago and it is not recreated from a changing gmt source. For PyGMT, @leouieda made me add the GMT_Get_Enum function so you could pass the name and get the value; ideally Julia could do the same. So let's wait until tomorrow to see what we can do here.

@joa-quim
Copy link
Member

joa-quim commented Jul 24, 2020

I can use GMT_Get_Enum() to get the GMT_IS_REFERENCE value. Generalizing this to all the enums used in Julia as Const is not so obvious.

@PaulWessel
Copy link
Member

Just so we understand: Your Julia code is using the constants like GMT_IS_DUPLICATE, GMT_IS DATASET, etc and their values are fixed values. if so, presumably adding GMT_DUP_STRINGS wont hurt anything.

@joa-quim
Copy link
Member

Adding a new enum won't hurt anything. Only if it changes old functionality. But you can change the GMT_IS_REFERENCE value.

@PaulWessel
Copy link
Member

How about the GMT_IS_DUPLICATE value? That would be ideal since having to add another enum with the same meaning is not good style.

But in general, I think you need to consider a strategy where you do not impose conditions on such changes in the GMT API. At the very least, there should be some versioning since changing the number assigned to an enum or constant is not considered breaking backwards compatibility. I understand it does in fact affect the Julia implementation but that is an issue with that implementation and not the C API, no? You could, for example, import all those enums via GMT_Get_Enum in your startup.

@joa-quim
Copy link
Member

I don't use GMT_IS_DUPLICATE. I try to use the minimum enums as possible.
What you mean by some versioning? I already have lots of if (GMTver ...)

I told you, it's not easy to import the enums. I can only use GMT_Get_Enum after the file with the functions declarations is imported and the gmt_main(), which starts the API`` starts it. But some of the functions wrappers already need the (some) enums.

@PaulWessel
Copy link
Member

What you mean by some versioning? I already have lots of if (GMTver ...)

I mean, GMT.jl version X requires GMT version Y. It should not aim to work with all possible GMT versions.

I told you, it's not easy to import the enums. I can only use GMT_Get_Enum after the file with the functions declarations is imported and the gmt_main(), which starts the API`` starts it. But some of the functions wrappers already need the (some) enums.

OK I see, so you really need a process that runs script makeJL and it builds the porting of gmt_resources.h etc from the C. I think you said you did something like that but not final so took lots of manual edits after that, which is why you don't like to do this again. But perhaps we can help with coming up with such a solution?

@joa-quim
Copy link
Member

I intend to drop GMT5 support soon, but I think not supporting 6.X is a killing feature.

Recreating the the wrapper functions is a complex operation. It involves installing a package that depends on LLVM that scans lots of headers and creates wrappers to lots of functions. Than those must be cleaned. No way I want to repeat this process without absolute need.

@joa-quim
Copy link
Member

And can't currently trust in GMT_Get_Enum() either. At least these ones return -99999

GMT_IS_IMAGE = -99999
GMT_IS_POSTSCRIPT = -99999
GMT_IS_TEXTSET = -99999
GMT_MODULE_OPT = -99999
GMT_GRID_ALL = -99999

@seisman
Copy link
Member Author

seisman commented Jul 26, 2020

And can't currently trust in GMT_Get_Enum() either. At least these ones return -99999

GMT_IS_IMAGE = -99999
GMT_IS_POSTSCRIPT = -99999
GMT_IS_TEXTSET = -99999
GMT_MODULE_OPT = -99999
GMT_GRID_ALL = -99999

These values also can't be accessed in PyGMT, but others like GMT_IS_DATASET, GMT_IS_GRID, can.

@seisman
Copy link
Member Author

seisman commented Jul 26, 2020

It's because these enums are not included in gmt_enum_dict.h. Perhaps a bug of gmt_make_enum_dicts.sh?

@PaulWessel
Copy link
Member

yes looks like it. Weill have a look after last morning swim before impending hurricane... Note that it is likely there will be power-outages in Hawaii and the GMT server will go down at some point.

GMT_Put_Strings only worked by reference.  Now, we can accept a family argument such as GMT_IS_VECTOR|GMT_IS_DUPLICATE (or GMT_IS_REFERENCE which is the default) and it will duplicate the array string if requested.
@PaulWessel
Copy link
Member

Hi @seisman, I think this ought to work but I am unable to run tests with the gmtserver down (they are looking into it as no power outage yet). Since I changed the values of GMT_IS_DUPLICATE and GMT_IS_REFERENCE we need to run the tests to see if I forgot some case. It seems the only issues was the use of these are array indices into GMT_method for debug print statements so I changed that. The rest of the use of GMT_IS_DUPLICATE should be fine. There are more use of GMT_IS_REFERENCE in modules so need to run all tests to know. Oh, and I did not actually run your C code...

@seisman
Copy link
Member Author

seisman commented Jul 27, 2020

I manually ran the gmt_make_enum_dicts.sh to update gmt_enum_dict.h. After that, I tried GMT_IN, GMT_IN|GMT_IS_REFERENCE and GMT_IN|GMT_IS_DUPLICATE in the C test code, but none of them work.

@PaulWessel
Copy link
Member

You mean in the test C program? I think the argument should be GMT_IS_VECTOR|GMT_IS_DUPLICATE, no? E.g.,

GMT_Put_Strings(API, GMT_IS_VECTOR|GMT_IS_DUPLICATE, V, strings);

@seisman
Copy link
Member Author

seisman commented Jul 27, 2020

You mean in the test C program? I think the argument should be GMT_IS_VECTOR|GMT_IS_DUPLICATE, no? E.g.,

GMT_Put_Strings(API, GMT_IS_VECTOR|GMT_IS_DUPLICATE, V, strings);

You're right. GMT_IS_VECTOR|GMT_IS_DUPLICATE works, but GMT_IS_VECTOR|GMT_IS_REFERENCE gives errors. I think it's the expected behavior, right?

The PR looks good, but I didn't run the full test, due to the server down. Just two more questions:

  1. When I ran testapi_vector_strings2, it tries to download the gmt_data_server.txt file from the server and timeout due to the server down. The C test code doesn't use any remote files. I thought it shouldn't download any files from the server after PR Try to refresh only when remote files are requested #3616. Why does it still try to download gmt_data_server.txt? When I type Ctrl+C, I saw the message
    Emergency removal of file xxxx due to Ctrl-C action hundreds of times. It never stops, and Ctrl+C doesn't work. So I have to close the terminal to end it.

  2. I'm a little confused about the behavior of GMT_Put_Vector and GMT_Put_Strings. What will happen if the numeric vectors passed to GMT_Put_Vector are freed?

@PaulWessel
Copy link
Member

Good questions, so:

  1. Regular gmt modules check if the arguments contain any remote file, and if so they do the check on gmt_data_server freshness. This happens in gmt_init_module. The test_*.c programs do not call gmt_init_module. That is the only function that calls gmt_refresh_server. The only other places is in gmtget.c. So I guess I don't understand how it is trying this? You are sure it is doing that?
  2. It is the same issue: GMT_Put_Vector was created to let users stick in their vectors, but it was not set up to duplicate them, just like the Strings. So if there is a need to duplicate we have the same situation that started this thread.

Given all that, it may possibly be cleaner to add three more functions:

GMT_Duplicate_Vector
GMT_Duplicate_Matrix
GMT_Duplicate_Strings

unless we are to add |GMT_IS_DUPLICATE to the type argument in GMT_Put_Vector

@PaulWessel
Copy link
Member

I could define DUP and REF as 16 and 32 and then you could pass GMT_DOUBLE|GMT_IS_DUPLICATE to Put_Vector and we could do the duplication. Perhaps that is an OK solution as well?

@PaulWessel
Copy link
Member

Forgot to answer the failiure re GMT_IS_REFERENCE. If by failure you mean the same as the first plots then yes, if you pass by reference and then remote the information it will fail by design. So I am thinking it may make more sense to pass by REF and not free what you created temporarily anyways since you are just forcing the duplication to happen again anyway.

@seisman
Copy link
Member Author

seisman commented Jul 27, 2020

  1. Regular gmt modules check if the arguments contain any remote file, and if so they do the check on gmt_data_server freshness. This happens in gmt_init_module. The test_*.c programs do not call gmt_init_module. That is the only function that calls gmt_refresh_server. The only other places is in gmtget.c. So I guess I don't understand how it is trying this? You are sure it is doing that?

I add -Vd to the text call and it takes ~20 seconds to show the debug message. When I type Ctrl+C, it shows following message. It seems it's downloading the remote file

./testapi_vector_strings2 > map.ps
^CEmergency removal of file /Users/seisman/.gmt/server/gmt_data_server.txt due to Ctrl-C action
  1. It is the same issue: GMT_Put_Vector was created to let users stick in their vectors, but it was not set up to duplicate them, just like the Strings. So if there is a need to duplicate we have the same situation that started this thread.

Given all that, it may possibly be cleaner to add three more functions:

GMT_Duplicate_Vector
GMT_Duplicate_Matrix
GMT_Duplicate_Strings

unless we are to add |GMT_IS_DUPLICATE to the type argument in GMT_Put_Vector

I'm not sure if it's necessary. At least it works well from the PyGMT side, most likely because the Python numeric arrays are not freed, but string arrays are.

I could define DUP and REF as 16 and 32 and then you could pass GMT_DOUBLE|GMT_IS_DUPLICATE to Put_Vector and we could do the duplication. Perhaps that is an OK solution as well?

It's up to you. I can test it from the PyGMT side, but as mentioned above, GMT_DOUBLE now works well for PyGMT.

@PaulWessel
Copy link
Member

Sorry, did not think of the text call. Good one, I can see why now. I will fix this separately outside this branch.

@PaulWessel
Copy link
Member

I will adjust the values of DUP and REF so that there is room to enhance GMT_Put_* if we feel like it.

@PaulWessel
Copy link
Member

yes, taht is a good idea. Please give it a spin, @joa-quim so we dont break anything. This PR changed the values of GMT_IS_REFERENCE|DUPLICATE and fixed some issues that surfaced because if it.

@seisman seisman changed the title WIP: GMT_Put_Strings doesn't work if the original string array is freed? Update GMT_IS_REFERENCE and GMT_IS_DUPLICATE to allow duplicating strings Jul 29, 2020
@seisman
Copy link
Member Author

seisman commented Jul 29, 2020

Also backport this PR to 6.1? Some PyGMT features needs this PR, and we don't want to wait for the 6.2.0 release.

@PaulWessel
Copy link
Member

PaulWessel commented Jul 29, 2020 via email

@seisman
Copy link
Member Author

seisman commented Jul 29, 2020

Do you mean backporting PR #3735? I'm talking about this branch.

@PaulWessel
Copy link
Member

I meant this branch put-strings.

@seisman seisman added the backport 6.1 Backport this PR to 6.1 branch label Jul 29, 2020
@seisman
Copy link
Member Author

seisman commented Jul 29, 2020

"backport 6.1" label added. FYI, we can even add the label after a PR is merged.

@joa-quim
Copy link
Member

Tested GMT.jl and no problems.

@joa-quim
Copy link
Member

joa-quim commented Jul 29, 2020

but the MEX

gmtmex('grdcut @earth_relief_01m -R-37.0/-35.0/-2.3/-1.0')
Error using gmtmex
GMT: Failure to open virtual file

From the mastar branch the above crashes Matlab as it also crashes Julia. The problem seem to be that the returned grids is junk.

@PaulWessel
Copy link
Member

Just to be clear, you tested GMT.jl and no problems, but then you discovered the above example which crashes on both MEX and Julia, yes?

@joa-quim
Copy link
Member

Right.

@PaulWessel
Copy link
Member

Thanks, will have a look. Also time for me to remind @joa-quim and @seisman about Debugging. It would be nice to have information there for how to debug GMT when called from Julia, MEX, and PyGMT, for 3 platforms. IF we want people to help find bugs then we need more people to actually step into the code. The biggest problem is setting up a working GUI debugging system. I can add stuff for debugging MEX on macos, for instance, but I do not know how for Julia and PyGMT. Since one uses Xcode on macOS I think it is possibly to do the same for Julia and Python but have not tried.

@PaulWessel
Copy link
Member

Good but hard on the brain to debug this. We end up in grdblend of course, writing a grid back to grdcut who writes the grid back to gmtmex. So the actual grid is passed up via two modules. Unfortunately, looks like the original object does not release ownership so at the last minute we wipe that grid which is the same pointer taht we pass back to mex, so it is NULL there...
Building the put_strings branch and debug there since it is a bit clearer with the methods.

@PaulWessel
Copy link
Member

So I have the solution to this, @seisman and @seisman . When we call the gmt_assemble_grid function inside GMT_Read_Data, we already have an object container (API->object[ID]) to hold the grid to be read. But then gmt_assemble_grid also creates its own container via GMT_Open_VirtualFile. Once we get the grid we actually need to delete that VirtualFile (and its container) since we are done. Otherwise it lives on in the API object list and causes trouble like above.

I can do this in a few different ways:

  1. Add new API function GMT_Delete_VirtualFile. This is what I did to test this. Works, adds one more API function.
  2. Since most likely will only be needed in this assemble case, make it gmtlib_delete_virtualfile instead. Not in the API but in the GMT lower-level lib instead. No API update.
  3. Pass a new flag to GMT_Open_VirtualFile that means "when we call GMT_Close_VIrtualFile, zap the object". Because I foolishly did not give GMT_Close_VirtualFile a mode argument there is nothing to pass there, so it would have to be via a falg in the object (S_obj->zip_it = true or similar). This does change the API by adding one more enum (GMT_IS_TEMPORARY or similar) but not any function.

We would have to backport to 6.1 for the above command to work in 6.1.1. Maybe option 2 is good enough for now? No point making a full API function that other developers will never use, no?

@seisman
Copy link
Member Author

seisman commented Jul 29, 2020

Option 2 sounds good to me.

@joa-quim
Copy link
Member

Yes, 2 seems the best for now.

@PaulWessel
Copy link
Member

OK separate PR coming up, with backport, that is independent of the put-strings branch.

@PaulWessel
Copy link
Member

Now that the side-show of #3776 was taken care of, it sounds like this PR works well for Julia, PyGMT as well as GMT itself. Maybe can be approced, @seisman ?

@seisman
Copy link
Member Author

seisman commented Jul 30, 2020

Yes, looks good to me. Since I opened this PR, I can't approve myself.

@seisman seisman merged commit 2e87fbf into master Jul 30, 2020
@seisman seisman deleted the put-strings branch July 30, 2020 02:24
github-actions bot pushed a commit that referenced this pull request Jul 30, 2020
…ings (#3718)

* Debug GMT_Put_Strings

* SHift GMT_IS_DUPLICATE,REFERENCE and allow for string duplication

GMT_Put_Strings only worked by reference.  Now, we can accept a family argument such as GMT_IS_VECTOR|GMT_IS_DUPLICATE (or GMT_IS_REFERENCE which is the default) and it will duplicate the array string if requested.

* Adjust these enums to allow future use in GMT_Put_Vector|Matrix

* Update gmt_enum_dict.h

* Possible clash with GMT_VIA_MODULE_INPUT

* Update gmt_enum_dict.h

* Update gmt_enum_dict.h and testapi_vector_strings2.c

* Update api.rst

* Update gmt_api.c

* Update gmt_api.c

* Add debug messages

* check for directory

* Update gmt_io.c

Co-authored-by: Paul Wessel <[email protected]>
seisman added a commit that referenced this pull request Jul 30, 2020
…ings (#3718) (#3777)

* Debug GMT_Put_Strings

* SHift GMT_IS_DUPLICATE,REFERENCE and allow for string duplication

GMT_Put_Strings only worked by reference.  Now, we can accept a family argument such as GMT_IS_VECTOR|GMT_IS_DUPLICATE (or GMT_IS_REFERENCE which is the default) and it will duplicate the array string if requested.

* Adjust these enums to allow future use in GMT_Put_Vector|Matrix

* Update gmt_enum_dict.h

* Possible clash with GMT_VIA_MODULE_INPUT

* Update gmt_enum_dict.h

* Update gmt_enum_dict.h and testapi_vector_strings2.c

* Update api.rst

* Update gmt_api.c

* Update gmt_api.c

* Add debug messages

* check for directory

* Update gmt_io.c

Co-authored-by: Paul Wessel <[email protected]>

Co-authored-by: Dongdong Tian <[email protected]>
Co-authored-by: Paul Wessel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 6.1 Backport this PR to 6.1 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants