Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clib: Add virtualfile_to_dataset method for converting virtualfile to a dataset #3083

Merged
merged 13 commits into from
Mar 11, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Mar 1, 2024

Add the new Session.virtualfile_to_dataset() method to simplify the logic of dealing with table outputs in different wrappers. Draft PR #3092 shows how the new Session.virtualfile_to_dataset() method can simplify other wrappers.

@@ -1738,6 +1738,119 @@ def read_virtualfile(
dtype = {"dataset": _GMT_DATASET, "grid": _GMT_GRID}[kind]
return ctp.cast(pointer, ctp.POINTER(dtype))

def return_table(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name return_table was initially proposed in #1318 (comment), but is it a good name? At line 1845, we use kind="dataset", so maybe rename it to return_dataset or output_dataset? In the future, I think we will add more methods that return a grid/image/cpt/cube, so consistent method names are preferred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think return_dataset is better than return_table. Renamed in ce029b2.

Copy link
Member

@weiji14 weiji14 Mar 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about virtualfile_to_dataset, or vfile_to_dataset (shorter)? A bit more explicit to say that a conversion is happening from a virtualfile to a tabular dataset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

virtualfile_to_dataset/vfile_to_dataset sounds a good name. I like vfile_to_dataset, but do we want to also rename other functions for consistency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean rename the virtualfile_from_* methods? Let's maybe not do that (lazy to deprecate more names). We can go with virtualfile_to_dataset for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to rename the virtualfile_in/virtualfile_out to vfile_in/vfile_out.

Copy link
Member

@weiji14 weiji14 Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep with the long name (virtualfile_to_dataset). Renaming virtualfile_out -> vfile_out and virtualfile_in -> vfile_in will be more work and confusing (we'll need to manually update the changelog to track that virtualfile_from_data became virtualfile_in which became vfile_in).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 711142c.

@seisman seisman marked this pull request as ready for review March 7, 2024 06:18
@seisman seisman added enhancement Improving an existing feature needs review This PR has higher priority and needs review. labels Mar 7, 2024
@seisman seisman added this to the 0.12.0 milestone Mar 7, 2024
@seisman seisman changed the title clib: Add the return_table method for table outputs clib: Add the return_datasert method for table outputs Mar 8, 2024
@seisman seisman changed the title clib: Add the return_datasert method for table outputs clib: Add the return_dataset method for table outputs Mar 8, 2024
@seisman seisman requested a review from a team March 8, 2024 14:13
@@ -1738,6 +1738,123 @@ def read_virtualfile(
dtype = {"dataset": _GMT_DATASET, "grid": _GMT_GRID}[kind]
return ctp.cast(pointer, ctp.POINTER(dtype))

def return_dataset(
self,
output_type: Literal["pandas", "numpy", "file"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we set a default output type here? It looks like we're using pandas as the default in #3092.

Suggested change
output_type: Literal["pandas", "numpy", "file"],
output_type: Literal["pandas", "numpy", "file"] = "pandas",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes no differences because we always call the function with the output_type parameter, e.g.,:

        return lib.return_dataset(
            output_type=output_type,
            vfile=vouttbl,
        )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it doesn't make any difference in the PyGMT modules, but this is a good central location to document that output_type="pandas" is the default output (though in #1318, it seemed like most of us were in favour of output_type="input" or auto as the default).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_type="input" or auto may not make sense for PyGMT, especially in cases like:

  1. the input data is a file, then auto means outputting to a file by default, then outfile is required.
  2. the input data is vectors (e.g., x/y/z) and each vector can be a list/ndarray/pd.Series. Then what's the expected format if auto/input is used?

Copy link
Member

@weiji14 weiji14 Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not saying that output_type="auto" would be easy to implement 🙂 I think the default output_type="pandas" is fine for now since it is an in-memory format that can be converted to virtualfiles relatively easily. We can discuss more about what the ideal output type would be in #1318 (if there is still any debate that needs to be had).

pygmt/clib/session.py Outdated Show resolved Hide resolved
@@ -1738,6 +1738,119 @@ def read_virtualfile(
dtype = {"dataset": _GMT_DATASET, "grid": _GMT_GRID}[kind]
return ctp.cast(pointer, ctp.POINTER(dtype))

def return_table(
Copy link
Member

@weiji14 weiji14 Mar 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about virtualfile_to_dataset, or vfile_to_dataset (shorter)? A bit more explicit to say that a conversion is happening from a virtualfile to a tabular dataset.

@seisman seisman changed the title clib: Add the return_dataset method for table outputs clib: Add the virtualfile_to_dataset method for converting a virtualfile to a dataset Mar 11, 2024
@@ -294,6 +294,7 @@ conversion of Python variables to GMT virtual files:
clib.Session.virtualfile_from_grid
clib.Session.virtualfile_in
clib.Session.virtualfile_out
clib.Session.virtualfile_to_dataset
Copy link
Member

@weiji14 weiji14 Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At L287 above, could you change the sentence to read "These methods are context managers that automate the conversion of Python variables to and from GMT virtual files"? Since we can convert GMT virtualfiles to Python objects now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in aee0499.

@weiji14 weiji14 changed the title clib: Add the virtualfile_to_dataset method for converting a virtualfile to a dataset clib: Add virtualfile_to_dataset method for converting virtualfile to a dataset Mar 11, 2024
@seisman
Copy link
Member Author

seisman commented Mar 11, 2024

In commit cd5b31d, I've changed the parameter name vfile to vfname to make it consistent with the virtualfile_out and read_virtualfile methods.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more small suggestion. Ok to leave out setting output_type="pandas" as the default setting (https://github.com/GenericMappingTools/pygmt/pull/3083/files#r1518966573) for now.

doc/api/index.rst Outdated Show resolved Hide resolved
@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Mar 11, 2024
@seisman
Copy link
Member Author

seisman commented Mar 11, 2024

Thanks for all the feedbacks. Now marking the PR as "final review call" and will merge it after 24 hours.

@seisman seisman merged commit 0b46aad into main Mar 11, 2024
19 of 21 checks passed
@seisman seisman deleted the clib/return_table branch March 11, 2024 11:17
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants