Allow np.object dtypes into virtualfile_from_vectors #684

Merged: 1 commit, Nov 7, 2020

@weiji14 (Member) commented Nov 7, 2020

Description of proposed changes

Loosen the check in virtualfile_from_vectors to allow for any string-like dtype (np.str, np.object) by performing the check using pd.api.types.is_string_dtype(). The array is then converted (if needed) to a proper np.str dtype before giving it to put_strings.
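As a rough illustration of the check-then-convert step described above, here is a minimal sketch. The helper name `as_string_array` is hypothetical and this is not the actual PyGMT code; it also uses the builtin `str` and `object` instead of the `np.str` and `np.object` aliases, since those aliases were removed in later NumPy releases.

```python
import numpy as np
import pandas as pd


def as_string_array(vector):
    """Hypothetical helper: accept any string-like vector, return a unicode array.

    Mirrors the idea in this PR: check with pd.api.types.is_string_dtype()
    rather than requiring a numpy unicode dtype up front, then convert.
    """
    array = np.asarray(vector)
    if not pd.api.types.is_string_dtype(array):
        raise TypeError(f"Expected a string-like dtype, got {array.dtype}")
    # Convert object (or already-unicode) arrays to a numpy unicode dtype
    return array.astype(str)


# An object-dtype pandas column of strings is accepted and converted
labels = as_string_array(pd.Series(["a", "bc"], dtype=object))
```

The converted array has a numpy unicode dtype (`labels.dtype.kind == "U"`), which is the form a function like put_strings would presumably expect.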

Why is this needed?

This is one step in enabling text input into modules like:

Those modules rely on pandas.DataFrame inputs, but a 'str' column in pandas is typically stored with an 'object' dtype (see https://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object) unless users take care to store it as the newer pandas.StringDtype. Either way, when we convert these pandas.Series objects to a numpy array, their dtype becomes np.object rather than np.str, which is why our code needs to handle np.object too.
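The dtype behaviour described above can be checked with a short sketch. The variable names are illustrative only, and the columns are created with explicit dtypes so that the result does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd

# A string column stored the traditional way, with an object dtype
object_col = pd.Series(["a", "b"], dtype=object)
# The same values stored using pandas.StringDtype
string_col = pd.Series(["a", "b"], dtype="string")

# Converting either column to numpy yields an object-dtype array
object_arr = object_col.to_numpy()
string_arr = string_col.to_numpy()

# An explicit cast is needed to get a numpy unicode array
unicode_arr = object_arr.astype(str)
```

Both `object_arr` and `string_arr` come out with an object dtype, and only the explicit `astype(str)` produces a unicode array.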

After this PR is merged, we can do something like:

            if kind == "matrix":
                if pd.api.types.is_numeric_dtype(data):
                    # all numeric dtypes (e.g. int, float)
                    file_context = lib.virtualfile_from_matrix(data)
                else:
                    # contains a column with object (str) dtype
                    file_context = lib.virtualfile_from_vectors(
                        *[data[column] for column in data]
                    )
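The branching idea in the snippet above can be tried without a GMT session using the sketch below. The function `pick_virtualfile_strategy` is hypothetical and only returns a label; it checks each column separately, which is an assumption made here for illustration and not necessarily how the snippet above applies `is_numeric_dtype`.

```python
import pandas as pd


def pick_virtualfile_strategy(data: pd.DataFrame) -> str:
    """Hypothetical helper: decide how a table would be passed to GMT.

    Returns "matrix" when every column is numeric, otherwise "vectors"
    so that string columns can be passed one column at a time.
    """
    all_numeric = all(
        pd.api.types.is_numeric_dtype(data[column]) for column in data
    )
    return "matrix" if all_numeric else "vectors"


numeric_only = pd.DataFrame({"x": [1, 2], "y": [0.5, 1.5]})
with_labels = pd.DataFrame({"x": [1, 2], "label": ["a", "b"]})

numeric_choice = pick_virtualfile_strategy(numeric_only)
labelled_choice = pick_virtualfile_strategy(with_labels)
```

With these inputs, the all-numeric table maps to the matrix path and the table with a text column maps to the per-column vectors path.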

Fixes #

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Notes

  • You can write /format in the first line of a comment to lint the code automatically

@weiji14 weiji14 added the enhancement Improving an existing feature label Nov 7, 2020
@weiji14 weiji14 mentioned this pull request Nov 7, 2020
@seisman seisman added this to In progress in Release v0.2.x via automation Nov 7, 2020
@seisman seisman added this to the 0.2.1 milestone Nov 7, 2020
@seisman (Member) left a comment

Great! I believe it solves a long-term headache when I use pandas with PyGMT.

@weiji14 weiji14 merged commit 1b01f79 into master Nov 7, 2020
Release v0.2.x automation moved this from In progress to Done Nov 7, 2020
@weiji14 weiji14 deleted the put_strings_object_dtype branch November 7, 2020 20:03