Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow datetime inputs to region argument #562

Merged
merged 2 commits into from
Aug 16, 2020
Merged

Allow datetime inputs to region argument #562

merged 2 commits into from
Aug 16, 2020

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Aug 14, 2020

Description of proposed changes

Allow users to pass in datetime-like inputs to the 'region' argument in modules like plot, basemap, and so on. Helpful for people who usually just want to use something like region = [data.min(), data.max(), ...].

Includes support for:

What is not supported:

  • Raw datetime strings in non-ISO format, e.g., 1/1/2018, Jul 5, 2019

Fixes #561

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Includes:
- Python datetime objects
- np.datetime64
- pandas.Timestamp
- xarray.DataArray with datetime64 dtype
@weiji14 weiji14 added the enhancement Improving an existing feature label Aug 14, 2020
@weiji14 weiji14 marked this pull request as ready for review August 14, 2020 03:42
@weiji14 weiji14 changed the title Support various datetime inputs to region argument Allow datetime inputs to region argument Aug 14, 2020
except AssertionError:
# convert datetime-like items to string format
value[index] = np.datetime_as_string(
np.asarray(item, dtype=np.datetime64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use np.datetime_as_string(array_to_datetime(vector)) to convert datetime arrays to string arrays.

pygmt/pygmt/clib/session.py

Lines 778 to 782 in 2ddbb0e

if gmt_type == self["GMT_DATETIME"]:
vector_pointer = (ctp.c_char_p * len(vector))()
vector_pointer[:] = np.char.encode(
np.datetime_as_string(array_to_datetime(vector))
)

The array_to_datetime function uses pd.to_datetime(array)
return pd.to_datetime(array)

Any specific reason to use np.asarray here?

Copy link
Member Author

@weiji14 weiji14 Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main reason is to handle xarray.DataArray inputs. The array_to_datetime/pd.to_datetime function is meant to handle arrays/lists, but not scalar values. I.e. this doesn't work:

import pandas as pd
import xarray as xr

pd.to_datetime(xr.DataArray(data=np.datetime64("2005-01-01")))
# TypeError: len() of unsized object

but this does:

import numpy as np
import xarray as xr

np.asarray(xr.DataArray(data=np.datetime64("2005-01-01")))
# array('2005-01-01T00:00:00.000000000', dtype='datetime64[ns]')

We actually came across this before in #464 (comment).

)
for index, item in enumerate(value):
try:
assert " " not in str(item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused how the assert statement works. Could you please explain a little bit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that this isn't very readable, it's following the EAFP style, usually used when we're handling only a few exceptional cases. The original reason that we can't pass in 'datetime'-like values to the 'region' argument is because there's a space " " when converting a pandas.Timestamp to a string (instead of a "T" that would make it ISO compliant).

import pandas as pd

str(pd.Timestamp("2015-01-01T12:00:00.123456789"))
# '2015-01-01 12:00:00.123456789'

This also applies to xarray.DataArray datetime values. So the assert here checks for the space " ", and if it finds it, it will try to convert the item into an ISO compliant datetime value (e.g. "2015-01-01T12:00:00.123456789").

kwargs[arg] = separators[fmt].join(
"{}".format(item) for item in value
)
for index, item in enumerate(value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand your codes correctly. It seems the codes check all arguments. For example, perspective=[30, 50] is converted to -p30/50, and the codes also check if 30 and 50 are datetime-like, right?

However, -p doesn't accept any datetime-like values. It looks a waste of time. Perhaps we should check whether arg="B" (maybe arg="region")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand your codes correctly. It seems the codes check all arguments. For example, perspective=[30, 50] is converted to -p30/50, and the codes also check if 30 and 50 are datetime-like, right?

Yes, we do check every item using assert " " not in str(item), but this is quite a fast string parsing check (see also the other comment on EAFP syntax). It is not explictily checking for datetime-like (which would be a slower check) using something like isinstance(item, (pd.Timestamp, datetime.datetime, etc)).

However, -p doesn't accept any datetime-like values. It looks a waste of time. Perhaps we should check whether arg="B" (maybe arg="region")?

This would be an extra check on top of the (rather fast) assert " " not in str(item), and yes, we could add it I suppose. But really, there's usually not very many items to parse here, only 4 if using [xmin, xmax, ymin, ymax], maybe 6 if using [xmin, xmax, ymin, ymax, zmin, zmax].

Happy to consider an alternative way though if you have any ideas, I definitely think there's a better way to do this. The unit tests are all there, so feel free to refactor/change the code as long as it passes the tests!

"{}".format(item) for item in value
)
for index, item in enumerate(value):
try:
Copy link
Member

@seisman seisman Aug 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because there's a space " " when converting a pandas.Timestamp to a string

Please add it near line 360 to make the assert code easier to understand.

Copy link
Member

@seisman seisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support datetime types in region argument
2 participants