Commit

pyarrow: Support date32[day] and date64[ms] dtypes in pandas objects (#2845)

* Convert pyarrow date32/date64 dtypes to np.datetime64

Handle date columns in pandas.DataFrame with pyarrow dtypes
like date32[day][pyarrow] or date64[ms][pyarrow] by modifying
the vectors_to_arrays conversion function. Added some parametrized
unit tests to test_info.py to ensure this works.

* Handle Python lists without dtype attr and use as_c_contiguous

Need to handle Python lists that don't have the dtype attribute, unlike
pandas.Series objects. Also ensure that we return a C-contiguous array.

* Add doctest to check that date32/date64 are converted to datetime64

Ensure that pyarrow date32 and date64 dtypes are converted to
numpy.datetime64 dtype. Added pyarrow dependency to ci_doctests.yaml.
Also changed from using `"date" in vec_dtype` to `vec_dtype.startswith("date")`.

* Refactor to use pygmt.helpers.testing.skip_if_no

* Document that PyArrow date32/date64 dtypes are now supported in PyGMT

* Refactor to use dict mapping instead of if-then

---------

Co-authored-by: Dongdong Tian <[email protected]>
weiji14 and seisman committed Dec 17, 2023
1 parent 20054a1 commit 005de65
Showing 4 changed files with 46 additions and 6 deletions.
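
For orientation: Arrow's date32 stores whole days since the Unix epoch as 32-bit integers, and date64 stores milliseconds since the epoch as 64-bit integers. The sketch below uses only NumPy to show why both map naturally onto numpy.datetime64. The sample integers are hand-computed for illustration and are not data from this commit.

```python
import numpy as np

# date32: days since 1970-01-01, stored as int32 (sample values are illustrative).
days_since_epoch = np.array([18262, 18992], dtype="int32")
# Widen to int64 first, then reinterpret the counts as day-resolution datetimes.
from_date32 = days_since_epoch.astype("int64").astype("datetime64[D]")

# date64: milliseconds since 1970-01-01, stored as int64.
millis_since_epoch = np.array([1640995200000], dtype="int64")
from_date64 = millis_since_epoch.astype("datetime64[ms]")

print(from_date32)
print(from_date64)
```

Both results are ordinary datetime64 arrays that differ only in their time unit.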
1 change: 1 addition & 0 deletions .github/workflows/ci_doctests.yaml
@@ -58,6 +58,7 @@ jobs:
 contextily
 geopandas
 ipython
+pyarrow
 rioxarray
 build
 make
4 changes: 2 additions & 2 deletions doc/install.rst
@@ -112,8 +112,8 @@ The following are optional dependencies:
 If you have `PyArrow <https://arrow.apache.org/docs/python/index.html>`__
 installed, PyGMT does have some initial support for ``pandas.Series`` and
 ``pandas.DataFrame`` objects with Apache Arrow-backed arrays. Specifically,
-only uint/int/float dtypes are supported for now. Support for datetime and
-string Arrow dtypes are still working in progress. For more details, see
+only uint/int/float and date32/date64 dtypes are supported for now. Support
+for string Arrow dtypes is still a work in progress. For more details, see
 `issue #2800 <https://github.com/GenericMappingTools/pygmt/issues/2800>`__.

 Installing GMT and other dependencies
33 changes: 32 additions & 1 deletion pygmt/clib/conversion.py
@@ -162,11 +162,42 @@ def vectors_to_arrays(vectors):
     True
     >>> all(isinstance(i, np.ndarray) for i in arrays)
     True
     >>> data = [[1, 2], (3, 4), range(5, 7)]
     >>> all(isinstance(i, np.ndarray) for i in vectors_to_arrays(data))
     True
+    >>> import datetime
+    >>> import pytest
+    >>> pa = pytest.importorskip("pyarrow")
+    >>> vectors = [
+    ...     pd.Series(
+    ...         data=[datetime.date(2020, 1, 1), datetime.date(2021, 12, 31)],
+    ...         dtype="date32[day][pyarrow]",
+    ...     ),
+    ...     pd.Series(
+    ...         data=[datetime.date(2022, 1, 1), datetime.date(2023, 12, 31)],
+    ...         dtype="date64[ms][pyarrow]",
+    ...     ),
+    ... ]
+    >>> arrays = vectors_to_arrays(vectors)
+    >>> all(a.flags.c_contiguous for a in arrays)
+    True
+    >>> all(isinstance(a, np.ndarray) for a in arrays)
+    True
+    >>> all(isinstance(a.dtype, np.dtypes.DateTime64DType) for a in arrays)
+    True
     """
-    arrays = [as_c_contiguous(np.asarray(i)) for i in vectors]
+    dtypes = {
+        "date32[day][pyarrow]": np.datetime64,
+        "date64[ms][pyarrow]": np.datetime64,
+    }
+    arrays = []
+    for vector in vectors:
+        vec_dtype = str(getattr(vector, "dtype", ""))
+        array = np.asarray(a=vector, dtype=dtypes.get(vec_dtype, None))
+        arrays.append(as_c_contiguous(array))
+
     return arrays
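
The lookup-then-convert pattern in this hunk can be tried outside PyGMT. The following is a minimal sketch, not the library's code: `DATE_DTYPES` and `to_c_contiguous_arrays` are hypothetical names, and `np.ascontiguousarray` stands in for PyGMT's `as_c_contiguous` helper. It exercises only NumPy inputs and plain Python sequences. PyArrow-backed Series are not covered here.

```python
import datetime

import numpy as np

# Hypothetical stand-in for the lookup table in the diff: dtype name -> NumPy dtype.
DATE_DTYPES = {
    "date32[day][pyarrow]": np.datetime64,
    "date64[ms][pyarrow]": np.datetime64,
}


def to_c_contiguous_arrays(vectors):
    """Convert each 1-D vector to a C-contiguous NumPy array."""
    arrays = []
    for vector in vectors:
        # Plain lists, tuples and ranges have no ``dtype`` attribute, so fall back to "".
        dtype_name = str(getattr(vector, "dtype", ""))
        # Unknown names map to None, which lets NumPy infer the dtype as usual.
        array = np.asarray(vector, dtype=DATE_DTYPES.get(dtype_name))
        # Assumption: np.ascontiguousarray replaces PyGMT's as_c_contiguous helper.
        arrays.append(np.ascontiguousarray(array))
    return arrays


matrix = np.array([[1, 2], [3, 4], [5, 6]])
# A column slice of a 2-D array is not C-contiguous until it is copied.
columns = to_c_contiguous_arrays([matrix[:, 0], [1.5, 2.5], range(5, 7)])

# Python date objects convert to day-resolution datetime64 when the dtype is given.
dates = [datetime.date(2020, 1, 1), datetime.date(2021, 12, 31)]
date_array = np.asarray(dates, dtype=np.datetime64)
```

Using `getattr` with a default keeps the function usable for inputs that carry no dtype, which is the case the second bullet of the commit message addresses.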


14 changes: 11 additions & 3 deletions pygmt/tests/test_info.py
@@ -119,14 +119,22 @@ def test_info_numpy_array_time_column():
     assert output == expected_output


-def test_info_pandas_dataframe_time_column():
+@pytest.mark.parametrize(
+    "dtype",
+    [
+        "datetime64[ns]",
+        pytest.param("date32[day][pyarrow]", marks=skip_if_no(package="pyarrow")),
+        pytest.param("date64[ms][pyarrow]", marks=skip_if_no(package="pyarrow")),
+    ],
+)
+def test_info_pandas_dataframe_date_column(dtype):
     """
-    Make sure info works on pandas.DataFrame inputs with a time column.
+    Make sure info works on pandas.DataFrame inputs with a date column.
     """
     table = pd.DataFrame(
         data={
             "z": [10, 13, 12, 15, 14],
-            "time": pd.date_range(start="2020-01-01", periods=5),
+            "date": pd.date_range(start="2020-01-01", periods=5).astype(dtype=dtype),
         }
     )
     output = info(data=table)
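
The `skip_if_no` marks above make the PyArrow-backed cases conditional on the optional package being installed. As a rough, standard-library-only illustration of that idea (this is not PyGMT's implementation, and `has_package` and `CANDIDATE_DTYPES` are hypothetical names), the selection could be sketched as:

```python
import importlib.util


def has_package(name):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


# Each dtype is paired with the optional package it needs (None means no extra package).
CANDIDATE_DTYPES = [
    ("datetime64[ns]", None),
    ("date32[day][pyarrow]", "pyarrow"),
    ("date64[ms][pyarrow]", "pyarrow"),
]

# Keep only the dtypes whose optional dependency is available on this machine.
runnable_dtypes = [
    dtype
    for dtype, package in CANDIDATE_DTYPES
    if package is None or has_package(package)
]
print(runnable_dtypes)
```

In a real test suite the unavailable cases would be reported as skipped rather than silently dropped, which is what a pytest skip mark provides.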