
{full,zeros,ones}_like typing #6611

Merged 9 commits on May 16, 2022

Conversation

@headtr1ck (Collaborator)

(Partial) typing for the functions full_like, zeros_like, and ones_like.

I could not figure out how to properly use TypeVars, so many things are "hardcoded" with overloads.

I have added a DTypeLikeSave to npcompat; I am not sure whether this file is supposed to be edited.

Problem 1: TypeVar("T", Dataset, DataArray, Variable) can only resolve to one of these three classes, but never to Union[Dataset, DataArray], which is used in several other places in xarray.
Problem 2: The official mypy advice is to use TypeVar("T", bound=Union[Dataset, DataArray, Variable]), but then the isinstance(obj, Dataset) checks could not be correctly resolved (is that a mypy issue?).

So if anyone can get it to work with TypeVars, feel free to change it. :)
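To make the two problems above concrete, here is a minimal, self-contained sketch of the two TypeVar flavors discussed. The classes are empty stand-ins for xarray's real types, not the actual implementation:

```python
from typing import TypeVar, Union

# Hypothetical stand-ins for xarray's Dataset / DataArray / Variable.
class Dataset: ...
class DataArray: ...
class Variable: ...

# Problem 1: a value-constrained TypeVar must resolve to exactly one of the
# listed classes, so callers passing Union[Dataset, DataArray] fail to type-check.
T = TypeVar("T", Dataset, DataArray, Variable)

# Problem 2: a bound TypeVar accepts the Union, but mypy then struggles to
# narrow T inside isinstance(obj, Dataset) branches in the implementation.
TBound = TypeVar("TBound", bound=Union[Dataset, DataArray, Variable])

def zeros_like_constrained(other: T) -> T:
    # Runtime stand-in: build a fresh object of the same concrete class.
    return type(other)()

def zeros_like_bound(other: TBound) -> TBound:
    return type(other)()
```

Both functions behave identically at runtime; the difference only shows up when mypy checks call sites and the function bodies.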

@max-sixty left a comment (Collaborator)

Looks great, thanks a lot @headtr1ck .

I couldn't seem to push to the PR, but I added some annotations to our tests so we could also test the type annotations:

commit cb4c5dd2dac49230220beda3fa72b286c8849ab4
Author: Maximilian Roos <[email protected]>
Date:   Sun May 15 14:41:46 2022 -0700

    Type check more tests

diff --git a/xarray/tests/test_dataset.py b/xarray/tests/test_dataset.py
index 263237d9..950f15e9 100644
--- a/xarray/tests/test_dataset.py
+++ b/xarray/tests/test_dataset.py
@@ -5559,7 +5559,7 @@ def test_binary_op_join_setting(self):
             actual = ds1 + ds2
             assert_equal(actual, expected)
 
-    def test_full_like(self):
+    def test_full_like(self) -> None:
         # For more thorough tests, see test_variable.py
         # Note: testing data_vars with mismatched dtypes
         ds = Dataset(
@@ -5572,8 +5572,9 @@ def test_full_like(self):
         actual = full_like(ds, 2)
 
         expected = ds.copy(deep=True)
-        expected["d1"].values = [2, 2, 2]
-        expected["d2"].values = [2.0, 2.0, 2.0]
+        # https://github.com/python/mypy/issues/3004
+        expected["d1"].values = [2, 2, 2]  # type: ignore
+        expected["d2"].values = [2.0, 2.0, 2.0]  # type: ignore
         assert expected["d1"].dtype == int
         assert expected["d2"].dtype == float
         assert_identical(expected, actual)
@@ -5581,8 +5582,8 @@ def test_full_like(self):
         # override dtype
         actual = full_like(ds, fill_value=True, dtype=bool)
         expected = ds.copy(deep=True)
-        expected["d1"].values = [True, True, True]
-        expected["d2"].values = [True, True, True]
+        expected["d1"].values = [True, True, True]  # type: ignore
+        expected["d2"].values = [True, True, True]  # type: ignore
         assert expected["d1"].dtype == bool
         assert expected["d2"].dtype == bool
         assert_identical(expected, actual)
@@ -5788,7 +5789,7 @@ def test_ipython_key_completion(self):
             ds.data_vars[item]  # should not raise
         assert sorted(actual) == sorted(expected)
 
-    def test_polyfit_output(self):
+    def test_polyfit_output(self) -> None:
         ds = create_test_data(seed=1)
 
         out = ds.polyfit("dim2", 2, full=False)
@@ -5801,7 +5802,7 @@ def test_polyfit_output(self):
         out = ds.polyfit("time", 2)
         assert len(out.data_vars) == 0
 
-    def test_polyfit_warnings(self):
+    def test_polyfit_warnings(self) -> None:
         ds = create_test_data(seed=1)
 
         with warnings.catch_warnings(record=True) as ws:
diff --git a/xarray/tests/test_variable.py b/xarray/tests/test_variable.py
index 0168f19b..886b0360 100644
--- a/xarray/tests/test_variable.py
+++ b/xarray/tests/test_variable.py
@@ -2480,7 +2480,7 @@ def test_datetime(self):
         assert np.ndarray == type(actual)
         assert np.dtype("datetime64[ns]") == actual.dtype
 
-    def test_full_like(self):
+    def test_full_like(self) -> None:
         # For more thorough tests, see test_variable.py
         orig = Variable(
             dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
@@ -2503,7 +2503,7 @@ def test_full_like(self):
             full_like(orig, True, dtype={"x": bool})
 
     @requires_dask
-    def test_full_like_dask(self):
+    def test_full_like_dask(self) -> None:
         orig = Variable(
             dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
         ).chunk(((1, 1), (2,)))
@@ -2534,14 +2534,14 @@ def check(actual, expect_dtype, expect_values):
             else:
                 assert not isinstance(v, np.ndarray)
 
-    def test_zeros_like(self):
+    def test_zeros_like(self) -> None:
         orig = Variable(
             dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
         )
         assert_identical(zeros_like(orig), full_like(orig, 0))
         assert_identical(zeros_like(orig, dtype=int), full_like(orig, 0, dtype=int))
 
-    def test_ones_like(self):
+    def test_ones_like(self) -> None:
         orig = Variable(
             dims=("x", "y"), data=[[1.5, 2.0], [3.1, 4.3]], attrs={"foo": "bar"}
         )

[Resolved comment on xarray/core/common.py]


@overload
def zeros_like(other: Variable, dtype: DTypeLikeSave) -> Variable:
@max-sixty (Collaborator)

For these ones, which are generic over DataArray and Variable, we could define a TypeVar for them (but there is no need in this PR, and I'm not even sure whether it's a broader issue than just the *_like functions).
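Such a TypeVar could look like the following sketch (the TypeVar name is illustrative and the classes are empty stand-ins, not the PR's actual code):

```python
from typing import TypeVar

# Hypothetical stand-ins for xarray's types.
class DataArray: ...
class Variable: ...

# Constrained to exactly these two classes, so the return type mirrors the
# argument type: zeros_like(DataArray) -> DataArray, zeros_like(Variable) -> Variable.
T_DataArrayOrVariable = TypeVar("T_DataArrayOrVariable", DataArray, Variable)

def zeros_like(other: T_DataArrayOrVariable) -> T_DataArrayOrVariable:
    # Runtime stand-in for the real implementation.
    return type(other)()
```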

Comment on lines +1807 to +1810
@overload
def zeros_like(
other: Dataset | DataArray, dtype: DTypeMaybeMapping = None
) -> Dataset | DataArray:
@max-sixty (Collaborator)

Do you know why this overload is required? I had thought that DataArray couldn't take a DTypeMaybeMapping? (and same principle for the next two)

@max-sixty (Collaborator)
I tried to adjust these locally. An issue I hit is that so much of our typing is on Dataset rather than T_Dataset and changing one seems to require changing its dependencies, which makes it difficult to change gradually. 20 minutes into me trying to change this, I had 31 errors in 6 files!

To the extent you know which parts are perfect vs. not-perfect-but-required-to-pass: if you want to add a comment for the latter ones, that will make it easier for future travelers to know why things are as they are and hopefully change them.

@headtr1ck (Collaborator, Author)

I was struggling a lot with TypeVar definitions.
Some other functions (like polyval) call zeros_like with DataArray | Dataset.

This means that a "simple" TypeVar("T", DataArray, Dataset) will not work.

I tried using TypeVar("T", bound=DataArray | Dataset) (recommended by some mypy people), but then the isinstance(x, Dataset) checks were causing problems (I'm still not sure whether that is a mypy bug or intended; my TypeVar knowledge is not good enough to tell)...

So this was the only solution I could get to work.
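The overload-based pattern that did work can be sketched roughly as follows (simplified: the real signatures also carry fill_value and dtype parameters, and the classes here are empty stand-ins). The final Union overload is what keeps callers like polyval, which pass DataArray | Dataset, type-checking:

```python
from typing import Union, overload

# Hypothetical stand-ins for xarray's types.
class Dataset: ...
class DataArray: ...
class Variable: ...

@overload
def zeros_like(other: Dataset) -> Dataset: ...
@overload
def zeros_like(other: DataArray) -> DataArray: ...
@overload
def zeros_like(other: Variable) -> Variable: ...
@overload
def zeros_like(other: Union[Dataset, DataArray]) -> Union[Dataset, DataArray]: ...

def zeros_like(
    other: Union[Dataset, DataArray, Variable]
) -> Union[Dataset, DataArray, Variable]:
    # Single runtime implementation behind all the overloads; isinstance
    # checks are unproblematic here because the parameter is a plain Union.
    return type(other)()
```

The precise overloads come first, so a call with a concrete type gets a concrete return type, while Union-typed callers fall through to the broader overload.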

@headtr1ck (Collaborator, Author)

Additionally, the code will actually create a plain DataArray (for example), so TypeVars with bounds are arguably wrong here.

@max-sixty (Collaborator)

Yeah. I think the proposed code is a big upgrade, and we can refine towards perfection in the future...

@max-sixty (Collaborator)

FWIW, I found this SO answer helpful in clarifying the difference (I saw that I had upvoted it before), but I'm still not confident in how we should design these methods.

@@ -1917,7 +1917,7 @@ def polyval(
     return res
 
 
-def _ensure_numeric(data: T_Xarray) -> T_Xarray:
+def _ensure_numeric(data: Dataset | DataArray) -> Dataset | DataArray:
@max-sixty (Collaborator)

This is the kind of function that would be nice to make generic at some point; with the proposed code we lose track of whether it's a Dataset vs. a DataArray. (Fine to add as a comment/TODO, though.)

@headtr1ck (Collaborator, Author)

I failed to make it work with TypeVars since it is called with DataArray | Dataset :(

[Resolved comment on xarray/core/npcompat.py]
@max-sixty (Collaborator)

Thanks a lot @headtr1ck ! I will hit the auto-merge button!

@max-sixty max-sixty enabled auto-merge (squash) May 16, 2022 17:31
@max-sixty max-sixty merged commit e712270 into pydata:main May 16, 2022
@max-sixty (Collaborator)

One thought for any future travelers: if we want to move from Union[DataArray, Dataset] to being generic over DataArray and Dataset without changing everything at once (which was the issue here), we can try adding the generic type that methods accept without removing the existing types, e.g. Union[DataArray, Dataset, T_Xarray].

@headtr1ck headtr1ck deleted the full_like_typing branch May 16, 2022 18:10
dcherian added a commit to dcherian/xarray that referenced this pull request May 20, 2022
* main:
  concatenate docs style (pydata#6621)
  Typing for open_dataset/array/mfdataset and to_netcdf/zarr (pydata#6612)
  {full,zeros,ones}_like typing (pydata#6611)
dcherian added a commit to headtr1ck/xarray that referenced this pull request May 20, 2022
commit 398f1b6
Author: dcherian <[email protected]>
Date:   Fri May 20 08:47:56 2022 -0600

    Backward compatibility dask

commit bde40e4
Merge: 0783df3 4cae8d0
Author: dcherian <[email protected]>
Date:   Fri May 20 07:54:48 2022 -0600

    Merge branch 'main' into dask-datetime-to-numeric

    * main:
      concatenate docs style (pydata#6621)
      Typing for open_dataset/array/mfdataset and to_netcdf/zarr (pydata#6612)
      {full,zeros,ones}_like typing (pydata#6611)

commit 0783df3
Merge: 5cff4f1 8de7061
Author: dcherian <[email protected]>
Date:   Sun May 15 21:03:50 2022 -0600

    Merge branch 'main' into dask-datetime-to-numeric

    * main: (24 commits)
      Fix overflow issue in decode_cf_datetime for dtypes <= np.uint32 (pydata#6598)
      Enable flox in GroupBy and resample (pydata#5734)
      Add setuptools as dependency in ASV benchmark CI (pydata#6609)
      change polyval dim ordering (pydata#6601)
      re-add timedelta support for polyval (pydata#6599)
      Minor Dataset.map docstr clarification (pydata#6595)
      New inline_array kwarg for open_dataset (pydata#6566)
      Fix polyval overloads (pydata#6593)
      Restore old MultiIndex dropping behaviour (pydata#6592)
      [docs] add Dataset.assign_coords example (pydata#6336) (pydata#6558)
      Fix zarr append dtype checks (pydata#6476)
      Add missing space in exception message (pydata#6590)
      Doc Link to accessors list in extending-xarray.rst (pydata#6587)
      Fix Dataset/DataArray.isel with drop=True and scalar DataArray indexes (pydata#6579)
      Add some warnings about rechunking to the docs (pydata#6569)
      [pre-commit.ci] pre-commit autoupdate (pydata#6584)
      terminology.rst: fix link to Unidata's "netcdf_dataset_components" (pydata#6583)
      Allow string formatting of scalar DataArrays (pydata#5981)
      Fix mypy issues & reenable in tests (pydata#6581)
      polyval: Use Horner's algorithm + support chunked inputs (pydata#6548)
      ...

commit 5cff4f1
Merge: dfe200d 6144c61
Author: Maximilian Roos <[email protected]>
Date:   Sun May 1 15:16:33 2022 -0700

    Merge branch 'main' into dask-datetime-to-numeric

commit dfe200d
Author: dcherian <[email protected]>
Date:   Sun May 1 11:04:03 2022 -0600

    Minor cleanup

commit 35ed378
Author: dcherian <[email protected]>
Date:   Sun May 1 10:57:36 2022 -0600

    Support dask arrays in datetime_to_numeric