-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-7746] Fix type errors and enable checks for apache_beam.dataframe.* #11632
Conversation
R: @chadrik |
from typing import Any | ||
from typing import Dict | ||
from typing import Tuple | ||
from typing import Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scoping these imports is not a bad idea since it keeps the module namespace cleaner, but there are a couple of issues with this:
- once we get to python 3.x, and move from type comments to annotations, this will fail, as we'll be referencing non-existent objects in our annotations.
- it's not consistent with how we've done this throughout the rest of the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should mention that a solution to the first issue is that we can refer to these types as strings, such as 'Any'
, but that's certainly a lot more awkward, and developers are likely to forget to do so and get confused/frustrated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually prefer unconditionally importing them, but was just trying to avoid lint issues (and did see this pattern elsewhere). Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So lint complains about unguarded imports, so I put them back. We'll just to a massive sweep to fix these when we change to use type annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the lint error? Is it because of the unused typing
import?
I'm confused because unguarded typing imports are used all over the beam codebase without any lint errors. Check pipeline
, pipeline_context
, pipeline_options
, for starters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This note hasn't been addressed:
I had a look at the lint errors, and they are legitimate, but scoping the imports is not the right solution.
see above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes. Thanks for catching this. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still needs to be fixed for dataframe.convert
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[sigh] It still didn't like PCollection. apache_beam/dataframe/transforms.py:30:0: W0611: Unused PCollection imported from apache_beam.pvalue (unused-import)
. But the rest are OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And pandas needs to be guarded. Hopefully that should be it. PTAL.
@@ -248,7 +261,12 @@ def _dict_union(dicts): | |||
return result | |||
|
|||
|
|||
def _flatten(valueish, root=()): | |||
def _flatten( | |||
valueish, # type: Union[T, Tuple[T, ...], Dict[Any, T]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to cover the List[T]
case for valueish
?
Alternately, do we wan to make this more generic:
def _flatten(
valueish, # type: Union[T, Iterable[T], Mapping[Any, T]]
root=(), # type: Tuple[Any, ...]
):
# type: (...) -> Dict[Tuple[Any, ...], T]
"""Given a nested structure of dicts, tuples, and lists, return a flat
dictionary where the values are the leafs and the keys are the "paths" to
these leaves.
For example `{a: x, b: (y, z)}` becomes `{(a,): x, (b, 0): y, (b, 1): c}`.
"""
if isinstance(valueish, typing.Mapping):
return _dict_union(_flatten(v, root + (k, )) for k, v in valueish.items())
elif isinstance(valueish, typing.Iterable):
return _dict_union(
_flatten(v, root + (ix, )) for ix, v in enumerate(valueish))
else:
return {root: valueish}
Another thought, is it valid / worthwhile to create a relationship between valueish
, root
, and the result key?:
def _flatten(
valueish, # type: Union[T, Iterable[T], Mapping[U, T]]
root=(), # type: Tuple[U, ...]
):
# type: (...) -> Dict[Tuple[U, ...], T]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should mention that typing provides a lot of opportunity to go down rabbit-holes, so the right answer is often "it would be more accurate, but it's not that valuable". A common motivator behind investing in accurately typing utility functions is when it allows you to avoid adding manual / explicit types or casts elsewhere in the code. Imagine a scenario where mypy knows key of the valueish
map, but after it passes through _flatten
, you need to re-type the result because it becomes Any
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I'd agree that Iterable
/Mapping
would be preferable, but here I want to be restrictive about the kinds of values I decompose.
The keys types of the mapping would be Union[None, int, U]
, so you'd have to cast anyway, so I think it's simpler to leave as is.
Still, good food for thought.
PTAL |
Ping. |
Run PythonLint PreCommit |
Oh, yes. Fixed all uses of typing.TYPE_CHECKING now.
…On Tue, May 26, 2020 at 2:32 PM Chad Dombrova ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sdks/python/apache_beam/dataframe/convert.py
<#11632 (comment)>:
> import inspect
from apache_beam import pvalue
from apache_beam.dataframe import expressions
from apache_beam.dataframe import frame_base
from apache_beam.dataframe import transforms
+if typing.TYPE_CHECKING:
+ # pylint: disable=ungrouped-imports
+ from typing import Any
+ from typing import Dict
+ from typing import Tuple
+ from typing import Union
I think it still needs to be fixed for dataframe.convert.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#11632 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADWVALSZ7GEBIO4QLANAOTRTQYPRANCNFSM4M3SWLUA>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to import TYPE_CHECKING
import pandas as pd | ||
|
||
import apache_beam as beam | ||
from apache_beam import transforms | ||
from apache_beam.dataframe import expressions | ||
from apache_beam.dataframe import frames # pylint: disable=unused-import | ||
|
||
if typing.TYPE_CHECKING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prevailing style for TYPE_CHECKING
is to import it as from typing import TYPE_CHECKING
. I think we should stay consistent. If we want to change that, it's fine by me, but we can do that in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for consistency. Changed.
LGTM! |
Thanks! |
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.