-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-9692]: Make CombineValues portable #11335
Conversation
retest this please |
Change-Id: If1835e4d14320149b3e18eea9c97a26d3d34ee48
R: @robertwb |
# Thesse overrides should be applied before the proto representation of the | ||
# graph is created. | ||
_PTRANSFORM_OVERRIDES = [ | ||
CombineValuesPTransformOverride() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this one should happen after too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This override should place the pipeline object into the same state as if the runner had defined an apply_CombineValues, what am I missing? Looking at the code, is it because other overrides might also use a CombineValues transform so it might needed to be replaced again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is fine; it's just preserving an inconsistency in Dataflow vs. everything else.
sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
Outdated
Show resolved
Hide resolved
from apache_beam import PTransform | ||
from apache_beam.pvalue import PCollection | ||
|
||
# The DataflowRunner still needs access to the CombineValues members to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be preferable to simply let try and find methods for composites as well, rather than using PTransformOverrides. This would likely help with the GBK one too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand what you mean. Can you elaborate? What does "let try and find methods for composites as well" mean? What would an alternative to PTransformOverrides be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that run_xxx could also be called for composites. That might, however, be a bigger change, so we can go with this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can go with this modulo making the test less brittle.
from apache_beam import PTransform | ||
from apache_beam.pvalue import PCollection | ||
|
||
# The DataflowRunner still needs access to the CombineValues members to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that run_xxx could also be called for composites. That might, however, be a bigger change, so we can go with this approach.
# Thesse overrides should be applied before the proto representation of the | ||
# graph is created. | ||
_PTRANSFORM_OVERRIDES = [ | ||
CombineValuesPTransformOverride() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is fine; it's just preserving an inconsistency in Dataflow vs. everything else.
Change-Id: Iae82f36b6044e4ad0523b8b3995ce9a713ae0541
Change-Id: If1835e4d14320149b3e18eea9c97a26d3d34ee48
First PR make the DataflowRunner use a portable pipeline representation.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.