Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9692]: Make CombineValues portable #11335

Merged
merged 2 commits into from
Apr 20, 2020
Merged

Conversation

rohdesamuel
Copy link
Contributor

@rohdesamuel rohdesamuel commented Apr 7, 2020

Change-Id: If1835e4d14320149b3e18eea9c97a26d3d34ee48

First PR make the DataflowRunner use a portable pipeline representation.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@pabloem
Copy link
Member

pabloem commented Apr 10, 2020

retest this please

Change-Id: If1835e4d14320149b3e18eea9c97a26d3d34ee48
@rohdesamuel rohdesamuel marked this pull request as ready for review April 13, 2020 18:09
@rohdesamuel
Copy link
Contributor Author

R: @robertwb

# Thesse overrides should be applied before the proto representation of the
# graph is created.
_PTRANSFORM_OVERRIDES = [
CombineValuesPTransformOverride()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this one should happen after too...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This override should place the pipeline object into the same state as if the runner had defined an apply_CombineValues, what am I missing? Looking at the code, is it because other overrides might also use a CombineValues transform so it might needed to be replaced again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is fine; it's just preserving an inconsistency in Dataflow vs. everything else.

from apache_beam import PTransform
from apache_beam.pvalue import PCollection

# The DataflowRunner still needs access to the CombineValues members to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to simply let try and find methods for composites as well, rather than using PTransformOverrides. This would likely help with the GBK one too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand what you mean. Can you elaborate? What does "let try and find methods for composites as well" mean? What would an alternative to PTransformOverrides be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that run_xxx could also be called for composites. That might, however, be a bigger change, so we can go with this approach.

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can go with this modulo making the test less brittle.

from apache_beam import PTransform
from apache_beam.pvalue import PCollection

# The DataflowRunner still needs access to the CombineValues members to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that run_xxx could also be called for composites. That might, however, be a bigger change, so we can go with this approach.

# Thesse overrides should be applied before the proto representation of the
# graph is created.
_PTRANSFORM_OVERRIDES = [
CombineValuesPTransformOverride()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is fine; it's just preserving an inconsistency in Dataflow vs. everything else.

Change-Id: Iae82f36b6044e4ad0523b8b3995ce9a713ae0541
@robertwb robertwb merged commit 37bb50d into apache:master Apr 20, 2020
yirutang pushed a commit to yirutang/beam that referenced this pull request Jul 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants