Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-7407] Adds portable wordcount test for Python 3 with Flink runner. #8745

Merged
merged 5 commits into from
Jun 12, 2019

Conversation

tvalentyn
Copy link
Contributor

@tvalentyn tvalentyn commented Jun 3, 2019

This PR extends the portable python precommit test suite with a Python 3 scenario.

  • The PR adds portable wordcount test for Python 3 with Flink running in-process runner and SDK harness running in Docker container.
  • To run Python 2 and Python 3 test scenarios in parallel, we add the Py3 scenario to a new project project sdks/python/test-suites/portable/py35/build.gradle inline with the proposal discussed in https://lists.apache.org/thread.html/7c48200678c570c6e42d4933f8ea4f9324d82f7ae720b591a93ffcda@%3Cdev.beam.apache.org%3E.
  • To reduce the duplication of Gradle code between Python 2 and Python 3 suites, we move the common the definition of Portable Wordcount task to to Beam Module Plugin.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- ---
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@tvalentyn tvalentyn force-pushed the flink_wordcount branch 2 times, most recently from a58d1e9 to b0a58b8 Compare June 3, 2019 20:20
@tvalentyn
Copy link
Contributor Author

R: @markflyhigh @angoenka
cc: @fredo838, @Juta

@tvalentyn
Copy link
Contributor Author

Ping please, @markflyhigh @angoenka. CC: @aaltay .

Copy link
Contributor

@angoenka angoenka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tvalentyn !
Some minor comments.

'has Python %d.%d interpreter. See also: BEAM-7474.' % (
sys.version_info[0], sys.version_info[1]))

return ('{user}-docker-apache.bintray.io/beam/python'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we log the image URL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -77,11 +77,14 @@ task preCommitPy2() {
dependsOn "lint"
}

task portablePreCommit() {
addPortableWordCountTask('BatchPy2', false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can infer batch or streaming from the flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, however, the portableWordCountExample definition would be more complicated. Do we meed that task at all? how is it used?
I can remove it if you prefer and remove one of the prefix parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can certainly remove portableWordCountExample but we would like to keep portableWordCount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin<Project> {
}
}
}

project.ext.addPortableWordCountTask = { String nameSuffix, boolean isStreaming ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks. I find explicit way more reader-friendly. Unless there is a reason (long list of optional parameters, etc), I think explicit is better than implicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have similar feeling but I think in groovy land, implicit is more popular because we don't have to change method signature to add new arguments or remove arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL. I couldn't find such recommendation in Groovy style guide https://groovy-lang.org/style-guide.html. I moved the streaming parameter into an internal helper function, and it only has 1 parameter right now. Making it implicit requires adding another "Configuration" class, which at this point seems unnecessary boilerplate. I agree that passing maps is better when we have large number of parameters, similar how service requests/responses are JSON / ProtoBufs, but I don't think we should follow this everywhere.

Copy link
Contributor

@markflyhigh markflyhigh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tvalentyn! General lgtm. Some minor thoughts.


project.task('portableWordCount' + nameSuffix) {
dependsOn 'installGcpTest'
dependsOn = ['installGcpTest']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.

else:
version_suffix = '3'
# TODO(BEAM-7474): Use an image which has correct Python minor version.
logging.warning('Make sure that locally built Python SDK docker image '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to be locally built?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a release process for this image yet, so we build it locally when we run the test.

@@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin<Project> {
}
}
}

project.ext.addPortableWordCountTask = { String nameSuffix, boolean isStreaming ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay to have this task in BeamModulePlugin for now. But my concern is that this may not be scalable if we want to add more variants (e.g. pipelines, configurations) in the future. We probably can make it more configurable or put it in separate groovy if more variants we are end up with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is my concern as well, this is being discussed on the mailing list, let's continue the discussion on https://lists.apache.org/thread.html/7c48200678c570c6e42d4933f8ea4f9324d82f7ae720b591a93ffcda@%3Cdev.beam.apache.org%3E.

Copy link
Contributor Author

@tvalentyn tvalentyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, @angoenka @markflyhigh . PTAL.

else:
version_suffix = '3'
# TODO(BEAM-7474): Use an image which has correct Python minor version.
logging.warning('Make sure that locally built Python SDK docker image '
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a release process for this image yet, so we build it locally when we run the test.

'has Python %d.%d interpreter. See also: BEAM-7474.' % (
sys.version_info[0], sys.version_info[1]))

return ('{user}-docker-apache.bintray.io/beam/python'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


project.task('portableWordCount' + nameSuffix) {
dependsOn 'installGcpTest'
dependsOn = ['installGcpTest']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.

@@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin<Project> {
}
}
}

project.ext.addPortableWordCountTask = { String nameSuffix, boolean isStreaming ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks. I find explicit way more reader-friendly. Unless there is a reason (long list of optional parameters, etc), I think explicit is better than implicit.

@@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin<Project> {
}
}
}

project.ext.addPortableWordCountTask = { String nameSuffix, boolean isStreaming ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is my concern as well, this is being discussed on the mailing list, let's continue the discussion on https://lists.apache.org/thread.html/7c48200678c570c6e42d4933f8ea4f9324d82f7ae720b591a93ffcda@%3Cdev.beam.apache.org%3E.

@@ -77,11 +77,14 @@ task preCommitPy2() {
dependsOn "lint"
}

task portablePreCommit() {
addPortableWordCountTask('BatchPy2', false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, however, the portableWordCountExample definition would be more complicated. Do we meed that task at all? how is it used?
I can remove it if you prefer and remove one of the prefix parameter.

@angoenka
Copy link
Contributor

angoenka commented Jun 7, 2019

Thanks @tvalentyn
Added my comments.

@tvalentyn
Copy link
Contributor Author

Thanks, @angoenka, PTAL.

@tvalentyn
Copy link
Contributor Author

Run Java_Examples_Dataflow PreCommit

@tvalentyn
Copy link
Contributor Author

Run Python PreCommit

@tvalentyn
Copy link
Contributor Author

Run Java PreCommit

@angoenka angoenka merged commit af7ad3f into apache:master Jun 12, 2019
@tvalentyn tvalentyn deleted the flink_wordcount branch October 23, 2019 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants