Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9764] multi threading & optional pulling #11428

Merged
merged 3 commits into from
Apr 21, 2020

Conversation

Hannah-Jiang
Copy link
Contributor

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@Hannah-Jiang
Copy link
Contributor Author

Run Python2_PVR_Flink PreCommit

@Hannah-Jiang Hannah-Jiang changed the title fix failures [BEAM-9764] fix failures Apr 15, 2020
@Hannah-Jiang Hannah-Jiang changed the title [BEAM-9764] fix failures [BEAM-9764] fix Java license failures with Python2_PVR_Flink PreCommit Apr 15, 2020
@Hannah-Jiang
Copy link
Contributor Author

Run Python2_PVR_Flink PreCommit

@Hannah-Jiang Hannah-Jiang changed the title [BEAM-9764] fix Java license failures with Python2_PVR_Flink PreCommit [BEAM-9764] multi threading & check urls instead of pulling Apr 17, 2020
@Hannah-Jiang
Copy link
Contributor Author

R: @udim
Can you do a review for this PR?
execute() function is already reviewed and it is moved to a function from a for loop approach.

@udim
Copy link
Member

udim commented Apr 17, 2020

Can you do a review for this PR?

yes

cp -r java_third_party_licenses/*.jar sdks/java/container/third_party_licenses/
cp -r java_third_party_licenses/*.csv sdks/java/container/third_party_licenses/
else
# create an empty file to aviod no file/dir existing error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# create an empty file to aviod no file/dir existing error
# create an empty file to avoid no file/dir existing error

@@ -25,15 +25,23 @@ pip install PyYAML==5.3
pip install tenacity==5.0.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I wanted to mention that doing a pip install/uninstall like this messes with the user's environment.
The standard way to run this would be via tox (see sdks/python/tox.ini), which will take care of creating a virtualenv to run this under (and you can specify a certain python version if you wish).

Copy link
Contributor Author

@Hannah-Jiang Hannah-Jiang Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed to use tox here, but it is a little complicated. We want to keep it simple within a script, so it's easy to use and maintain. Current scripts run at local machine, (not within a docker), so docker will still have clear environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that it's acceptable to modify the user environment. Also, on Jenkins this might cause failure if this script runs concurrently with itself.
I've opened a bug for this: https://issues.apache.org/jira/browse/BEAM-9797

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Udi for bringing up this issue. Yes, we should fix it. I will merge this PR and work on a different PR to fix it.

no_license_type.add(name_version)
license_type = ''
continue
csv_list = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lists need to have synchronized access if they are to be modified by different threads.
https://stackoverflow.com/questions/2227169/are-python-built-in-containers-thread-safe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this (post)[https://stackoverflow.com/questions/6319207/are-lists-thread-safe], append operation is thread safe. But I added lock to make sure all records are added and the operation is not expensive.

@Hannah-Jiang Hannah-Jiang changed the title [BEAM-9764] multi threading & check urls instead of pulling [BEAM-9764] multi threading & optional pulling Apr 20, 2020
@Hannah-Jiang
Copy link
Contributor Author

Thanks Udi for reviewing it. I added some replies to your comments and also fixed some of them. Could you please take a look?

@Hannah-Jiang
Copy link
Contributor Author

Run Java PreCommit

1 similar comment
@Hannah-Jiang
Copy link
Contributor Author

Run Java PreCommit

@Hannah-Jiang Hannah-Jiang merged commit 4a7f04c into apache:master Apr 21, 2020
Hannah-Jiang added a commit to Hannah-Jiang/beam that referenced this pull request Apr 28, 2020
* multi threading & check urls instead of pulling

* add thread_lock and more logs

* fix package version and import
yirutang pushed a commit to yirutang/beam that referenced this pull request Jul 23, 2020
* multi threading & check urls instead of pulling

* add thread_lock and more logs

* fix package version and import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants