Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

For mxnet-validation pipeline, require sanity build to complete successfully before running other build pipelines. #17999

Merged
merged 2 commits into from
Apr 16, 2020

Conversation

josephevans
Copy link
Contributor

@josephevans josephevans commented Apr 8, 2020

Description

Add new build pipeline that will run sanity build, and if it completes successfully, run the rest of the mxnet-validation pipelines.

#17802

@mxnet-bot
Copy link

Hey @josephevans , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, windows-cpu, sanity, windows-gpu, website, miscellaneous, centos-cpu, unix-cpu, unix-gpu, centos-gpu, edge]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@ChaiBapchya
Copy link
Contributor

ChaiBapchya commented Apr 8, 2020

Looks like the other builds have already started running before sanity build completed @josephevans

@josephevans
Copy link
Contributor Author

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@josephevans
Copy link
Contributor Author

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@josephevans
Copy link
Contributor Author

Looks like the other builds have already started running before sanity build completed @josephevans

Yeah, there's a few script calls I have to approve in Jenkins for this to work. That's why it skipped the sanity check delays initially (it's fail open.) It'd be nice to use a trusted library in Jenkins to do this to (a) protect against potential security vulnerabilities and malicious abuse and (b) not have to approve these in the future. Looking into that.

@ChaiBapchya
Copy link
Contributor

Makes sense. Thanks for the clarification
+1 for using secure functions instead of insecure.
I remember facing the issue with printStackTrace requiring Admin approval for insecure script (which led to Jenkins build status not being posted on Github for brief period).

…sanity check first, then starts all other builds.
@leezu
Copy link
Contributor

leezu commented Apr 14, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

stage("full-build") {
def pipelineName = JOB_NAME.substring(0, JOB_NAME.indexOf('/'))
build job: pipelineName + "/sanity/" + BRANCH_NAME, wait: true
['centos-cpu', 'centos-gpu', 'clang', 'edge',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way to build this list automatically?

Have you tested this setup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has all been tested on mxnet-ci-dev.

We can get a list from Jenkins, but it would require scriptApprovals, which I'd like to avoid. I could move the build list to a variable at the top of the file so it's more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which function exactly would have to be whitelisted? Maybe we can make an exception if it does not expose anything we're worried about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to whitelist the Jenkins.instance static method, which would open a huge hole that anyone who submitted a PR could exploit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay agree.

max_time = 180

stage("full-build") {
def pipelineName = JOB_NAME.substring(0, JOB_NAME.indexOf('/'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that this way is resilient? Jobs might have multiple slashes etc. This doesn't seem to be fully reliable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will be resilient. It just extracts the parent folder of the job pipeline (ie 'mxnet-validation'.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well but this will break with nested folders (a/b/c/sanity will result in a/sanity) or if there is no top-level folder at all. @szha @sandeep-krishnamurthy can you get somebody to assist on this matter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @josephevans and review @marcoabreu . How can I assist here? Looks like a regular dev discussion as with any other PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to get somebody else involved who's familiar with Jenkins. I personally don't have enough time to do coaching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoabreu Thanks for the suggestions, I've update the PR to preserve the full path of the build job, so the staggered build pipelines will still work if we reorganize our Jenkins pipelines into subfolders.

…b path in case we use nested folders in the future.
Copy link
Contributor

@ChaiBapchya ChaiBapchya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks a lot for this work on staggered builds!
Lets get this merged
@sandeep-krishnamurthy @leezu plz help review/merge

@ChaiBapchya
Copy link
Contributor

@mxnet-label-bot [pr-awaiting-merge]

@marcoabreu
Copy link
Contributor

Please show that this is tested and working before merging.

@apeforest apeforest merged commit 9337137 into apache:master Apr 16, 2020
@josephevans
Copy link
Contributor Author

We will still need to make some changes to Jenkins in order for the staggered builds to start working, so there is no risk in merging now. I will send an email out to the community when I setup the new build pipeline and migrate to it.

Here is a test run we did on ci-dev. As you can see, we first run the sanity build. Once it finishes, we spawn off all the other build jobs to run in the background.

http:https://jenkins.mxnet-ci-dev.amazon-ml.com/job/mxnet-validation-joeev/job/full-build/view/change-requests/job/PR-2/16/consoleText

josephevans added a commit to josephevans/mxnet that referenced this pull request Apr 23, 2020
…ssfully before running other build pipelines. (apache#17999)

* Refactor staggered builds - create new full build pipeline that runs sanity check first, then starts all other builds.

* Move list of build jobs to top of file for clarity. Preserve whole job path in case we use nested folders in the future.

Co-authored-by: Joe Evans <[email protected]>
josephevans added a commit to josephevans/mxnet that referenced this pull request Apr 23, 2020
…ssfully before running other build pipelines. (apache#17999)

* Refactor staggered builds - create new full build pipeline that runs sanity check first, then starts all other builds.

* Move list of build jobs to top of file for clarity. Preserve whole job path in case we use nested folders in the future.

Co-authored-by: Joe Evans <[email protected]>
josephevans added a commit to josephevans/mxnet that referenced this pull request Apr 23, 2020
…ssfully before running other build pipelines. (apache#17999)

* Refactor staggered builds - create new full build pipeline that runs sanity check first, then starts all other builds.

* Move list of build jobs to top of file for clarity. Preserve whole job path in case we use nested folders in the future.

Co-authored-by: Joe Evans <[email protected]>
leezu pushed a commit that referenced this pull request Apr 23, 2020
* For mxnet-validation pipeline, require sanity build to complete successfully before running other build pipelines. (#17999)

* Refactor staggered builds - create new full build pipeline that runs sanity check first, then starts all other builds.

* Move list of build jobs to top of file for clarity. Preserve whole job path in case we use nested folders in the future.

Co-authored-by: Joe Evans <[email protected]>

* If sanity build is not found, wait until Jenkins recognizes it. (#18119)

* If sanity build is not found, wait until Jenkins recognizes it.

* Also add a timeout of 30m for sanity build to run and complete, so we don't get stuck in a loop.

Co-authored-by: Joe Evans <[email protected]>

Co-authored-by: Joe Evans <[email protected]>
leezu pushed a commit that referenced this pull request Apr 24, 2020
* For mxnet-validation pipeline, require sanity build to complete successfully before running other build pipelines. (#17999)

* Refactor staggered builds - create new full build pipeline that runs sanity check first, then starts all other builds.

* Move list of build jobs to top of file for clarity. Preserve whole job path in case we use nested folders in the future.

Co-authored-by: Joe Evans <[email protected]>

* If sanity build is not found, wait until Jenkins recognizes it. (#18119)

* If sanity build is not found, wait until Jenkins recognizes it.

* Also add a timeout of 30m for sanity build to run and complete, so we don't get stuck in a loop.

Co-authored-by: Joe Evans <[email protected]>

Co-authored-by: Joe Evans <[email protected]>
leezu pushed a commit that referenced this pull request Apr 24, 2020
* For mxnet-validation pipeline, require sanity build to complete successfully before running other build pipelines. (#17999)

* Refactor staggered builds - create new full build pipeline that runs sanity check first, then starts all other builds.

* Move list of build jobs to top of file for clarity. Preserve whole job path in case we use nested folders in the future.

Co-authored-by: Joe Evans <[email protected]>

* If sanity build is not found, wait until Jenkins recognizes it. (#18119)

* If sanity build is not found, wait until Jenkins recognizes it.

* Also add a timeout of 30m for sanity build to run and complete, so we don't get stuck in a loop.

Co-authored-by: Joe Evans <[email protected]>

Co-authored-by: Joe Evans <[email protected]>
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
…ssfully before running other build pipelines. (apache#17999)

* Refactor staggered builds - create new full build pipeline that runs sanity check first, then starts all other builds.

* Move list of build jobs to top of file for clarity. Preserve whole job path in case we use nested folders in the future.

Co-authored-by: Joe Evans <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants