Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-862] Basic maven jenkins pipeline #13450

Merged
merged 2 commits into from
Jan 8, 2019

Conversation

zachgk
Copy link
Contributor

@zachgk zachgk commented Nov 29, 2018

Description

This is a WIP PR to share progress on the maven nightly jenkins pipeline. The goal is to create a Jenkins for continuous deployment that will build scala jars, test them on multiple different environments, and then deploy them to the apache snapshot repository. It can also provide some setup for building out additional continuous deployment efforts for other platforms such as pip.

Note that this PR depends on #13046 and is rebased on top of it. Those changes are displayed as part of the first commit in the PR.

@nswamy @lanking520 @andrewfayres @piyushghai @marcoabreu @yzhliu

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Build pipeline for build, test, deploy

@zachgk
Copy link
Contributor Author

zachgk commented Nov 29, 2018

@mxnet-label-bot add [Java, Scala, CI, Maven]

@marcoabreu marcoabreu added CI Java Label to identify Java API component Maven Scala labels Nov 29, 2018
@marcoabreu
Copy link
Contributor

Lets wait until after the release before we move forward with this PR as discussed on dev@ :)

if(clean == 1)
clean = 'git clean -xdf'
else if(clean == 2)
clean = 'git clean -xdff'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a difference between the clean command in clean == 1 and clean == 2 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clean -ff will remove code inside the .git directory which messes with the deploy process. See https://git-scm.com/docs/git-clean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave this as it is.

The workspace always has to be considered empty and reproducible. If you need to persist something in between stages, use the stashing mechanism. With this change you are circumventing a problem - that you are not persisting state properly - and it will blow up as soon as we have more than one restricted slave and job one gets scheduled onto a different slave than job two

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not to persist something in between stages. During maven deploy, it attempts to move all code into a temp directory, rebuild, and deploy from that temp directory. The rebuilding calls "git submodule update" which breaks if the .git directory has already been cleaned with "-ff" since the submodule configuration is deleted. I keep the default as "clean -xdff" so that keeping the .git directory must be explicitly chosen and does not affect any other code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd like to have a deeper look into that, so far I'm not following

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maven release plugin messes with the project SCM (git). It first tries to create a new tag for the release and a new commit updating the version number located in the POM file. Then, when performing the release, it will checkout the code to a temporary directory and perform the release (building, testing, and deploying) from that new clean directory. This will break if the .git directory has been pruned. You can see docs for the maven release plugin at http:https://maven.apache.org/maven-release/maven-release-plugin/examples/prepare-release.html

@@ -119,6 +119,14 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the child poms not inherit this execution stage from the parent pom file ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the maven-jar-plugin is in the parent pom file because we don't want all of the modules to inherit it

@@ -0,0 +1,50 @@
# MXNet Scala Package Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This README is super helpful!
Thanks @zachgk 💯


SCALA_VERSION_PROFILE := 2.11
SCALA_VERSION := 2.11.8
MXNET_VERSION := 1.3.1-SNAPSHOT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this version remain hardcoded here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will investigate using [1.3.1-SNAPSHOT,) to see if it works. Otherwise, I don't know if there are other options.

Makefile Outdated
-Dbuild.platform="$(SCALA_PKG_PROFILE)" \
-Dcflags="$(CFLAGS)" -Dldflags="$(LDFLAGS)" \
-Dlddeps="$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a")

scaladeploylocal:
(cd $(ROOTDIR)/scala-package; \
mvn deploy $(MAVEN_ARGS) -Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \-DskipTests=true -Dcxx="$(CXX)" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you are skipping the tests here on the deploy stage ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are skipping those tests because we run the tests separately with scalaunittest/scalaintegrationtest before calling deploy and there is no reason to repeat tests

@@ -454,6 +454,10 @@ else
CFLAGS += -DMXNET_USE_LIBJPEG_TURBO=0
endif

ifeq ($(CI), 1)
MAVEN_ARGS := -B
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to know what this -B flag does :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -B is batch mode. The main change is that it removes the downloading progress messages that are written to the terminal/logs. They take up a lot of space and make the logs hard to read.

@zachgk zachgk force-pushed the mavenPublishing branch 2 times, most recently from 4a1a078 to 88cd89e Compare December 8, 2018 00:28
@Roshrini
Copy link
Member

@zachgk Can you rebase this PR?
@mxnet-label-bot Update [pr-work-in-progress]

@marcoabreu marcoabreu added pr-work-in-progress PR is still work in progress and removed CI Java Label to identify Java API component Maven Scala labels Dec 19, 2018
@zachgk zachgk changed the title [WIP][MXNET-862] Basic maven jenkins pipeline [MXNET-862] Basic maven jenkins pipeline Dec 20, 2018
@zachgk
Copy link
Contributor Author

zachgk commented Dec 20, 2018

I am currently thinking that since these are relatively large commits, I would split them into three different PRs that will create respectively a basic pipeline, a pipeline with multi-environment testing, and the final pipeline integrating the static linking code. After rebasing, I reduced the amount of code as part of this PR (it has some of commit 2), so it should be roughly equal to the code from 20 days ago. This also no longer has code from the packageTest utility because that was merged into master.

@mxnet-label-bot update [CI, Java, Maven, Scala, pr-awaiting-review]

@marcoabreu marcoabreu added CI and removed pr-work-in-progress PR is still work in progress labels Dec 20, 2018
@marcoabreu marcoabreu added Java Label to identify Java API component Maven pr-awaiting-review PR is waiting for code review Scala labels Dec 20, 2018
@zachgk zachgk force-pushed the mavenPublishing branch 2 times, most recently from c03e2e2 to f317589 Compare December 20, 2018 19:30
Copy link
Member

@lanking520 lanking520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, please indicate the change you would like to make for your second PR

@zachgk
Copy link
Contributor Author

zachgk commented Dec 20, 2018

As part of the later PRs, the changes to the current files will include:

  • Add new Docker containers specific to publishing and a clean dependency-less environment for testing so the updates to ubuntu_scala will be moved
  • The scala deployment will be moved from scala-package/dev to ci/publish/scala
  • It will use the static building script instead of the scala-package/dev/compile-mxnet-backend.sh script for building the backend

Note that this PR forms a working pipeline that will build, test, and then deploy the package for ubuntu 16.04

Please review @nswamy @andrewfayres @piyushghai @marcoabreu

utils.assign_node_labels(utility: 'restricted-utility', linux_cpu: 'restricted-mxnetlinux-cpu', linux_gpu: 'restricted-mxnetlinux-gpu', linux_gpu_p3: 'restricted-mxnetlinux-gpu-p3', windows_cpu: 'restricted-mxnetwindows-cpu', windows_gpu: 'restricted-mxnetwindows-gpu')

def nodeMap = ['cpu': NODE_LINUX_CPU, 'gpu': NODE_LINUX_GPU]
def scalaOSMap = ['cpu': 'linux-x86_64-cpu', 'gpu': 'linux-x86_64-gpu']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here as to why we are not doing MacOS through Jenkins ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

cleanCmd = 'git clean -xdf'
else if(clean == 2)
cleanCmd = 'git clean -xdff'
sh cleanCmd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate why this is needed? This "super brutal clean" as I like to call it, was requested by @lebeg and he said it's important. Maybe you two could have a discussion there :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I can remove this because we are no longer going to use the maven-release-plugin following updates as part of #13626

def nodeMap = ['cpu': NODE_LINUX_CPU, 'gpu': NODE_LINUX_GPU]
def scalaOSMap = ['cpu': 'linux-x86_64-cpu', 'gpu': 'linux-x86_64-gpu']

def wrapStep(nodeToRun, workspaceName, step) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this generator model. Great job!

return {
node(nodeToRun) {
ws("workspace/${workspaceName}") {
env.MAVEN_OPTS = "-Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn" // Clean up maven logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to move this logic into the runtime functions? We generally try to keep Jenkinsfile limited to process flow design while logic is in the runtime functions. Otherwise, you can't reproduce the individual steps using the docker wrapper because certain variables (like this env var) are set outside of the docker process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll move it

logging.exception("The request was invalid due to:")
elif client_error.response['Error']['Code'] == 'InvalidParameterException':
logging.exception("The request had invalid params:")
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is the default template of AWS SecretsManager, but could we raise always (instead of the else case) to halt the execution? I have made the experience that letting the process continue to then fail later obfuscated the error message and confuses some people.

In this particular case, the function would return None while the caller expects a tuple. This then throws a confusion message which makes it harder to debug. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this sounds like a good improvement. I'll apply it in docker_cache as well for the sake of consistency

def importASC(key, gpgPassphrase):
filename = os.path.join(KEY_PATH, "key.asc")
with open(filename, 'w') as f:
f.write(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we making sure that this file is deleted when the process finishes? Or is it local to the container?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, same applies to the hidden settings.xml in .m2 folder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is put in the ~ directory, which I think that makes it local to the container. Either way, I will add some rm statements for extra security.

f.write(key)
subprocess.run(['gpg2', '--batch', '--yes',
'--passphrase=\"{}\"'.format(gpgPassphrase),
"--import", "{}".format(filename)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! Pro tip - I currently have to write a COE because I made the same mistake - don't do that or you will leak credentials :P

Subprocess.run will print the command (in this case including the password) if the execution fails. This then writes the password to the log and you will leak credentials. Is it possible to pass in the password using STDIN or some other redirection method?

I can link you the ticket privately if you're interested in further details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoabreu Currently, we use fake gpg key in here as it is not necessary to have the real one in the deploy stage. Problems will appear in the release stage as I would not do it with CI. But still it is very useful information, please keep us updated if you find a better solution to this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lanking520 We still use this to encrypt the passwords through maven, so this is certainly a problem we need to address.

I worked around this using a combination of passing data through stdin and passing data using the linux expect program (because maven doesn't work with stdin)

.format(password))


def severPSW(username, password, gpgPassphrase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "sever" mean? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Apache server

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lanking520 "server" was misspelled

@lanking520
Copy link
Member

@marcoabreu Could you please review again?

Copy link
Contributor

@piyushghai piyushghai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes. Looks good to me :)

Copy link
Member

@lanking520 lanking520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good. Tested locally with the build steps and all working!

Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, but overall looks great :)


# Scala steps to deploy
make scalapkg CI=1
make scalaunittest CI=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename lets me assume the task of this script is to build, but it seems like a lot of testing is happening as well.

Generally, we try to split compilation and testing first to make it cleaner, and second to use the most appropriate instance type. Since you also made a test script, could we move these tests over there to keep the compilation jobs as lightweight as possible?


if [[ $MAVEN_PUBLISH_OS_TYPE == "linux-x86_64-cpu" ]];
then
MAKE_FLAGS="USE_BLAS=openblas USE_DIST_KVSTORE=1 ENABLE_TESTCOVERAGE=1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to compile with test coverage annotations for production builds?

MAKE_FLAGS="USE_BLAS=openblas USE_DIST_KVSTORE=1 ENABLE_TESTCOVERAGE=1"
elif [[ $MAVEN_PUBLISH_OS_TYPE == "linux-x86_64-gpu" ]]
then
MAKE_FLAGS="USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 SCALA_ON_GPU=1 SCALA_TEST_ON_GPU=1 USE_DIST_KVSTORE=1 ENABLE_TESTCOVERAGE=1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@lanking520 lanking520 merged commit 96439e6 into apache:master Jan 8, 2019
piyushghai pushed a commit to piyushghai/incubator-mxnet that referenced this pull request Feb 27, 2019
* Jenkins Publish Nightly Maven

Progress

* Seperate Build, Test, and Deploy Stages with parallel
@zachgk zachgk deleted the mavenPublishing branch April 12, 2019 22:26
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Apr 26, 2019
* Jenkins Publish Nightly Maven

Progress

* Seperate Build, Test, and Deploy Stages with parallel
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Apr 26, 2019
* Jenkins Publish Nightly Maven

Progress

* Seperate Build, Test, and Deploy Stages with parallel
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Apr 30, 2019
* Jenkins Publish Nightly Maven

Progress

* Seperate Build, Test, and Deploy Stages with parallel
zachgk added a commit to zachgk/incubator-mxnet that referenced this pull request May 16, 2019
* Jenkins Publish Nightly Maven

Progress

* Seperate Build, Test, and Deploy Stages with parallel
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Jenkins Publish Nightly Maven

Progress

* Seperate Build, Test, and Deploy Stages with parallel
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CI Java Label to identify Java API component Maven pr-awaiting-review PR is waiting for code review Scala
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants