This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

A solution to prevent zombie containers locally and in CI #12381

Merged
merged 1 commit into from
Aug 28, 2018

Conversation

larroy
Contributor

@larroy larroy commented Aug 28, 2018

Description

This PR adds mechanisms to the build scripts to clean up Docker containers when a build is cancelled, either by the user sending SIGINT / SIGTERM or because the process inside the container exits with an error.

It also propagates the Jenkins environment variables that the process tree killer uses to identify runaway processes, so that processes inside the container are killed when the Jenkins job is stopped.
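As a rough illustration of the environment propagation, the sketch below selects Jenkins variables from the current environment so they can be handed to a container. The helper name `jenkins_env` and the variable list are assumptions made for this example; which variables the process tree killer actually inspects depends on the Jenkins version and is not stated here.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed list: Jenkins sets these for each build. Whether the process tree
# killer relies on exactly these names is an assumption of this sketch.
JENKINS_ENV_VARS = ("BUILD_NUMBER", "BUILD_ID", "BUILD_TAG")


def jenkins_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the Jenkins variables that are present in the environment."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in JENKINS_ENV_VARS if name in environ}
```

The resulting dictionary could be passed as the `environment` argument when starting a container through the Docker SDK for Python, so the variables are visible to processes inside it.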

With this patch we catch SIGINT and SIGTERM and register an atexit handler that stops any Docker containers created during the build.
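A minimal sketch of the signal and atexit pattern described here, using only the standard library. The `Cleanup` class and `install_handlers` function are hypothetical names for this example and are not taken from build.py; the callbacks stand in for whatever stops the containers.

```python
import atexit
import signal
import sys
from typing import Callable, List


class Cleanup:
    """Collects cleanup callbacks and runs each of them at most once."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []

    def add(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def run(self) -> None:
        # Pop before calling so a second invocation (signal handler followed
        # by atexit) finds nothing left to do.
        while self._callbacks:
            callback = self._callbacks.pop()
            try:
                callback()
            except Exception as exc:  # keep going so later callbacks still run
                print(f"cleanup failed: {exc}", file=sys.stderr)


def install_handlers(cleanup: Cleanup) -> None:
    """Run the cleanup on SIGINT, SIGTERM, and normal interpreter exit."""

    def on_signal(signum, frame):
        cleanup.run()
        sys.exit(128 + signum)  # conventional exit status for a fatal signal

    signal.signal(signal.SIGINT, on_signal)
    signal.signal(signal.SIGTERM, on_signal)
    atexit.register(cleanup.run)
```

Because `run` empties the callback list as it goes, it is safe for both the signal handler and the atexit hook to call it. Note that `signal.signal` can only be called from the main thread.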

We also switch to the Python Docker API for better management of running containers.
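To make the container tracking concrete, here is a hedged sketch of how containers started through the Docker SDK for Python might be recorded and later stopped. `ContainerTracker` is a hypothetical name for this example and not the class used in build.py; `client` is assumed to be a `docker.DockerClient`, for instance the object returned by `docker.from_env()`.

```python
from typing import Any, List


class ContainerTracker:
    """Starts containers via a Docker client and remembers them for cleanup."""

    def __init__(self, client: Any) -> None:
        # Assumption: `client` behaves like docker.DockerClient.
        self._client = client
        self._running: List[Any] = []

    def run(self, image: str, command: str, **kwargs: Any) -> Any:
        # detach=True makes containers.run return a Container object
        # immediately instead of blocking until the process exits.
        container = self._client.containers.run(
            image, command, detach=True, **kwargs
        )
        self._running.append(container)
        return container

    def stop_all(self, timeout: int = 10) -> None:
        # Stop and remove every tracked container; each is handled once.
        while self._running:
            container = self._running.pop()
            container.stop(timeout=timeout)
            container.remove()
```

Taking the client as a parameter keeps the sketch free of a hard dependency on the `docker` package and makes it easy to exercise with a stub. A real script would also need to handle a signal arriving between starting a container and recording it.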

@aaronmarkham @marcoabreu @lebeg @KellenSunderland

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Changes the build.py script to track running containers and to propagate the Jenkins environment variables used by the process tree killer.

@larroy larroy changed the title from "Zombies singled" to "A solution to prevent zombie containers locally and in CI" Aug 28, 2018
Fix pylint, mypy, and pycharm code inspection warnings
@marcoabreu marcoabreu merged commit e2a3eef into apache:master Aug 28, 2018
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Sep 19, 2018
Fix pylint, mypy, and pycharm code inspection warnings
@larroy larroy deleted the zombies_singled branch November 15, 2018 18:44
3 participants