This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fixed and re-enables TensorRT steps #14960

Merged
merged 3 commits into from
May 15, 2019

Conversation

perdasilva
Contributor

@perdasilva perdasilva commented May 15, 2019

Description

TensorRT steps were blocking CI (see #14961). The NVIDIA apt repository appears to have changed: libnvinfer-dev seems to have been updated and was no longer installing properly. This PR updates the TensorRT environment to pin these dependencies.
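
For readers unfamiliar with apt pinning, here is a minimal sketch of the idea: install each package as `name=version` so a repository update cannot silently change what the image gets. The helper names (`pin_spec`, `pinned_install_cmd`), the `libcudnn7` package name, and the version strings are assumptions for illustration only; the exact values pinned by this PR are not reproduced here.

```shell
# Sketch only: build an apt-get command that installs exact package versions.
# ASSUMPTION: these version strings are placeholders, not the values from this PR.
LIBNVINFER_VERSION="${LIBNVINFER_VERSION:-0.0.0-placeholder}"
CUDNN_VERSION="${CUDNN_VERSION:-0.0.0-placeholder}"

# pin_spec NAME VERSION
# Prints "NAME=VERSION", the form apt-get accepts for an exact version.
pin_spec() {
    printf '%s=%s' "$1" "$2"
}

# pinned_install_cmd NAME VERSION [NAME VERSION ...]
# Prints (does not run) the apt-get command for the given pins.
pinned_install_cmd() {
    cmd="apt-get install -y --allow-downgrades"
    while [ "$#" -ge 2 ]; do
        cmd="$cmd $(pin_spec "$1" "$2")"
        shift 2
    done
    printf '%s\n' "$cmd"
}

# Show the command that would be run inside the image build.
# ASSUMPTION: libcudnn7 is used here as the cuDNN runtime package name.
pinned_install_cmd \
    libnvinfer-dev "$LIBNVINFER_VERSION" \
    libcudnn7 "$CUDNN_VERSION"
```

In a real install script the printed command would be executed, and `apt-mark hold` on the same package names could keep a later `apt-get upgrade` from moving them.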

After the initial commit, the compilation stage was passing, but the test stage failed. The tests complained that TensorRT had been compiled with cuDNN 7.5.0 but was linking against 7.5.1 (which comes stock in the nvidia/cuda:10.0-cudnn7-devel image used as the base image). So, in a subsequent commit, I pinned the cuDNN version to 7.5.0.
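
A mismatch like this could be caught earlier with a fail-fast check in the image build. The sketch below is an illustration, not part of this PR: `version_matches` is a hypothetical helper, and the version strings in the demo are made-up examples of the Debian package version format.

```shell
# Sketch only: succeed when ACTUAL equals EXPECTED or extends it with ".<more>".
# version_matches EXPECTED ACTUAL
version_matches() {
    case "$2" in
        "$1"|"$1".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo with illustrative (made-up) package version strings.
expected="7.5.0"
for actual in "7.5.0.1-1+cuda10.0" "7.5.1.1-1+cuda10.0"; do
    if version_matches "$expected" "$actual"; then
        echo "ok: $actual matches $expected"
    else
        echo "mismatch: $actual does not match $expected"
    fi
done
```

In an image build, the actual value might come from something like `dpkg-query -W -f='${Version}' libcudnn7` (package name assumed), with the build aborting on a mismatch.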

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, the API doc string has been updated.
      • For new C++ functions in header files, their functionalities and arguments are documented.
      • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
      • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Pins TensorRT dependency versions and cudnn version

@@ -18,7 +18,7 @@
 #
 # Dockerfile to run MXNet on Ubuntu 16.04 for CPU

-FROM nvidia/cuda:10.0-cudnn7-devel
+FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
Contributor

Ubuntu 18 was intentional, I think
@KellenSunderland

Contributor Author

I'll remove that - and we can move it to a different PR if necessary

Contributor

Not a massive deal either way. I think both are fairly well supported.

@perdasilva perdasilva force-pushed the tensorrt_fix branch 2 times, most recently from 098c744 to a9549b5 Compare May 15, 2019 08:43
@perdasilva perdasilva changed the title [WIP] Updates base image for tensorrt gpu [WIP] Pins libnvinfer versions for TensorRT image May 15, 2019
@perdasilva perdasilva force-pushed the tensorrt_fix branch 2 times, most recently from b079042 to 165b798 Compare May 15, 2019 12:02
@perdasilva perdasilva changed the title [WIP] Pins libnvinfer versions for TensorRT image Pins TensorRT dependencies May 15, 2019
@perdasilva perdasilva changed the title Pins TensorRT dependencies Fixed and re-enables TensorRT steps May 15, 2019
@perdasilva
Contributor Author

@KellenSunderland Could you please review/merge this if it looks OK to you?

Member

@roywei roywei left a comment

Thank you so much for the fix, let's merge it so contributors only need to rebase once.

@perdasilva
Contributor Author

@roywei thank you ^^ I'm not a committer, so I can't T_T

@eric-haibin-lin eric-haibin-lin merged commit f7b7163 into apache:master May 15, 2019
@perdasilva perdasilva deleted the tensorrt_fix branch May 16, 2019 07:06
@perdasilva perdasilva restored the tensorrt_fix branch May 20, 2019 09:02
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Pins libnvinfer versions

* Sets cudnn to version 7.5.0 in tensorrt environment

* Re-enables TensorRT stages
6 participants