
Reformat of TensorRT to use subgraph API #14040

Merged
merged 1 commit into apache:master from Caenorst:trt_reformat on May 1, 2019

Conversation

Caenorst
Contributor

@Caenorst Caenorst commented Jan 31, 2019

Description

This PR modifies the TensorRT integration to rely on the Subgraph API; the backend is named 'TensorRT'.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Code is well-documented:
  • Need an update in documentation for general use cases
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • TensorRT now relies solely on the Subgraph API, plus a contrib function for loading weights
  • Added FP16 support; FP16 computation is enabled by default
  • Graph partitioning has changed: if the same input to a subgraph is used multiple times within the subgraph, it now stays as a single node in the subgraph

Comments

  • Now that it uses the Subgraph API, you can use TensorRT with both the Module API and the symbolic API
  • The way weights are loaded into the TensorRT node is a bit dirty; we will do a follow-up PR (including one on NNVM) so that node attributes can natively own subgraph parameters, including a setter on the C API

*/
MXNET_DLL int MXExecutorGetOptimizedSymbol(ExecutorHandle handle,
SymbolHandle *out);

Contributor

why do you need to remove an API?

Contributor Author

I actually created this API just for the previous TensorRT implementation; I'm not sure it is used anywhere else. It could still be useful in case you are calling a subgraph backend with a variable environment. Do you want to keep it?

Contributor

@KellenSunderland KellenSunderland Feb 3, 2019

It affects semantic versioning if we remove it (it can break compilation in downstream projects), so if there's not a strong reason to remove it, we should leave it in. Can it still perform a useful function (for example, showing the graph after optimization)?

Contributor Author

Sure, should I make it more general then? (It only worked with TrtGraphExecutor, which doesn't exist anymore.)
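As an aside, here is a minimal sketch (not part of this PR) of how a downstream project could keep using MXExecutorGetOptimizedSymbol to inspect the graph after backend partitioning, which is the use case Kellen mentions above. It assumes an already-bound ExecutorHandle named exec and uses only existing C-API calls (MXExecutorGetOptimizedSymbol, MXSymbolPrint, MXSymbolFree, MXGetLastError):

#include <cstdio>
#include <mxnet/c_api.h>

// Dump the symbol produced by backend graph passes (e.g. TensorRT partitioning).
void PrintOptimizedGraph(ExecutorHandle exec) {
  SymbolHandle optimized = nullptr;
  if (MXExecutorGetOptimizedSymbol(exec, &optimized) != 0) {
    std::fprintf(stderr, "MXExecutorGetOptimizedSymbol failed: %s\n", MXGetLastError());
    return;
  }
  const char* text = nullptr;
  if (MXSymbolPrint(optimized, &text) == 0) {
    std::printf("%s\n", text);  // human-readable dump of the optimized graph
  }
  MXSymbolFree(optimized);
}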

}
}

inline std::string StringDType(int dtype) {
Contributor

Is this still needed? I don't see any references to it.

Contributor Author

debug function, nice catch, will remove

if (trt_builder->platformHasFastFp16()) {
  trt_builder->setFp16Mode(true);
} else {
  LOG(INFO) << "WARNING: TensorRT can't use fp16 on this plateform";
Contributor

plateform -> platform

Contributor

Also, we're logging at INFO level but have WARNING in the message. I'd remove the warning from the message and set the log level to warning. (This is a common issue in our codebase.)
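For illustration, a minimal sketch of the suggested change, assuming the dmlc-style LOG macro used elsewhere in MXNet (the wrapper function name here is hypothetical):

#include <dmlc/logging.h>

// Log at WARNING severity instead of embedding the word "WARNING" in an INFO message.
void WarnNoFastFp16() {
  LOG(WARNING) << "TensorRT can't use fp16 on this platform";
}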

return false;
}
}
if (isTRTCompatible(new_node))
Contributor

Can be simplified to

return isTRTCompatible(new_node);

if (o.index == e->index && o.node.get() == e->node.get()) {
e->index = i;
e->node = subgraph_node;
// TODO(cfujitsang): For futur support this would fail
Contributor

futur -> future.

}

nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym,
const int subgraph_id = 0) const override {
Contributor

If possible, can we avoid default values in overriding functions?
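As an illustration, a small self-contained sketch of that suggestion (hypothetical class and type names, not the actual MXNet/nnvm ones): keep the default argument only on the base-class declaration and omit it from the override, so the two can never disagree (defaults on virtual functions are bound statically):

#include <memory>

struct Node {};
using NodePtr = std::shared_ptr<Node>;  // stand-in for nnvm::NodePtr

class SubgraphPropertyBase {
 public:
  virtual ~SubgraphPropertyBase() = default;
  // Default argument only here, on the base declaration.
  virtual NodePtr CreateSubgraphNode(int subgraph_id = 0) const = 0;
};

class TensorRTProperty : public SubgraphPropertyBase {
 public:
  // Override declares the parameter without a default.
  NodePtr CreateSubgraphNode(int subgraph_id) const override {
    (void)subgraph_id;  // unused in this sketch
    return std::make_shared<Node>();
  }
};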

@KellenSunderland
Contributor

ONNX-TensorRT might fail to build, as it adds a new default target that requires a header not in a standard search path. There are a few ways to work around the problem: we could just build the library and not that tool in CI, or we could include the header folder location in the search path like so: https://github.com/apache/incubator-mxnet/pull/13906/files#diff-56133c25b5a238b76f54c0928f05a8e6

It should allow the TensorRT build to pass CI if we make that change in this PR.

@Caenorst
Contributor Author

Caenorst commented Feb 4, 2019

It seems that my modification to CutGraphInputs is breaking some of the default attribute-inference functions. I think my modification makes sense (duplicating inputs in the subgraph looks like a bug to me). I can modify src/operator/subgraph/common.h accordingly if you agree with my change to CutGraphInputs.

@KellenSunderland
Contributor

CI should be in reasonably good shape now. Looks like there are some linting issues in a few headers:

=====269/271 cpp-header files passed check=====
src/operator/subgraph/tensorrt/nnvm_to_onnx-inl.h: 1 Errors of 1 Categories map={'build': 1}
src/operator/subgraph/tensorrt/tensorrt-inl.h: 4 Errors of 2 Categories map={'whitespace': 1, 'build': 3}

@reminisce
Contributor

Great to see this is happening. I have two high-level comments:

  1. If you use the subgraph API, there should be no need to add specific Python functions (e.g. init_tensorrt_params) to use TensorRT as a backend inference engine. I think everything can be handled in the backend to support the Module and simple_bind APIs in the frontend.
  2. Have you considered adopting the TensorRT C++ APIs to convert MXNet subgraph IR to TensorRT IR? This would allow you to get rid of the dependencies on protobuf, onnx, and onnx-trt. I have found building MXNet with those third-party libs to be a painful process, especially on edge devices. In addition, simply relying on the TensorRT C++ APIs allows us to extend operator coverage promptly once a new release of TensorRT is out. I did this while integrating TensorRT with TVM, which shares the same graph IR as MXNet.
    https://github.com/reminisce/tvm/blob/subgraph_integration/src/contrib/subgraph/tensorrt_executor.cc

@Caenorst
Contributor Author

Caenorst commented Feb 5, 2019

@reminisce

  1. If you know how to do it, I'd like your help. I asked the question on the dev list and have discussed it with Kellen Sunderland and Zheng Da, and we haven't found an easy solution. The eventual objective would be to modify the functions that do parameter initialization. Currently, using init_tensorrt_params doesn't prevent using the Module or Symbolic API.

  2. So far, on our side, there is no plan to integrate TensorRT into MXNet without ONNX. It's easier for us to rely on onnx-tensorrt for reusability.

@reminisce
Contributor

@Caenorst
Just want to clarify: I'm not blocking this PR. We can think through these comments and make incremental changes afterwards.

The point is adding zero overhead to the user experience of using TensorRT in MXNet. I don't think you need to pass the param data explicitly beforehand; it can all be collected at the first forward call for each subgraph. You can take a look at how this is done in TVM. At a high level, in the forward function of a TensorRT subgraph op, you have a list of inputs that contains both input data and params. You just need to differentiate param names from input data names. This can be achieved by analyzing whether a data node's NodeEntry is an operator's input data or a param.
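To make the idea concrete, here is a rough self-contained sketch (hypothetical class and member names, not the TVM or MXNet implementation): the subgraph op knows its input names and which of them are params, and on the first forward call it separates the weights from the data inputs by name, copies the weights into the engine once, and afterwards only binds the data inputs:

#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

struct Tensor {};  // stand-in for an NDArray-like tensor

class TrtSubgraphOp {
 public:
  TrtSubgraphOp(std::vector<std::string> input_names,
                std::unordered_set<std::string> param_names)
      : input_names_(std::move(input_names)),
        param_names_(std::move(param_names)) {}

  void Forward(const std::vector<Tensor>& inputs) {
    if (!engine_built_) {
      // First call: pick the weights out of the flat input list by name.
      std::unordered_map<std::string, const Tensor*> weights;
      for (std::size_t i = 0; i < inputs.size() && i < input_names_.size(); ++i) {
        if (param_names_.count(input_names_[i]) != 0) {
          weights.emplace(input_names_[i], &inputs[i]);
        }
      }
      BuildEngine(weights);  // copy weights into the TensorRT engine once
      engine_built_ = true;
    }
    RunEngine(inputs);  // subsequent calls only bind the non-param inputs
  }

 private:
  void BuildEngine(const std::unordered_map<std::string, const Tensor*>& /*weights*/) {}
  void RunEngine(const std::vector<Tensor>& /*inputs*/) {}

  std::vector<std::string> input_names_;
  std::unordered_set<std::string> param_names_;
  bool engine_built_ = false;
};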

Regarding the dependence on protobuf, onnx, and onnx-tensorrt, here are my two cents: I personally don't think introducing so many third-party dependencies is a good idea, because it results in an extra layer of complexity in terms of implementation, maintenance, build, and deployment, and the conversion process is not manageable by this community. If there is a direct way of creating a bridge from nnvm to TensorRT, why not use it?

@Caenorst
Contributor Author

Caenorst commented Feb 7, 2019

@reminisce
I completely get your points. We thought about loading the weights from the inputs; the problem is that you can't mix contexts in a single graph, and as far as I know you can't delete the weights from the graph, so it would force us to hold the weights twice in GPU memory, which may become a problem at some point. Ideally, if we could somehow at least deallocate the memory of the weights once they are copied into the TensorRT engine, that would be nice; I haven't tried to see whether it is doable, though.

@reminisce
Contributor

reminisce commented Feb 7, 2019

you can't mix context in a single graph

What do you mean by "mix context"? We only have one context, which is GPU in this case.

Regarding releasing the weight params after they are copied to tensorrt engines, I think it's doable by removing the corresponding NDArray from data_entry_ in GraphExecutor.

@Caenorst
Contributor Author

Caenorst commented Feb 7, 2019

@reminisce
Yes, what I mean is that if the weights for the TensorRT node were on CPU while the rest of the graph is on GPU, it would be less of a problem. Also, an argument for onnx-tensorrt is that more ops are supported, with plugins implemented (slice, some activations, resize with nearest interpolation, ...).

@reminisce
Contributor

weights for TensorRT node are on CPU while the rest of the graph is on GPU.

This is not true. When binding completes, you have all the weights on GPU.

@reminisce
Contributor

Also an argument for onnx-tensorrt is that there is more Ops supported with plugins implemented (slice, some activation, resize with nearest interpolation...)

Are you saying that there are operators supported by TensorRT whose creation is not available through the C++ APIs? Have those operators been added to the unconditionalTRTop? The TRT C++ API also supports a plugin interface.

Contributor Author

@Caenorst Caenorst left a comment

Are you saying that there are operators supported by TensorRT whose creation is not available through the C++ APIs? Have those operators been added to the unconditionalTRTop? The TRT C++ API also supports a plugin interface.

They are added as IPlugin (so not natively supported by TensorRT), but of course you could try to write your own plugins. For the moment I don't think any of those ops are supported by my implementation, but I don't think it will be hard to do; it's just a long-term argument.


@ankkhedia
Contributor

@Caenorst Could you fix the merge conflicts and retrigger CI?
@reminisce Could you please help @Caenorst with his questions?

@Caenorst
Contributor Author

weights for TensorRT node are on CPU while the rest of the graph is on GPU.

This is not true. When binding completes, you have all the weights on GPU.

Yes, that's what I meant: the weights have to be copied from CPU to GPU, so they originally have to be on CPU. Actually, they first have to be copied into the ONNX model.

@karan6181
Contributor

@KellenSunderland and @anirudh2290 could you please review this PR again?

Thanks!

@KellenSunderland
Contributor

@karan6181 Taking a look this week.

@KellenSunderland
Contributor

KellenSunderland commented Mar 28, 2019

Hey @Caenorst, sorry, can you rebase this once more? There are no reported conflicts from GitHub, but for some reason CI isn't picking up some changes, which means the R tests will fail (they're trying to use an old domain).

PR looks good to me though, will merge as soon as it's rebased.

@Caenorst force-pushed the trt_reformat branch 2 times, most recently from ccae63f to 5418349 on March 28, 2019 20:59
@abhinavs95
Contributor

@Caenorst Thank you for making the changes. @KellenSunderland is this good to go?

@abhinavs95
Contributor

@mxnet-label-bot add [pr-awaiting-testing]

@marcoabreu added the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) on Mar 28, 2019
@KellenSunderland
Contributor

LGTM, did some testing on a home machine.

@KellenSunderland
Contributor

Small update: still debugging a crash showing up during teardown. Seems like a problem relating to the order in which statically scoped objects are destroyed.

@KellenSunderland
Contributor

Are you able to reproduce this locally @Caenorst? I've run through quite a few times now and haven't hit the issue.

@KellenSunderland
Contributor

Scratch that, able to reproduce now. Digging a bit further in ...

@KellenSunderland
Contributor

Ok, I see what's happening. Looks like we've done some refactoring to help prevent engine shutdown crashes, but we still have some fixes to apply before it's working 100%. I believe this PR should solve the issue we're seeing in the test: #14591. Would you be able to test with that commit applied?

@KellenSunderland
Contributor

Ok, the issue that was tripping up our tests has been fixed. Would you be able to do another rebase (sorry)? Then we should be able to merge.

@roywei
Member

roywei commented Apr 29, 2019

@Caenorst thanks for the contribution, gentle ping to rebase so we can merge this PR.

@Caenorst
Contributor Author

Caenorst commented May 1, 2019

Rebase done. Sorry for the delay.

@KellenSunderland
Contributor

No worries about the delay. My team's been testing the TRT integration out and is seeing better-than-expected speedups so far. Let's merge this change. I'm adding a todo for myself to update all documentation relating to the API change in our TRT tutorials. We've also found a few minor bugs we hope to PR soon.

@KellenSunderland KellenSunderland merged commit 1c874cf into apache:master May 1, 2019
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
@ThomasDelteil
Contributor

@KellenSunderland @Caenorst, the tutorial for TensorRT is out of date and I have had a few users asking us why it's not working anymore: https://mxnet.incubator.apache.org/versions/master/tutorials/tensorrt/inference_with_trt.html
What is the current recommended way to use TensorRT?
Thanks!

@Caenorst
Contributor Author

Caenorst commented Jun 7, 2019

@ThomasDelteil the PR for updating it is on its way: https://mxnet.incubator.apache.org/versions/master/tutorials/tensorrt/inference_with_trt.html

Meanwhile here is the new way to use TensorRT:

sym, arg_params, aux_params = mx.model.load_checkpoint('my_net', epochs)
trt_sym = sym.get_backend_symbol('TensorRT')
remaining_arg_params, remaining_aux_params = mx.contrib.tensorrt.init_tensorrt_params(trt_sym, arg_params, aux_params)

And then you can use trt_sym as your usual symbol; no more tensorrt_bind or MXNET_USE_TENSORRT environment variable. Also, FP16 computation is now on by default; if you want to disable it, use the function mx.contrib.tensorrt.set_use_fp16(False).

@ThomasDelteil
Contributor

Thanks @Caenorst for the quick update!

@KellenSunderland
Contributor

KellenSunderland commented Jun 7, 2019 via email

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019