[MXNET-857] Add initial NVTX profiler implementation #12328

KellenSunderland · 2018-08-24T09:45:05Z

Description

This PR builds on top of the current profiler support to allow profiling via NVIDIA's NVTX APIs. These extensions mark readable ranges in the NVIDIA Visual Profiler, which helps show correlations between kernel launches and graph node executions.

Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png (The additional information enabled is in the 'Markers and Ranges' row.)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Notes:

TODO: Integration test

KellenSunderland · 2018-08-24T16:59:23Z

tests/python/profiling/test_nvtx.py

+ print(sys.executable)
+
+ print(os.path.realpath(__file__))
+


TODO: launch subprocess that profiles the simple forward pass of network py. Verify that ranges are collected properly.

Todo done. Testing in CI to make sure it correctly detects ranges.
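
For readers following along, the TODO above could be sketched roughly as follows. This is only an illustration, not the test that was committed: the helper names `build_nvprof_command`, `run_profile` and `missing_ranges` are hypothetical, the range names in the usage note are placeholders, and the check assumes that range names show up as plain bytes in the file that `nvprof -o` exports.

```python
import subprocess
import sys


def build_nvprof_command(script_path, profile_path):
    """Build an argv list that runs script_path under nvprof.

    `-o` asks nvprof to export the collected profile to profile_path.
    """
    return ["nvprof", "-o", profile_path, sys.executable, script_path]


def run_profile(script_path, profile_path):
    """Run the script under nvprof (requires nvprof on the PATH and a GPU)."""
    subprocess.run(build_nvprof_command(script_path, profile_path), check=True)


def missing_ranges(profile_bytes, expected_names):
    """Return the expected range names that are absent from the raw profile.

    Assumption: range names are stored as readable text inside the export.
    """
    return [name for name in expected_names
            if name.encode("utf-8") not in profile_bytes]
```

A test built on this would call `run_profile(...)` on the forward-pass script, read the exported file in binary mode, and assert that `missing_ranges(...)` returns an empty list for the operator names it expects to see.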

ptrendx · 2018-08-24T18:45:53Z

One thing that would make the profile more readable is different colors for different ranges. Here is an example of how to do this: https://github.com/NVIDIA/DALI/blob/master/dali/common.h#L149

You could, for example, use the first letter of the range name as a very simple hash into a list of predetermined colors (don't do anything fancier, like actually hashing the name, since that would only introduce overhead, which you don't want during profiling).
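
The first-letter lookup suggested here can be sketched in a few lines. The real change would live in the C++ profiler code; Python is used only to illustrate the logic. The palette values and the name `color_for_range` are made up for this example and are not taken from the PR or from DALI.

```python
# Hypothetical palette of ARGB colors; the exact values are illustrative only.
PALETTE = (
    0xFF00FF00,  # green
    0xFF0000FF,  # blue
    0xFFFFFF00,  # yellow
    0xFFFF00FF,  # magenta
    0xFF00FFFF,  # cyan
    0xFFFF0000,  # red
    0xFFFFFFFF,  # white
)


def color_for_range(name):
    """Pick a palette entry from the first character of the range name.

    Only one character is inspected, so the cost does not grow with the
    length of the name, unlike hashing the whole string.
    """
    if not name:
        return PALETTE[0]
    return PALETTE[ord(name[0]) % len(PALETTE)]
```

The trade-off is that ranges sharing a first letter (for example two operators that both start with "C") get the same color, which is acceptable when the goal is only to make neighboring ranges easier to tell apart.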

KellenSunderland · 2018-08-24T19:54:18Z

Great idea. Totally agree.

stu1130 · 2018-09-18T01:06:14Z

@mxnet-label-bot [pr-work-in-progress]

vandanavk · 2018-09-27T21:56:05Z

@KellenSunderland Is the PR still WIP?

KellenSunderland · 2018-09-28T05:17:58Z

@vandanavk Yes, I'll loop around to this once I finish some other tasks. Does it cause problems for your team to leave it in a WIP state? For me it's quite handy to be able to have WIP PRs that I can put aside and resume later.

Roshrini · 2018-10-18T00:41:04Z

@KellenSunderland It doesn't cause problems for our team. But we do try to go through all the stale PRs in the repo to make sure someone is working on them and that PRs are reviewed in a timely manner.

KellenSunderland · 2018-10-18T17:21:54Z

OK, still planning to loop around on this after checking out some code from another implementation to see if there are any lessons to learn there.

lupesko · 2018-10-29T23:44:31Z

@KellenSunderland thanks for the contribution! Any updates on progress/ETA?

KellenSunderland · 2018-10-29T23:57:19Z

Hey @lupesko. Sorry, this is still in my backlog. I've reviewed a few suggestions, so I'm no longer blocked on anyone. I just need to find the time to finish this off.

anirudhacharya · 2018-11-13T00:34:30Z

@KellenSunderland pinging for update.

stu1130 · 2018-11-21T18:58:57Z

@KellenSunderland ping
feel free to close the PR and reopen it if you have time :)

vandanavk · 2018-11-28T21:34:58Z

@mxnet-label-bot add [pr-awaiting-testing]

roywei · 2018-12-11T00:53:56Z

@KellenSunderland ping for update and trigger ci. thanks!

sandeep-krishnamurthy · 2018-12-26T21:19:54Z

This will be a very helpful addition to the profiler utility. @KellenSunderland - looking forward to the update.

stu1130 · 2019-01-15T22:30:00Z

@KellenSunderland ping for the update. Thanks

KellenSunderland · 2019-01-15T23:29:42Z

@stu1130 @sandeep-krishnamurthy @roywei @anirudh2290 @lupesko
I'm about to resume work to finish this PR off, but I'm focusing on merging these two PRs first: #13310 and #13311. I would really appreciate some help with a review by tomorrow so I can get these into 1.4.x in time for a code freeze.

pinaraws · 2019-03-19T23:11:27Z

@mxnet-label-bot add[pr-awaiting-review]

pinaraws · 2019-03-19T23:11:43Z

@mxnet-label-bot remove[pr-work-in-progress]

KellenSunderland · 2019-04-18T21:02:00Z

Still making progress on this. I believe the last TODO is either to use static linking or to make sure the nvExt library is copied to the lib path for testing on Windows. It's probably best for the customer to use static linking, so I'll try to implement that.

Edit: I think Windows support is going to be problematic. If we reference NVTX on Windows by default, NVIDIA recommends shipping the DLL with the application:

5.4	Deploying NVTX
The NVTX .dll is not installed into c:\Windows\System32 or another global location. Instead, make sure to deploy the .dll with your application. 

Ref: https://docs.nvidia.com/gameworks/content/developertools/desktop/nsight/nvtx_library.htm

I think this would be more effort than it's worth for 99% of our Windows users. On Linux the library is installed and on the path for everyone with CUDA installed, so I have no concerns about turning NVTX on by default for profiling on that platform.

KellenSunderland · 2019-04-19T15:52:53Z

@sandeep-krishnamurthy ready for review if you've got a second.

tests/python/profiling/simple_forward.py

eric-haibin-lin · 2019-05-01T01:18:49Z

LGTM. Will this be enabled by default (e.g. via pip)?

KellenSunderland · 2019-05-01T01:55:15Z

Yes, it'll be on by default for CUDA builds.

KellenSunderland · 2019-05-01T16:38:47Z

@eric-haibin-lin Any other questions about this PR? I've done a bunch of perf tests offline and don't see any regressions, but I don't mind doing some additional investigation if you'd like. If you're OK with the change, would you be able to set this PR to approved? I'm hesitant to merge without green checkboxes :-).

szha · 2019-05-01T18:41:22Z

CMakeLists.txt

+ if(NVTX_FOUND)
+ include_directories(${NVTX_INCLUDE_DIRS})
+ list(APPEND mxnet_LINKER_LIBS ${NVTX_LIBRARIES})
+ add_definitions(-DMXNET_USE_NVTX=1)


Can you add support in the Makefile too?

eric-haibin-lin

LGTM pending @szha 's comment

eric-haibin-lin · 2019-05-09T22:17:32Z

@KellenSunderland any update? Looking forward to this feature.

KellenSunderland · 2019-05-10T06:49:28Z

Thanks @eric-haibin-lin. Updated with a Makefile build. Sorry it took a while, just wanted to make sure I tested it well with that build method.

szha · 2019-05-12T04:28:14Z

make/config.mk

@@ -80,6 +80,9 @@ ENABLE_CUDA_RTC = 1
 # whether use CuDNN R3 library
 USE_CUDNN = 0

+# whether to use NVTX when profiling
+USE_NVTX = 0


@KellenSunderland do you recommend enabling this in pip?

KellenSunderland · 2019-05-15

I did measurements with a few models and didn't see any performance deltas. I think it would be safe to enable in pip.

ptrendx · 2019-05-16

Yes, as long as you are not running under nvprof, NVTX calls are basically no-ops.

* [MXNET-857] Enable CUDA NVTX extensions for profiler
  These extensions mark readable ranges in the NVIDIA Visual Profiler which helps show correlations between kernel launches and graph node executions. Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png The additional information enabled is in the 'Markers and Ranges' row.
* [MXNET-857] Add initial NVTX profiler implementation
  This commit removes NVTX headers from the Amalgamation build process, but this is a CUDA/CMake only feature, so it's not relevant to Amalagamation builds.
* [MXNET-857] Use macro for NVTX specific code
* [MXNET-857] Add integration test.
* Turn on NVTX by default in Unix.
* Fixed typos and added NTVX info to profiler.md
* Add NVTX example to profiling tutorial
* Add NVTX flags for make

KellenSunderland requested review from anirudh2290 and szha as code owners August 24, 2018 09:45

KellenSunderland force-pushed the nvtx_initial_merge branch 3 times, most recently from ce4b518 to 60a5e52 Compare August 24, 2018 11:22

marcoabreu added the pr-work-in-progress label Sep 18, 2018

marcoabreu added the pr-awaiting-testing label Nov 28, 2018

sandeep-krishnamurthy added the Profiler label and removed the pr-awaiting-testing label Dec 26, 2018

KellenSunderland force-pushed the nvtx_initial_merge branch from b3f48b5 to 53865e1 Compare January 22, 2019 17:41

marcoabreu added the pr-awaiting-review label Mar 19, 2019

KellenSunderland force-pushed the nvtx_initial_merge branch 2 times, most recently from 72a740e to 379e2c8 Compare April 19, 2019 04:55

KellenSunderland changed the title ~~[WIP] [MXNET-857] Add initial NVTX profiler implementation~~ [MXNET-857] Add initial NVTX profiler implementation Apr 19, 2019

KellenSunderland added the pr-awaiting-review and CUDA labels Apr 19, 2019

KellenSunderland mentioned this pull request Apr 21, 2019

[Discussion] 1.5.0 Roadmap #14619

Closed

eric-haibin-lin reviewed Apr 22, 2019

KellenSunderland force-pushed the nvtx_initial_merge branch 2 times, most recently from b43e422 to 6d96c14 Compare April 27, 2019 08:12

szha reviewed May 1, 2019

eric-haibin-lin approved these changes May 3, 2019

KellenSunderland and others added 7 commits May 9, 2019 22:54

[MXNET-857] Add initial NVTX profiler implementation

1f99954

This commit removes NVTX headers from the Amalgamation build process, but this is a CUDA/CMake only feature, so it's not relevant to Amalagamation builds.

[MXNET-857] Use macro for NVTX specific code

9f3d011

[MXNET-857] Add integration test.

9b27cac

Turn on NVTX by default in Unix.

cf75170

Fixed typos and added NTVX info to profiler.md

a53ecf4

Add NVTX example to profiling tutorial

4ad4cba

KellenSunderland force-pushed the nvtx_initial_merge branch from 6d96c14 to 4ad4cba Compare May 10, 2019 05:54

Add NVTX flags for make

5f3399c

KellenSunderland merged commit b22ee95 into apache:master May 11, 2019

szha reviewed May 12, 2019
