Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-857] Add initial NVTX profiler implementation #12328

Merged
merged 8 commits into from
May 11, 2019

Conversation

KellenSunderland
Copy link
Contributor

@KellenSunderland KellenSunderland commented Aug 24, 2018

Description

This PR builds on-top of current profiler support to allow profiling via NVIDIA's NVTX APIs. These extensions mark readable ranges in the NVIDIA Visual Profiler which helps show correlations between kernel launches and graph node executions.

Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png (The additional information enabled is in the 'Markers and Ranges' row.)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this changs

Notes:

  • TODO: Integration test

print(sys.executable)

print(os.path.realpath(__file__))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: launch subprocess that profiles the simple forward pass of network py. Verify that ranges are collected properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo done. Testing in CI to make sure it correctly detects ranges.

@ptrendx
Copy link
Member

ptrendx commented Aug 24, 2018

One thing that would make the profile more readable is different colors for different ranges. Here is an example how to do this: https://github.com/NVIDIA/DALI/blob/master/dali/common.h#L149

You could for example use the first letter of the range as a very simple hash to the list of predetermined colors (don't do anything more fancy like actual hashing of the name since that would only introduce overhead which you don't want during profiling).

@KellenSunderland
Copy link
Contributor Author

KellenSunderland commented Aug 24, 2018 via email

@stu1130
Copy link
Contributor

stu1130 commented Sep 18, 2018

@mxnet-label-bot [pr-work-in-progress]

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label Sep 18, 2018
@vandanavk
Copy link
Contributor

@KellenSunderland Is the PR still WIP?

@KellenSunderland
Copy link
Contributor Author

@vandanavk Yes, I'll loop around to this once I finish some other tasks. Does it cause problems for your team to leave it in a WIP state? For me it's quite handy to be able to have WIP PRs that I can put aside and resume later.

@Roshrini
Copy link
Member

@KellenSunderland It doesn't cause problems for our team. But we do try to go through all the stale PRs in the repo to make sure someone is working on it and PRs are getting timely reviewed.

@KellenSunderland
Copy link
Contributor Author

Ok still planning to loop around on this after checking out some code from another implementation to see if there's any lessons to learn there.

@lupesko
Copy link
Contributor

lupesko commented Oct 29, 2018

@KellenSunderland thanks for the contribution! Any updates on progress/ETA?

@KellenSunderland
Copy link
Contributor Author

Hey @lupesko. Sorry still in my backlog. I've reviewed a few suggestions so no longer blocked on anyone. I just need to find the time to finish this off.

@anirudhacharya
Copy link
Member

@KellenSunderland pinging for update.

@stu1130
Copy link
Contributor

stu1130 commented Nov 21, 2018

@KellenSunderland ping
feel free to close the PR and reopen it if you have time :)

@vandanavk
Copy link
Contributor

@mxnet-label-bot add [pr-awaiting-testing]

@marcoabreu marcoabreu added the pr-awaiting-testing PR is reviewed and waiting CI build and test label Nov 28, 2018
@roywei
Copy link
Member

roywei commented Dec 11, 2018

@KellenSunderland ping for update and trigger ci. thanks!

@sandeep-krishnamurthy sandeep-krishnamurthy added Profiler MXNet profiling issues and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Dec 26, 2018
@sandeep-krishnamurthy
Copy link
Contributor

This will be very helpful addition to profiler utility. @KellenSunderland - looking forward for the update.

@stu1130
Copy link
Contributor

stu1130 commented Jan 15, 2019

@KellenSunderland ping for the update. Thanks

@KellenSunderland
Copy link
Contributor Author

@stu1130 @sandeep-krishnamurthy @roywei @anirudh2290 @lupesko
I'm about to resume work to finish this PR off but I'm focusing on merging these two PRs first #13310 and #13311. Would really appreciate it if I could get some help with a review by tomorrow so I can get these into 1.4.x in time for a code freeze.

@pinaraws
Copy link

@mxnet-label-bot add[pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Mar 19, 2019
@pinaraws
Copy link

@mxnet-label-bot remove[pr-work-in-progress]

@KellenSunderland
Copy link
Contributor Author

KellenSunderland commented Apr 18, 2019

Still making progress on this. I believe the last TODO is either use static linking or to make sure the nvExt library is copied to the lib path for testing on windows. It's probably best for the customer to use static linking, so I'll try to implement that.

Edit: I think windows support is going to be problematic. If we reference NVTX in Windows by default NVIDIA recommends shipping a the dll with the application:

5.4	Deploying NVTX
The NVTX .dll is not installed into c:\Windows\System32 or another global location. Instead, make sure to deploy the .dll with your application. 

Ref: https://docs.nvidia.com/gameworks/content/developertools/desktop/nsight/nvtx_library.htm

I think this would be more effort than it's worth for 99% of our Windows users. On Linux the library is installed and on the path for everyone with Cuda installed, so I don't have concerns turning NVTX on by default for profiling on that platform.

@KellenSunderland KellenSunderland force-pushed the nvtx_initial_merge branch 2 times, most recently from 72a740e to 379e2c8 Compare April 19, 2019 04:55
@KellenSunderland KellenSunderland changed the title [WIP] [MXNET-857] Add initial NVTX profiler implementation [MXNET-857] Add initial NVTX profiler implementation Apr 19, 2019
@KellenSunderland KellenSunderland added pr-awaiting-review PR is waiting for code review CUDA labels Apr 19, 2019
@KellenSunderland
Copy link
Contributor Author

@sandeep-krishnamurthy ready for review if you've got a second.

@KellenSunderland KellenSunderland force-pushed the nvtx_initial_merge branch 2 times, most recently from b43e422 to 6d96c14 Compare April 27, 2019 08:12
@eric-haibin-lin
Copy link
Member

LGTM. Will this be enabled by default (e.g. via pip)?

@KellenSunderland
Copy link
Contributor Author

Yes, it'll be on by default for CUDA builds.

@KellenSunderland
Copy link
Contributor Author

@eric-haibin-lin Any other questions about this PR? I've done a bunch of perf tests offline and don't see any regressions, but don't mind doing some additional investigations if you'd like. If you're ok with the change would you be able to set this PR to approved, I'm hesitant to merge without green checkboxes :-).

if(NVTX_FOUND)
include_directories(${NVTX_INCLUDE_DIRS})
list(APPEND mxnet_LINKER_LIBS ${NVTX_LIBRARIES})
add_definitions(-DMXNET_USE_NVTX=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add support in makefile too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do.

Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending @szha 's comment

@eric-haibin-lin
Copy link
Member

@KellenSunderland any update? look forward to this feature

KellenSunderland and others added 7 commits May 9, 2019 22:54
These extensions mark readable ranges in the NVIDIA Visual Profiler which helps
show correlations between kernel launches and graph node executions.

Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png
The additional information enabled is in the 'Markers and Ranges' row.
This commit removes NVTX headers from the Amalgamation build process,
but this is a CUDA/CMake only feature, so it's not relevant to
Amalagamation builds.
@KellenSunderland
Copy link
Contributor Author

Thanks @eric-haibin-lin. Updated with a Makefile build. Sorry it took a while, just wanted to make sure I tested it well with that build method.

@KellenSunderland KellenSunderland merged commit b22ee95 into apache:master May 11, 2019
@@ -80,6 +80,9 @@ ENABLE_CUDA_RTC = 1
# whether use CuDNN R3 library
USE_CUDNN = 0

# whether to use NVTX when profiling
USE_NVTX = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KellenSunderland do you recommend enabling this in pip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did measurements with a few models and didn't see any performance deltas. I think it would be safe to enable in pip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as long as you are not running under nvprof nvtx calls are basically noops.

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* [MXNET-857] Enable CUDA NVTX extensions for profiler

These extensions mark readable ranges in the NVIDIA Visual Profiler which helps
show correlations between kernel launches and graph node executions.

Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png
The additional information enabled is in the 'Markers and Ranges' row.

* [MXNET-857] Add initial NVTX profiler implementation

This commit removes NVTX headers from the Amalgamation build process,
but this is a CUDA/CMake only feature, so it's not relevant to
Amalagamation builds.

* [MXNET-857] Use macro for NVTX specific code

* [MXNET-857] Add integration test.

* Turn on NVTX by default in Unix.

* Fixed typos and added NTVX info to profiler.md

* Add NVTX example to profiling tutorial

* Add NVTX flags for make
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CUDA pr-awaiting-review PR is waiting for code review Profiler MXNet profiling issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet