Use NVTX filtering to limit NCU profile collection
Summary: Previously, we used `--replay-mode range`, but that did not give us per-kernel metrics, so it was changed to `--replay-mode kernel` (the default). However, that can cause us to profile many more kernels than the ones in the desired benchmark. It appears we can instead use NVTX filtering to solve this problem.

Relevant docs: https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvtx-filtering

I also tacked on a minor change to the ncu invocation, adding `--import-source yes`. This makes it easier to analyze the traces on a different machine from the one doing the profiling.

Reviewed By: chenyang78

Differential Revision: D58711358

fbshipit-source-id: 28aec4f71a736c7427b1886335297ece4a2a54a8
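For illustration, the invocation described above could be assembled roughly as follows. This is a minimal sketch, not the code from this diff: the helper `build_ncu_command`, the range name `benchmark_region`, the script `run_benchmark.py`, and the report name are all hypothetical. It assumes the benchmark wraps its kernels in an NVTX push/pop range, and that a trailing `/` in the `--nvtx-include` expression marks a push/pop range, per my reading of the linked docs.

```python
import shlex


def build_ncu_command(target_argv, nvtx_include, report_path):
    """Build an ncu argv that only profiles kernels inside one NVTX range.

    Hypothetical helper for illustration; the names and argument order are
    assumptions, not taken from the actual change.
    """
    return [
        "ncu",
        # Enable NVTX support so that range-based filtering applies.
        "--nvtx",
        # Passed through verbatim; see the linked docs for the expression syntax.
        "--nvtx-include", nvtx_include,
        # Embed source files in the report so it can be opened on another machine.
        "--import-source", "yes",
        # --replay-mode is left at its default (kernel) to keep per-kernel metrics.
        "-o", report_path,
        *target_argv,
    ]


# Assumed range name; the trailing "/" is meant to denote a push/pop range.
cmd = build_ncu_command(
    ["python", "run_benchmark.py"],
    "benchmark_region/",
    "benchmark_profile",
)
print(shlex.join(cmd))
```

On the benchmark side, the matching range would need to be opened around the measured region, for example with `torch.cuda.nvtx.range_push("benchmark_region")` and `torch.cuda.nvtx.range_pop()` if the benchmark uses PyTorch.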