This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2. #14006
Merged
Merged
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
ff759f2
Dual stream conv backward(). Enable with MXNET_GPU_WORKER_NSTREAMS=2.
DickJC123 d52774d
Fix for MSVC compiler.
DickJC123 1cf5c67
Fix cpplint.
DickJC123 e5b57a8
Add MXNET_GPU_WORKER_NSTREAMS env var documentation.
DickJC123 d16e85f
Improve test function and commenting.
DickJC123 b44d096
Merge remote-tracking branch 'mxnet/master' into parallel_conv_dgrad_…
DickJC123 838bda0
Add description of proper aux stream use using events.
DickJC123 c3f3320
RAII rework to simplify usage within operators.
DickJC123 db5075f
Fix cpplint.
DickJC123 ea56c91
Expand testing to cover all engines.
DickJC123 790a998
Fix NaiveEngine shutdown segfault on CentOS7.
DickJC123 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gives me the impression that nstream may support more than 2 in the future. However, the c interface only returns an aux stream. Do you foresee that more than 1 aux stream is needed to help accelerating parallel computation in the future? If so, do we need to update the interface to reflect that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short answer is that 'yes', an operator with 3 inputs might make use of 3 streams in Backward(), so I did not want to propose an environment variable name like MXNET_GPU_WORKER_USE_DUAL_STREAM=0/1 that might soon become obsolete. On the other hand, Convolution only needs 2 streams, and I did not want to burden this enhancement with more complexity than is needed at this time. I propose that when we have a use-case for 3 or more streams, then we can expand the implementation and employ the use-case in our testing of it.
At the end of every kernel execution, there is a fall-off in GPU utilization leading up to the completion of the last grid block. When two streams are being used, these utilization gaps can be filled by work from the second stream. I would guess that having 3 streams would not enhance this effect. On the other hand, let's say you had 3 small independent kernels that each would occupy a third of the GPU. You could see how having 3 streams would be a win in this case over 2 streams.
So it's good that you ask, how might we expand this to 3 or more streams? The MXNET_GPU_WORKER_NSTREAMS environment variable would remain unchanged, though the documentation would indicate that the framework supports a value greater than 2. Legacy env-var uses would be preserved so I think this could happen as part of a minor release. At the RunContext level, a GPUAuxStream* would be replaced by a std::vector<GPUAuxStream*>. The RunContext method get_gpu_aux_stream() might then be changed to RunContex::get_gpu_aux_stream(int aux_stream_id = 0), which would not break operator code that started using the simpler aux_stream API proposed by this PR.