This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
[BUGFIX] Reenable fwd conv engine 5 on test_group_conv2d_16c #21104
Description
PR #20635, which began using the cuDNN v8 backend API for Convolution ops, includes a line that was added to avoid `test_gluon_gpu.py::test_group_conv2d_16c` failures that began occurring coincident with that PR. This PR removes that line by providing a different implementation of the "convolution plan cache" introduced with PR #20635, one that is compatible with convolution engine 5. The steps of this PR are:

1. A commit that removes the line, demonstrating the `test_group_conv2d_16c` failures, then
2. A commit with the reworked plan cache, demonstrating that `test_group_conv2d_16c` passes even with engine 5 in use.

Further detail:
The cuDNN v8 backend allows one to bypass much of the CPU processing that would otherwise precede kernel launch by first building up and finalizing a convolution execution plan. The plan, which includes a choice of convolution engine and 'knob' settings, is then executed efficiently by the call `cudnnBackendExecute(cudnn_handle, plan, ...)`. PR #20635 introduced a cache of plans so that autotuning does not need to be repeated for identically parameterized convolutions, which are then handled by the same plan even if they appear multiple times in a model or are handled by different GPU workers.

The issue discovered with convolution engine 5 is that it caches the cuDNN handle provided during the plan's construction and ignores the handle passed as an argument to cudnnBackendExecute(). As a result, the engine's kernels are launched into the stream of the cached handle, which is the wrong stream whenever the GPU worker executing the plan differs from the one that created it. Without proper stream synchronization, incorrect results may follow.
The contribution of this PR is to effectively include a GPU worker's cuDNN handle as part of the key used in the plan-cache lookup. A further refinement of the fix is that on a plan-cache miss, a plan already made by a different worker for the same convolution can be 'cloned' with the proper handle, without repeating the autotuning.