Fix InferAttr/InferShapeAttr not calling inference for all nodes in a graph #16836
Conversation
Wow! Shocking :)
@junrushao1994 The failure in …
@reminisce The failure in …
@samskalicky @apeforest FYI, see last 2 comments.
@reminisce Actually, the required change to …
At the time I wrote the control flow code, it was still infeasible to have int64 output for many operators, like …
Not sure I understand @junrushao1994 - I believe the problem here is that you are forcing the float32 output type here, even if the operator could output something else (and in the case of the test that fails in this PR, the operator previously did not get a chance to verify that it is correct). Why not leave the output type of the condition part as …
@ptrendx I think leaving it as -1 is a nice proposal. Could you fix this? Thanks!
Thanks for making the attribute inference logic more robust. Nice catch! LGTM.
… graph (apache#16836)
* Fix the attribute inference omitting nodes
* Add test
* Cleaning
* Fix lint
* Fix TransposeShape
* Fix WhileLoopType
* Changing a/b test for fusion to a/(b+1) to increase numerical stability
* Add unoptimized symbol to executor for sharing (#16798)
  * Add unoptimized symbol to executor for sharing
  * Copy the symbol in Reshape
  * Added test for multiple reshapes
* Mixed precison binary op backward (use in) for numpy (#16791)
  * mixed precison binary op backward
  * reduce unix cpu runtime
* USE_NVRTC -> ENABLE_CUDA_RTC to fix maven build. Add compile-guard to fusion. (#16838)
  * Rename USE_NVRTC -> ENABLE_CUDA_RTC to fix maven build. Compile-guard fusion framework.
  * Fix fusion-not-supported warning.
  * Fix compile guards
  * Fix cmake build so -DMXNET_ENABLE_CUDA_RTC=1 is passed to nvcc
  * Minimize side-effects of prev change
* Fix InferAttr/InferShapeAttr not calling inference for all nodes in a graph (#16836)
  * Fix the attribute inference omitting nodes
  * Add test
  * Cleaning
  * Fix lint
  * Fix TransposeShape
  * Fix WhileLoopType
  * Changing a/b test for fusion to a/(b+1) to increase numerical stability
* Revert "Mixed precison binary op backward (use in) for numpy (#16791)"
  * This reverts commit 8b58b78.
Description
Currently, an operator's type/shape inference is only called if not all of its input and output attribute values are already known. This is a bug, as some operators may never see the values of those input/output attributes during the InferAttr pass and therefore never validate them.
As an example, this code snippet runs without throwing any exception in current MXNet:
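The original snippet did not survive the page capture; the following is a minimal sketch of the kind of graph described below (the variable names and the (2, 3) shape are illustrative, not necessarily the original reproducer):

```python
import mxnet as mx

a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
t = mx.sym.transpose(a)      # output shape should be the reverse of a's shape
out = (t + b) + (a + b)      # the _Plus nodes force both a and t to take b's shape
# With the bug, binding succeeds even though transposing a (2, 3) input cannot yield (2, 3)
e = out.simple_bind(ctx=mx.cpu(), b=(2, 3))
```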
and it errors out only when calling `e.forward` (in 1.5.1; in 1.6.0 the new 2D transpose implementation removed the shape check inside the operator's forward function, so it does not error out even then). This happens because the `transpose` node sits early in the topologically sorted graph and ran its shape inference early, on still-unknown input and output shapes. The `_Plus` operators later set all of those shapes by themselves, and `transpose` was never asked to run its shape inference again, so it never got to spot that the shapes are not suitable for this operator.

In this PR, I track which operators have actually finished inferring their attributes, including validating the attributes they received when all of the inputs/outputs are already known at the point their attribute inference runs.
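The real change lives in MXNet's C++ InferAttr/InferShapeAttr passes; the toy Python model below (all names and the graph are made up for illustration) only shows the idea: a node is skipped once its own shape function has signed off, not merely once its shapes happen to be known.

```python
def transpose_shape(ins, outs):
    """2-D transpose: the output shape must be the reverse of the input shape."""
    if ins[0] is not None:
        expected = ins[0][::-1]
        if outs[0] is not None and outs[0] != expected:
            raise ValueError("transpose: %s cannot map to %s" % (ins[0], outs[0]))
        outs[0] = expected
    elif outs[0] is not None:
        ins[0] = outs[0][::-1]
    return None not in ins and None not in outs

def elemwise_shape(ins, outs):
    """Elementwise add: all inputs and outputs share one shape (validation omitted)."""
    known = next((s for s in ins + outs if s is not None), None)
    if known is None:
        return False
    ins[:] = [known] * len(ins)
    outs[:] = [known] * len(outs)
    return True

# out = (transpose(a) + b) + (a + b), with only b's shape given up front
entries = {"a": None, "b": (2, 3), "t": None, "p1": None, "p2": None, "out": None}
nodes = [
    (transpose_shape, ["a"], ["t"]),
    (elemwise_shape, ["t", "b"], ["p1"]),
    (elemwise_shape, ["a", "b"], ["p2"]),
    (elemwise_shape, ["p1", "p2"], ["out"]),
]

finished = [False] * len(nodes)
for _ in range(len(nodes)):                     # crude fixed-point iteration
    for nid, (fn, in_ids, out_ids) in enumerate(nodes):
        if finished[nid]:                       # skip only nodes whose shape function
            continue                            # has already seen and accepted everything
        ins = [entries[i] for i in in_ids]
        outs = [entries[o] for o in out_ids]
        finished[nid] = fn(ins, outs)           # the buggy pass instead skipped a node as
        entries.update(zip(in_ids, ins))        # soon as its shapes were known, so this
        entries.update(zip(out_ids, outs))      # call never happened for such nodes

# Second sweep: transpose now sees a == t == (2, 3) and raises ValueError.
```

Running the toy raises the ValueError during the second sweep, which mirrors the fixed behaviour: the inconsistency is reported during shape inference instead of slipping through to the forward call.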
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Comments
The inference pass still returns `true` even if not everything was inferred correctly. This should be tackled in a separate PR.