-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Optimize SubBasicBlocks library #1824
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
pragma[nomagic] | ||
private int outerPosToInnerPos(BasicBlock bb, int posInBB) { | ||
posInBB = result + this.getIndexInBasicBlock(bb) and | ||
result = [ 0 .. this.getNumberOfNodes() - 1 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How confident are we that this isn't quadratic? Naively, it seems that for any particular node bb, posInBB
, every possible this
in bb
must be checked.
Other than this question, FWIW, these changes LGTM also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the DIL we generate for this predicate:
noinline nomagic
SubBasicBlocks::SubBasicBlock::outerPosToInnerPos#ffff(/* SubBasicBlocks::SubBasicBlock */ cached entity this,
/* BasicBlocks::BasicBlock */ cached entity bb,
int posInBB, int result) :-
exists(cached int call_result |
exists(int range#high |
exists(int Sub#lhs |
SubBasicBlocks::SubBasicBlock::getIndexInBasicBlock#fff(this, bb,
call_result),
SubBasicBlocks::SubBasicBlock::getNumberOfNodes#ff(this, Sub#lhs),
Minus(Sub#lhs, 1, range#high)
),
range(0, range#high, result)
),
PlusInt(result, call_result, posInBB)
)
.
It first joins in the two getters (functional relations), so the tuple count will be the number of SubBasicBlock
. It then joins with the range, which multiplies that tuple count by the average number of nodes per SubBasicBlock
. All together, this predicate should have the same number of tuples as there are ControlFlowNode
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, that gives me a bit more confidence.
Some predicates in this library had performance that was quadratic in the number of sub-basic-blocks per basic block. After field flow was added, that has become a problem on two projects: intel/mkl-dnn and laurentkneip/opengv. This PR should fix the join order and make these predicates faster on all projects.