-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adce_pass: Drop phinode edges that can be proved unused #43922
Conversation
This commit implements more comparison liftings. Especially, this change enables the compiler to lift `isa`/`isdefined` checks (i.e. replace a comparison call with ϕ-node by CFG union-splitting). For example, the code snippet below will run 500x faster: ```julia function compute(n) s = 0 itr = 1:n st = iterate(itr) while isdefined(st, 2) # mimic our iteration protocol with `isdefined` v, st = st s += v st = iterate(itr, st) end s end ``` Although it seems like the codegen for `isa` is fairly optimized already and so I could not find any performance benefit for `isa`-lifting (`code_llvm` emits mostly equivalent code), but I hope it's more ideal if we can do the equivalent optimization on Julia level so that we can just consult to `code_typed` for performance optimization.
Union splitting introduces patterns like: Consider a case like: ``` f(x::Float64) = println(x) f(x::SomeBigStruct) = nothing ``` ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a, b) if !isa(c, Float64) goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Now, #43227 will turn this into: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a, b) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` But even though dynamically `b` is never used, it doesn't get deleted, because there's still a use in the PhiNode `c`. This PR teaches adce to recognize this situation and, for PhiNodes that are only ever used by PiNodes, drop any edges that are known to be unused. E.g. in the above case, the adce improvements would turn it into: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Which in turn would let regular DCE drop the allocation: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: nothing 3: c = phi(a) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Co-authored-by: Ian Atol <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a good enhancement.
if phi_uses[phi] != 0 | ||
continue | ||
end | ||
if t === Union{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this condition can be hit? Do we sometimes have π(...)::Union{}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, or during compact all the uses turned out dead or something. We could do this folding in compact, but since this is the place that changes it, might as well do it here.
base/compiler/ssair/passes.jl
Outdated
end | ||
end | ||
end | ||
non_dce_finish!(compact) | ||
for phi in all_phis | ||
count_uses(compact.result[phi][:inst]::PhiNode, phi_uses) | ||
end | ||
# Narrow any union phi nodes that have unused branches | ||
for (phi, t) in zip(unionphis, unionphi_types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember @vtjnash said that this (phi, t)
may allocate arbitrary tuple types depending on user types fed into unionphi_types
:
for (phi, t) in zip(unionphis, unionphi_types) | |
@assert length(unionphis) == length(unionphi_types) | |
for i = length(unionphis) | |
phi = unionphis[i] | |
t = unionphi_types[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Good catch.
Co-authored-by: Shuhei Kadowaki <[email protected]>
deleteat!(unionphis, first(r)) | ||
deleteat!(unionphi_types, first(r)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you should be using a Set
or small (unsorted) array, if you need deletion. Otherwise this object is costing O(n)
LGTM |
Just a slight tweak to #43922 to also allow the implicit narrowing from other phi users. It's not a particularly common occurrence, but can happen particularly when running the adce pass twice, (because that narrows the types of some phis by dropping edges).
Just a slight tweak to #43922 to also allow the implicit narrowing from other phi users. It's not a particularly common occurrence, but can happen particularly when running the adce pass twice, (because that narrows the types of some phis by dropping edges).
Just a slight tweak to #43922 to also allow the implicit narrowing from other phi users. It's not a particularly common occurrence, but can happen particularly when running the adce pass twice, (because that narrows the types of some phis by dropping edges).
) * optimizer: lift more comparisons This commit implements more comparison liftings. Especially, this change enables the compiler to lift `isa`/`isdefined` checks (i.e. replace a comparison call with ϕ-node by CFG union-splitting). For example, the code snippet below will run 500x faster: ```julia function compute(n) s = 0 itr = 1:n st = iterate(itr) while isdefined(st, 2) # mimic our iteration protocol with `isdefined` v, st = st s += v st = iterate(itr, st) end s end ``` Although it seems like the codegen for `isa` is fairly optimized already and so I could not find any performance benefit for `isa`-lifting (`code_llvm` emits mostly equivalent code), but I hope it's more ideal if we can do the equivalent optimization on Julia level so that we can just consult to `code_typed` for performance optimization. * adce_pass: Drop phinode uses that can be proved unused Union splitting introduces patterns like: Consider a case like: ``` f(x::Float64) = println(x) f(x::SomeBigStruct) = nothing ``` ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a, b) if !isa(c, Float64) goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Now, JuliaLang#43227 will turn this into: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a, b) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` But even though dynamically `b` is never used, it doesn't get deleted, because there's still a use in the PhiNode `c`. This PR teaches adce to recognize this situation and, for PhiNodes that are only ever used by PiNodes, drop any edges that are known to be unused. E.g. in the above case, the adce improvements would turn it into: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Which in turn would let regular DCE drop the allocation: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: nothing 3: c = phi(a) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Co-authored-by: Shuhei Kadowaki <[email protected]> Co-authored-by: Ian Atol <[email protected]>
Just a slight tweak to JuliaLang#43922 to also allow the implicit narrowing from other phi users. It's not a particularly common occurrence, but can happen particularly when running the adce pass twice, (because that narrows the types of some phis by dropping edges).
) * optimizer: lift more comparisons This commit implements more comparison liftings. Especially, this change enables the compiler to lift `isa`/`isdefined` checks (i.e. replace a comparison call with ϕ-node by CFG union-splitting). For example, the code snippet below will run 500x faster: ```julia function compute(n) s = 0 itr = 1:n st = iterate(itr) while isdefined(st, 2) # mimic our iteration protocol with `isdefined` v, st = st s += v st = iterate(itr, st) end s end ``` Although it seems like the codegen for `isa` is fairly optimized already and so I could not find any performance benefit for `isa`-lifting (`code_llvm` emits mostly equivalent code), but I hope it's more ideal if we can do the equivalent optimization on Julia level so that we can just consult to `code_typed` for performance optimization. * adce_pass: Drop phinode uses that can be proved unused Union splitting introduces patterns like: Consider a case like: ``` f(x::Float64) = println(x) f(x::SomeBigStruct) = nothing ``` ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a, b) if !isa(c, Float64) goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Now, JuliaLang#43227 will turn this into: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a, b) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` But even though dynamically `b` is never used, it doesn't get deleted, because there's still a use in the PhiNode `c`. This PR teaches adce to recognize this situation and, for PhiNodes that are only ever used by PiNodes, drop any edges that are known to be unused. E.g. in the above case, the adce improvements would turn it into: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: b = new(SomeBigStruct, ...)::SomeBigStruct 3: c = phi(a) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Which in turn would let regular DCE drop the allocation: ``` goto 2 if not ... 1: a = ::Float64 goto 3 2: nothing 3: c = phi(a) cond = phi(true, false) if !cond goto 5 4: d = PiNode(c, Float64) println(d) goto 6 5: nothing 6: return nothing ``` Co-authored-by: Shuhei Kadowaki <[email protected]> Co-authored-by: Ian Atol <[email protected]>
Just a slight tweak to JuliaLang#43922 to also allow the implicit narrowing from other phi users. It's not a particularly common occurrence, but can happen particularly when running the adce pass twice, (because that narrows the types of some phis by dropping edges).
Review note: Base branch is #43227, since that's needed for the test to pass. I'd suggest merging that first, which will automatically update the base branch.
Consider a case like:
Union splitting introduces code that looks like:
Now, #43227 will turn this into:
But even though dynamically
b
is never used,it doesn't get deleted, because there's still a
use in the PhiNode
c
.This PR teaches adce to recognize this situation and,
for PhiNodes that are only ever used by PiNodes, drop
any edges that are known to be unused. E.g. in the above
case, the adce improvements would turn it into:
Which in turn would let regular DCE drop the allocation:
Co-authored-by: Ian Atol [email protected]