Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: [NewOptimizer] The one SROA pass to rule them all #26778

Closed
wants to merge 1 commit into from
Closed

Conversation

Keno
Copy link
Member

@Keno Keno commented Apr 11, 2018

This is more of a "where this is going" PR. I hope we won't need to finish this until after we tag 0.7 and have some time to clean it up. The current getfield elim pass in the new optimizer is good enough to handle everything the old optimizer did as well as most simple cases of the new iteration protocol (basically where the variable is checked against nothing right after the phi). However, there are cases that are more complicated which the current getfield elim pass does not handle. To demonstrate that that is just a temporary limitation, this branch implements the more advanced version of that pass, in a sufficient state to do some benchmarking on it (but probably gets corner cases wrong, so a bunch more work is required to fix and cleanup). Here's a motivating example (due to @JeffBezanson). On current master:

using Base.Iterators: product
@noinline dosomething(x) = x
function mycount(itr)
    n = 0
    for x in itr
        dosomething(x)
        n+=1
    end
    return n
end
iter = product(ntuple(n->("a","b","c","d"), Val(9))...);
@benchmark mycount($iter)
BenchmarkTools.Trial:
  memory estimate:  59.29 MiB
  allocs estimate:  786438

With this PR and the new optimizer enabled:

BenchmarkTools.Trial:
  memory estimate:  20.00 MiB
  allocs estimate:  262145

(Timings are faster as well, but I ran them on different machines, so I omitted them since they're not useful comparisons).

The loop has 262144 iterations, so this demonstrates that the only object that gets allocated is the tuple that is required to be allocated, while the current master allocates more than 500000 useless intermediate objects (and of course if the dosomething actually was inlineable and did something with the string, the allocation of the tuple would likely go away as well on this branch).

@Keno
Copy link
Member Author

Keno commented Apr 11, 2018

code_typed on master: https://gist.github.com/Keno/f1146fd6c7f8688cb00dd419f48782ab
code_typed on this branch: https://gist.github.com/Keno/0422fe1d5973265b79bd9c905f36198f

It's fun to count the number of :news and Core.tuple calls that got eliminated.

@ararslan ararslan added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Apr 11, 2018
Keno added a commit that referenced this pull request Apr 27, 2018
In #26778, I changed the way that the reverse affinity flag works
in order to support node insertion during compaction. In particular,
I had originally (for ease of implementation primarily) had that flag
simply change which BB an inserted node belongs to (i.e. if that flag
was set and there was a basic block boundary at the insertion site,
the new node would belong to the previous BB). In #26778, I changed
this to instead be a flag of whether to insert before or after a given
instruction. As it turns out, this change is also required for correctness.
Because domsorting can change the notion of "previous instruction" by
moving basic blocks, we had (in rare circumstances) instructions end
up in the wrong BB. Backporting those changes fixes that.
Keno added a commit that referenced this pull request Apr 27, 2018
In #26778, I changed the way that the reverse affinity flag works
in order to support node insertion during compaction. In particular,
I had originally (for ease of implementation primarily) had that flag
simply change which BB an inserted node belongs to (i.e. if that flag
was set and there was a basic block boundary at the insertion site,
the new node would belong to the previous BB). In #26778, I changed
this to instead be a flag of whether to insert before or after a given
instruction. As it turns out, this change is also required for correctness.
Because domsorting can change the notion of "previous instruction" by
moving basic blocks, we had (in rare circumstances) instructions end
up in the wrong BB. Backporting those changes fixes that.
Keno added a commit that referenced this pull request Apr 27, 2018
In #26778, I changed the way that the reverse affinity flag works
in order to support node insertion during compaction. In particular,
I had originally (for ease of implementation primarily) had that flag
simply change which BB an inserted node belongs to (i.e. if that flag
was set and there was a basic block boundary at the insertion site,
the new node would belong to the previous BB). In #26778, I changed
this to instead be a flag of whether to insert before or after a given
instruction. As it turns out, this change is also required for correctness.
Because domsorting can change the notion of "previous instruction" by
moving basic blocks, we had (in rare circumstances) instructions end
up in the wrong BB. Backporting those changes fixes that.
Keno added a commit that referenced this pull request May 1, 2018
[NewOptimizer] Backport reverse affinity changes from #26778
line::Int
end
function NewNode(nnode::NewNode; pos=nnode.pos, reverse_affinity=nnode.reverse_affinity,
typ=nnode.typ, node=nnode.node, line=nnode.line)
Copy link
Sponsor Member

@vtjnash vtjnash May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use named tuples for this – performance and compiler correctness will be really bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already fixed on master - I just haven't rebased this change since.

function getindex(compact::IncrementalCompact, ssa::NewSSAValue)
return compact.new_new_nodes[ssa.id].node
end

function count_added_node!(compact, v)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for performance, add type-sig annotations


function resort_pending!(compact)
sort!(compact.pending_perm, DEFAULT_STABLE, Order.By(x->compact.pending_nodes[x].pos))
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this returning a value?

sort!(compact.pending_perm, DEFAULT_STABLE, Order.By(x->compact.pending_nodes[x].pos))
end

function insert_node!(compact::IncrementalCompact, before, @nospecialize(typ), @nospecialize(val), reverse_affinity::Bool=false)
Copy link
Sponsor Member

@vtjnash vtjnash May 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what type is before - looks like this should written to use dispatch?

entry = compact.pending_nodes[pos - length(compact.ir.stmts) - length(compact.ir.new_nodes)]
pos, reverse_affinity = entry.pos, entry.reverse_affinity
end
line = 0 #compact.ir.lines[before.id]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

entry.pos == idx && entry.reverse_affinity
end

function process_newnode!(compact, new_idx, new_node_entry, idx, active_bb)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type signature?

if compact.new_nodes_idx <= length(compact.perm) && compact.ir.new_nodes[compact.perm[compact.new_nodes_idx]][1] == idx
if compact.new_nodes_idx <= length(compact.perm) &&
(entry = compact.ir.new_nodes[compact.perm[compact.new_nodes_idx]];
entry.reverse_affinity ? entry.pos == idx - 1 : entry.pos == idx)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not be clever with how much logic we can pack into the conditional

@@ -617,10 +758,12 @@ function next(compact::IncrementalCompact, (idx, active_bb)::Tuple{Int, Int})
return (old_result_idx, compact.result[old_result_idx]), (compact.idx, active_bb)
end

function maybe_erase_unused!(extra_worklist, compact, idx)
effect_free = stmt_effect_free(compact.result[idx], compact, compact.ir.mod)
function maybe_erase_unused!(extra_worklist, compact, idx, callback = x->nothing)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type signature?

bb = compact.result_bbs[end]
compact.result_bbs[end] = BasicBlock(bb,
StmtRange(first(bb.stmts), result_idx-1))
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the BasicBlock return value intended to be meaningful?

end

# Insert PhiNodes
lifted_phis = map(visited_phinodes) do item
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the front-end doesn't have domtree analysis, wrap this in a let block to explicitly capture variables (to avoid preventing inference from seeing the types of these variables everywhere in the containing function)


function count_uses(stmt, uses)
for ur in userefs(stmt)
if isa(ur[], SSAValue)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lift ur[] to a variable

end
end

function mark_phi_cycles(compact, safe_phis, phi)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type sig

changed = true
while changed
changed = false
safe_phis = IdSet{Int}()
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdSet{Int} are very slow. Use a BitSet.

node = renumber_ssa!(isa(entry.node, PhiNode) ?
rename_phinode_edges(entry.node, 0, result_order, bb_rename) : entry.node,
inst_rename, true)
) for entry in ir.new_nodes]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid creating a closure over Box variables here (see elsewhere for how to re-write this)

@Keno
Copy link
Member Author

Keno commented May 7, 2018

As I said on slack, this is not currently in a reviewable state. You're welcome to continue, but you'll likely have to do it over when I pick this up again.

@Keno Keno mentioned this pull request May 13, 2018
18 tasks
Keno added a commit that referenced this pull request May 16, 2018
This is a rebased and fixed version of the improved SROA pass from #26778.
There's a decent piece of new infrastructure wrapped up in this: The ability
to insert new nodes during compaction. This is a bit tricky because it requires
tracking which version of the statements buffer a given SSAValue belongs to.
At the moment this is done mostly manually, but I'm hoping to clean that up
in the future. The idea of the new SROA pass is fairly straightforward:
Given a use of an interesting value, it traces through all phi nodes, finding
all leaves, applies whatever transformation to those leaves and then re-inserts
a phi nest corresponding to the phi nest of the original value.
Keno added a commit that referenced this pull request May 16, 2018
This is a rebased and fixed version of the improved SROA pass from #26778.
There's a decent piece of new infrastructure wrapped up in this: The ability
to insert new nodes during compaction. This is a bit tricky because it requires
tracking which version of the statements buffer a given SSAValue belongs to.
At the moment this is done mostly manually, but I'm hoping to clean that up
in the future. The idea of the new SROA pass is fairly straightforward:
Given a use of an interesting value, it traces through all phi nodes, finding
all leaves, applies whatever transformation to those leaves and then re-inserts
a phi nest corresponding to the phi nest of the original value.
@Keno
Copy link
Member Author

Keno commented May 16, 2018

Superseded by #27126.

@Keno Keno closed this May 16, 2018
@DilumAluthge DilumAluthge deleted the kf/bigsroa branch March 25, 2021 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants