Fix performance problem in LLVM JumpThreading #25630

Keno · 2018-01-18T21:45:40Z

This patch is https://reviews.llvm.org/D42262 upstream. When we emit
unboxed unions, we tend to emit switch instructions on a small
integer that marks which of the types is currently active,
so we often emit things like:

block:
    ; Incoming from a union split branch
    %phi = phi i8 [ 1, %a ], [ 2, %b ]
    ; <some other operation, not on the union split>
    ; Union split again
    switch i8 %phi, label %boxed [
        i8 1, label %abranch
        i8 2, label %bbranch
    ]

In many situations the operation in the middle can get optimized away,
so we want to merge the two union split sections into one. LLVM's
jump threading pass can do this by keeping track of how control flow
behaves across a given basic block. Unfortunately, this optimization
wasn't taking effect. This is because InstCombine realized that the
range of possible values was rather small and turned the above into
something like:

   %trunc = trunc i8 %phi to i2
   switch i2 %trunc, label %boxed [
       i2 1, label %abranch
       i2 -2, label %bbranch
   ]

which JumpThreading refused to look through (because of the i2 rather
than the i1). The included patch fixes this. On recent LLVM, we
additionally need https://reviews.llvm.org/D42260 for cases where
a union split branch happens to target a loop header. However,
LLVM 3.9.1 does not include the commit that introduced that regression.
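
As a rough illustration of why looking through the truncation is safe here, the following Python toy model (not LLVM's JumpThreading code and not the contents of D42262; all function and variable names are hypothetical) resolves, for each predecessor of the block, which switch target it would reach. It also shows why the `i8 2` case is printed as `i2 -2` after narrowing.

```python
def trunc_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of `value` and read them as a signed integer,
    which is how LLVM prints integer constants."""
    low = value & ((1 << bits) - 1)
    sign_bit = 1 << (bits - 1)
    return low - (1 << bits) if low & sign_bit else low


def resolve_targets(phi_incoming, switch_cases, default, trunc_bits=None):
    """Map each predecessor block to the switch target it would reach.

    phi_incoming: {predecessor name: constant fed into the phi}
    switch_cases: {case constant: target block name}
    trunc_bits:   width of an optional trunc between the phi and the switch
                  (hypothetical parameter for this sketch)
    """
    targets = {}
    for pred, const in phi_incoming.items():
        if trunc_bits is not None:
            # Look through the trunc by applying it to the known constant.
            const = trunc_signed(const, trunc_bits)
        targets[pred] = switch_cases.get(const, default)
    return targets


phi_incoming = {"a": 1, "b": 2}

# Switch directly on the i8 phi.
original = resolve_targets(phi_incoming, {1: "abranch", 2: "bbranch"}, "boxed")

# Switch on the i2 truncation produced by InstCombine.
narrowed = resolve_targets(
    phi_incoming, {1: "abranch", -2: "bbranch"}, "boxed", trunc_bits=2
)
```

In this model both forms resolve `%a` to `%abranch` and `%b` to `%bbranch`, so the predecessors could branch straight to their final targets once the truncation is taken into account.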

This fixes a number of the performance regressions seen in #25261.

Keno · 2018-01-18T21:46:22Z

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2018-01-18T22:11:55Z

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

Keno · 2018-01-18T22:17:31Z

cc @ararslan LinAlg move broke nanosoldier

ararslan · 2018-01-18T22:20:15Z

Ah, so it did. Thanks for the heads up, I'll get on that right away.

ararslan · 2018-01-19T20:10:36Z

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2018-01-19T21:14:56Z

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

ararslan · 2018-01-19T22:33:32Z

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2018-01-19T23:38:11Z

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

Keno · 2018-01-19T23:39:13Z

Looks like the Serializer move also broke it.

ararslan · 2018-01-20T00:11:28Z

Thanks, working on that now.

ararslan · 2018-01-20T05:28:02Z

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2018-01-20T11:16:59Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

Keno merged commit 808e828 into master Jan 20, 2018

martinholters deleted the kf/llvmunionsplit branch January 20, 2018 20:34

maleadt mentioned this pull request Jul 4, 2018

Backport LLVM D42691. #27934

Closed
