The tree-height-reduction optimization increases instruction-level parallelism by reordering the calculations in a loop so that the
calculation tree is as shallow as possible.
For example, the following loop benefits from this optimization.
for (int i = 0; i < N; ++i) {
    a[i] = b[i] + c[i] + d[i] + e[i] + f[i] + g[i] + h[i] + k[i];
}
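To make the effect concrete, here is a minimal sketch of the regrouping for one iteration, written out by hand for eight integer operands. The function names `sum_chain` and `sum_tree` are hypothetical and exist only for illustration. The compiler presumably applies the equivalent change internally rather than on source text.

```c
/* Source order: each add depends on the previous one.
 * 7 adds, dependency depth 7. */
static int sum_chain(const int v[8]) {
    return ((((((v[0] + v[1]) + v[2]) + v[3]) + v[4]) + v[5]) + v[6]) + v[7];
}

/* Balanced order: the same 7 adds, dependency depth 3.
 * The four adds on the first level are independent of each other,
 * so they can be issued in parallel. */
static int sum_tree(const int v[8]) {
    int t0 = v[0] + v[1];
    int t1 = v[2] + v[3];
    int t2 = v[4] + v[5];
    int t3 = v[6] + v[7];
    int u0 = t0 + t1;
    int u1 = t2 + t3;
    return u0 + u1;
}
```

Both functions perform the same number of additions. Only the length of the longest dependency chain differs.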
- Conditions for optimization
  - Integer arithmetic: -O1 or higher is in effect
  - Floating-point arithmetic: -O1 or higher, -ffast-math, and -mllvm -enable-fp-thr are all in effect
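A likely reason the floating-point case needs -ffast-math is that floating-point addition is not associative, so regrouping the operands can change the result. The sketch below illustrates this under that assumption. The helper names `sum_left` and `sum_right` are hypothetical, and `volatile` is used only to keep each grouping exactly as written.

```c
/* (x + y) + z */
static float sum_left(float x, float y, float z) {
    volatile float t = x + y;  /* force this grouping */
    return t + z;
}

/* x + (y + z) */
static float sum_right(float x, float y, float z) {
    volatile float t = y + z;  /* force this grouping */
    return x + t;
}
```

With x = 1e20f, y = -1e20f, and z = 1.0f, `sum_left` yields 1.0f, while `sum_right` yields 0.0f because 1.0f is absorbed when it is added to -1e20f.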
The note is misleading, as this is disabled by default.