Benchmarks are at risk of being optimised away #22
Comments
I have no interest in this issue as I find no value in micro-benchmarks across implementations. There's been no useful correlation between micro-benchmarks and application behavior. Further, I've found little utility in "application-style" benchmarks. I'm focusing on tooling to help understand actual application performance, which includes memory load, CPU time, IO time, concurrency, etc.
It's not about comparing performance across implementations - I never mentioned that. People use benchmark-ips to compare the performance of different Ruby methods and algorithms within the same implementation. And as Rubinius and JRuby get more sophisticated you'll find the same problem we have - benchmarks written using benchmark-ips are at risk of optimising to nothing. However, now that I've done a little more work, I'm not sure my original proposal is the right fix. What we really need benchmark-ips to do is to supply non-predictable inputs for each iteration, and to consume the output value in a way that has a hard side effect, such as writing to a file.
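The two mitigations described above - non-predictable inputs and a hard side effect consuming the output - might look like this in plain Ruby. This is a sketch, not a benchmark-ips API; the loop and names are illustrative:

```ruby
require 'tempfile'

# Non-predictable input: rand defeats constant folding of the operand.
# Hard side effect: writing the accumulated result to a file keeps the
# computation observable, so the loop cannot be removed as dead code.
sink  = Tempfile.new('bench-sink')
total = 0

1_000.times do
  a = rand(100)    # unpredictable per-iteration input
  total += a + 2   # the operation under test
end

sink.write(total.to_s) # consume the output with a real side effect
sink.flush
sink.close
```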
First, people do compare across implementations, so I'm being clear that I'm not interested in that case. More generally, I'm not interested in this issue for the reasons stated.
Seems fair to point out that this library is intended for microbenchmarks...so if this library still has a reason to exist, then enhancements to make it more accurate or reliable should be welcome. Personal opinions about microbenchmarking don't change the fact that this is a microbenchmarking library. That said...we have tried to deal with microbenchmarks optimizing away in JRuby before, and the best answer has always been to write a better benchmark. The tricky bit is knowing when it is time to stop using a particular benchmark, since there are many cases where we explicitly want to measure the optimization to ensure it's still working right. |
Yes, and it turns out that my solution doesn't do anything useful anyway. I got benchmark-ips working in Truffle, added |
I appreciate you raising the issue Chris. It seems like the response you got was based on a misunderstanding of what you said. :( |
This is an old issue. Has something happened that has renewed interest in it? @gerrywastaken if your comment is referring to my comments above, I can reiterate that I was asked for my opinion directly by Chris and I gave it. In the intervening time, it hasn't changed. I still have not found any utility in microbenchmarks. Hopefully that's helpful. I never said that no one else should find value in them or work on them. It's just that I won't spend any time on them.
As implementations of Ruby get more powerful, benchmarks written using benchmark-ips, and micro-benchmarks in general, are at risk of being silently optimised away. Benchmarks are already confusing for non-specialists, and at the moment the only way to tell whether a benchmark has been optimised away is to look at the generated machine code.
Take this example benchmark from the documentation:
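A sketch of the documentation example under discussion (a reconstruction - the exact report label is an assumption; the `require` guard is added so the sketch degrades gracefully without the gem):

```ruby
begin
  require 'benchmark/ips'

  Benchmark.ips do |x|
    # The block body is a runtime constant: 1 + 2 can be folded to 3.
    x.report('addition') { 1 + 2 }
  end
rescue LoadError
  warn 'benchmark-ips gem not installed'
end
```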
The first problem is that the operation being benchmarked here is runtime constant! With inline caches, dynamic inlining and constant folding, we can reduce `1 + 2` to `3`, and with dynamic deoptimisation we can do it without any guards. I think at least Topaz and Truffle can achieve that today, and JRuby probably will be able to achieve it with the new IR - I'm not sure. I'm also not sure about Rubinius. Maybe a future MRI JIT will also be able to do it.

The second problem is that the whole loop itself is also vulnerable to being optimised away. It performs no side effects and produces no value (except nil). You could say it observes side effects, but with dynamic deoptimisation, or with hoisting guards out of the loop, all the side-effect observations of the loop can be modelled as happening instantaneously, once for the entire loop. I don't believe any implementation of Ruby can currently remove this loop (it is not easy to do in practice), but we're certainly working towards it very quickly in Truffle.
What can we do about this?
The root of the problem is that the literal values `1` and `2` are constants, and the compiler can see this. What about introducing a special function that the compiler will pretend it cannot see through? Assuming we could get all implementations on board, we could perhaps call this `Kernel#optimisation_barrier`.
Then the code would look like this, with the barrier wrapping each literal.

This solves the first problem. I'm not sure if it also solves the second (currently hypothetical) problem: the loop body is no longer constant, but does that matter for removing the loop? We could remove the computation if we are sure it has no side effects - and there's no possibility of overflow here, so I don't think there are any. I can implement this `optimisation_barrier` in Truffle today. For MRI and other implementations it could perhaps be a no-op. If we can't get all implementations on board, benchmark-ips could define it as a no-op if the implementation doesn't provide one. We could also pull that out into a separate gem.

Downsides are that the person writing benchmarks has to figure out where to add these barriers, and although it's a no-op in MRI, MRI is not able to inline through it, and so it may add significant overhead.
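A sketch of the proposal above - both the no-op fallback that benchmark-ips could define and the rewritten report block. The fallback body here is an assumption (a plain identity method); no implementation ships `optimisation_barrier` today, and an optimising implementation such as Truffle would instead treat the argument as opaque to the compiler:

```ruby
module Kernel
  # Fallback: identity. Only defined if the implementation doesn't already
  # provide a real barrier.
  def optimisation_barrier(value)
    value
  end unless method_defined?(:optimisation_barrier)
end

# The documentation example rewritten with barriers, so the operands are no
# longer visible to the compiler as constants:
#
#   x.report('addition') { optimisation_barrier(1) + optimisation_barrier(2) }

result = optimisation_barrier(1) + optimisation_barrier(2)
puts result # => 3
```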
What do other Ruby implementors, @headius and @brixen, think about this? Should we standardise on a barrier like this across all implementations?