Benchmarks are at risk of being optimised away #22
Comments
I have no interest in this issue as I find no value in micro-benchmarks across implementations. There's been no useful correlation between micro-benchmarks and application behavior. Further, I've found little utility in "application-style" benchmarks. I'm focusing on tooling to help understand actual application performance, which includes memory load, CPU time, IO time, concurrency, etc.
It's not about comparing performance across implementations - I never mentioned that. People use benchmark-ips to compare the performance of different Ruby methods and algorithms within the same implementation. And as Rubinius and JRuby get more sophisticated you'll find the same problem we have - benchmarks written using benchmark-ips are at risk of optimising to nothing. However, now that I've done a little more work, I'm not sure my original proposal is the right fix. What we really need benchmark-ips to do is to supply non-predictable inputs for each iteration, and to consume the output value in a way that has a hard side effect, such as writing to a file.
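The two mitigations described above - non-predictable inputs and a hard side effect consuming the output - might look like this in plain Ruby. This is a sketch, not a benchmark-ips API; the loop and names are illustrative:

```ruby
require 'tempfile'

# Non-predictable input: rand defeats constant folding of the operand.
# Hard side effect: writing the accumulated result to a file keeps the
# computation observable, so the loop cannot be removed as dead code.
sink  = Tempfile.new('bench-sink')
total = 0

1_000.times do
  a = rand(100)    # unpredictable per-iteration input
  total += a + 2   # the operation under test
end

sink.write(total.to_s) # consume the output with a real side effect
sink.flush
sink.close
```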
First, people do compare across implementations, so I'm being clear that I'm not interested in that case. More generally, I'm not interested in this issue for the reasons stated.
Seems fair to point out that this library is intended for microbenchmarks...so if this library still has a reason to exist, then enhancements to make it more accurate or reliable should be welcome. Personal opinions about microbenchmarking don't change the fact that this is a microbenchmarking library. That said...we have tried to deal with microbenchmarks optimizing away in JRuby before, and the best answer has always been to write a better benchmark. The tricky bit is knowing when it is time to stop using a particular benchmark, since there are many cases where we explicitly want to measure the optimization to ensure it's still working right. |
Yes, and it turns out that my solution doesn't do anything useful anyway. I got benchmark-ips working in Truffle, added |
I appreciate you raising the issue Chris. It seems like the response you got was based on a misunderstanding of what you said. :( |
This is an old issue. Has something happened that has renewed interest in it? @gerrywastaken if your comment is referring to my comments above, I can reiterate that I was asked for my opinion directly by Chris and I gave it. In the intervening time, it hasn't changed. I still have not found any utility in microbenchmarks. Hopefully that's helpful. I never said that no one else should find value in them or work on them. It's just that I won't spend any time on them.
As implementations of Ruby get more powerful, benchmarks written using benchmark-ips, and micro-benchmarks in general, are at risk of being silently optimised away. Benchmarks are already confusing for non-specialists, and at the moment the only way to tell whether a benchmark has been optimised away is to look at the generated machine code.
Take this example benchmark from the documentation:
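A sketch of the documentation example under discussion (a reconstruction - the exact report label is an assumption; the `require` guard is added so the sketch degrades gracefully without the gem):

```ruby
begin
  require 'benchmark/ips'

  Benchmark.ips do |x|
    # The block body is a runtime constant: 1 + 2 can be folded to 3.
    x.report('addition') { 1 + 2 }
  end
rescue LoadError
  warn 'benchmark-ips gem not installed'
end
```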
The first problem is that the operation being benchmarked here is runtime constant! With inline caches, dynamic inlining and constant folding, we can reduce `1 + 2` to `3`, and with dynamic deoptimisation we can do it without any guards. I think at least Topaz and Truffle can achieve that today, and JRuby probably will be able to achieve it with the new IR - I'm not sure. I'm also not sure about Rubinius. Maybe a future MRI JIT will also be able to do it.

The second problem is that the whole loop itself is also vulnerable to being optimised away. It performs no side effects and produces no value (except nil). You could say it observes side effects, but with dynamic deoptimisation, or with hoisting guards out of the loop, all the side-effect observations of the loop can be modelled as happening instantaneously, once for the entire loop. I don't believe any implementation of Ruby can currently remove this loop (it is not easy to do in practice), but we're certainly working towards it very quickly in Truffle.
What can we do about this?
The root of the problem is that the literal values `1` and `2` are constants, and the compiler can see this. What about introducing a special function that the compiler will pretend it cannot see through? Assuming we could get all implementations on board, we could perhaps call this `Kernel#optimisation_barrier`.
Then the code would look like this, with the barrier wrapping each literal.

This solves the first problem. I'm not sure if it also solves the second (currently hypothetical) problem: the loop body is no longer constant, but does that matter for removing the loop? We could remove the computation if we are sure it has no side effects - and there's no possibility of overflow here, so I don't think there are any. I can implement this `optimisation_barrier` in Truffle today. For MRI and other implementations it could perhaps be a no-op. If we can't get all implementations on board, benchmark-ips could define it as a no-op if the implementation doesn't provide one. We could also pull that out into a separate gem.

Downsides are that the person writing benchmarks has to figure out where to add these barriers, and although it's a no-op in MRI, MRI is not able to inline through it, and so it may add significant overhead.
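A sketch of the proposal above - both the no-op fallback that benchmark-ips could define and the rewritten report block. The fallback body here is an assumption (a plain identity method); no implementation ships `optimisation_barrier` today, and an optimising implementation such as Truffle would instead treat the argument as opaque to the compiler:

```ruby
module Kernel
  # Fallback: identity. Only defined if the implementation doesn't already
  # provide a real barrier.
  def optimisation_barrier(value)
    value
  end unless method_defined?(:optimisation_barrier)
end

# The documentation example rewritten with barriers, so the operands are no
# longer visible to the compiler as constants:
#
#   x.report('addition') { optimisation_barrier(1) + optimisation_barrier(2) }

result = optimisation_barrier(1) + optimisation_barrier(2)
puts result # => 3
```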
What do other Ruby implementors, @headius and @brixen, think about this? Should we standardise on a barrier like this across all implementations?