
Add /randomSeedIterations flag for verification stability analysis #567

Merged: 20 commits merged into boogie-org:master from the stability-analysis branch on May 2, 2022

Conversation

@atomb (Collaborator) commented Apr 19, 2022

This PR adds a new flag, /randomSeedIterations:<n>, which instructs Boogie to send each verification condition to the SMT solver n times, with the goal of analyzing the stability of each proof across those n iterations. When used with /vcsCores:<n>, the iterations can be evaluated in parallel.

From the user's perspective, the output of /trace and /xml now includes iteration numbers along with assertion batch indices.
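
For example, an invocation along these lines (the executable and file names here are only illustrative) would prove each verification condition five times, running up to four attempts in parallel:

boogie /randomSeedIterations:5 /vcsCores:4 /randomSeed:1 Example.bpl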

Fixes #494.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

We want to be able to check one split with multiple checkers, and this
fixed link between Checker and Split is easy to remove.
This doesn't yet include the logic to ensure that each attempt uses a
different random seed, which will be necessary for real stability
analysis.
As long as /randomSeed is provided. Otherwise, do no randomization.
This makes it easier to keep track of what’s going on when producing
multiple results from a single split.
Previously, some would occasionally be lost.
@keyboardDrummer (Collaborator)

It's most useful in conjunction with /randomSeed

Is it any use without /randomSeed?

in which case it'll use a different random seed for each iteration of Boogie -> SMT translation

Shouldn't it always do that, even if no /randomSeed is specified?

@@ -153,8 +153,13 @@ private void SetRlimit(uint rlimit)
/// </summary>
private void Setup(Program prog, ProverContext ctx, Split split = null)
{
SolverOptions.RandomSeed = split?.RandomSeed ?? Options.RandomSeed;
if (Options.VcsStabilityIterations < 2 || SolverOptions.RandomSeed == null) {
Collaborator

Do we need the Options.VcsStabilityIterations < 2 ?

Would SolverOptions.RandomSeed ??= split?.RandomSeed ?? Options.RandomSeed not work?

Consider not checking Options.VcsStabilityIterations:

private void Setup(Program prog, ProverContext ctx, Split split = null)
{
  if (SolverOptions.RandomSeed == null) {
    SolverOptions.RandomSeed = split?.RandomSeed ?? Options.RandomSeed;
  } else {
    SolverOptions.RandomSeed = new Random(SolverOptions.RandomSeed.Value).Next();
  }
  var random = SolverOptions.RandomSeed == null ? null : new Random(SolverOptions.RandomSeed.Value);

  Program = prog;
  ..
}

Collaborator Author

The first time I wrote it, I did exactly that, but it makes it impossible to use different random seeds for different methods using the {:random_seed N} attribute (which we fortunately have a test to check!).

@@ -153,8 +153,13 @@ private void SetRlimit(uint rlimit)
/// </summary>
private void Setup(Program prog, ProverContext ctx, Split split = null)
@keyboardDrummer (Collaborator) commented Apr 20, 2022

I wonder whether this code leads to nondeterministic behavior when you're running multiple splits in parallel and multiple stability iterations. The splits can share the same checker so they might influence each other's random seeds. That might not be a problem since we're doing stability analysis so determinism is not the goal, but not being able to get reproducible output might be concerning to some.

If you want deterministic behavior, update the RandomSeed of the Split on each iteration, and set SolverOptions.RandomSeed using that.
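
Roughly along these lines (just a sketch; it reuses the RandomSeed and SolverOptions names from the diff above and assumes Split.RandomSeed is writable):

// Before each stability iteration, advance the split's own seed
// deterministically, then hand that value to the solver, so rerunning
// the CLI with the same arguments reproduces the same seed sequence.
split.RandomSeed = new Random(split.RandomSeed.Value).Next();
SolverOptions.RandomSeed = split.RandomSeed;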

Collaborator Author

Ah, interesting idea. I'll look into that.

Collaborator Author

On further thought, I think this approach would have the same issue as the suggestion above: methods explicitly annotated with {:random_seed N} would instead get a random seed that evolves over time (though in a predictable way). As it's currently written, the non-determinism you mention would only occur when /vcsStabilityIterations is specified, in which case I think it's at least arguably desirable.

@keyboardDrummer (Collaborator) commented Apr 20, 2022

I agree with your assessment, but I still believe that any significant CLI behavior, such as whether a Split has verification failures on some iterations, should preferably be reproducible by calling the CLI again with the same arguments. I think you could achieve that with minor changes.

Collaborator Author

Yeah, I think you're right. I came up with something that should achieve that while having slightly simpler logic overall.

@@ -524,14 +525,16 @@ void ObjectInvariant()
Contract.Invariant(cce.NonNullElements(vcResults));
}

public readonly List<Counterexample> /*!>!*/ examples = new List<Counterexample>();
public readonly List<VCResult> /*!>!*/ vcResults = new List<VCResult>();
public readonly ConcurrentQueue<Counterexample> /*!>!*/
Collaborator

Oof, this line cutoff is so narrow! What's the scenario in which we want 80 chars maximum?

Collaborator Author

I often view code in an 80-character terminal, though I don't have a strong preference about that. But the main reason for this is that I originally wrote it as a diff on top of the code that existed before you joined that line. ;)

Collaborator Author

Fixed by simplifying the initialization expressions so that they're on one <80 character line. :)

Contributor

You can make the RHS of these statements just new ();, which saves quite a few characters!
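
For illustration, a sketch of what those declarations would look like with target-typed new (C# 9 or later; the non-null annotation comments are omitted here):

public readonly ConcurrentQueue<Counterexample> examples = new();
public readonly ConcurrentQueue<VCResult> vcResults = new();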

Collaborator Author

Yep, that's what I did!

public readonly ConcurrentQueue<Counterexample> /*!>!*/
examples = new ConcurrentQueue<Counterexample>();
public readonly ConcurrentQueue<VCResult> /*!>!*/
vcResults = new ConcurrentQueue<VCResult>();

public override void OnCounterexample(Counterexample ce, string /*?*/ reason)
@keyboardDrummer (Collaborator) commented Apr 20, 2022

Mixing counterexamples from different iterations might be confusing. Can we avoid that?

I already find the current structure a bit confusing:

record VerificationResult(List<Counterexample> Errors, List<VCResult> VCResults);
record VCResult(List<Counterexample> Counterexamples, List<AssertCmd> Asserts, TimeSpan RunTime);

There are two lists of Counterexample in there!

I'd prefer to be more explicit:

record VerificationResult(List<SplitResult> SplitResults);
record SplitResult(List<AssertCmd> Asserts, List<IterationResult> IterationResult);
record IterationResult(List<Counterexample> Counterexamples, TimeSpan RunTime);

With the option of later renaming Split and SplitResult to AssertionBatch and AssertionBatchResult

Collaborator Author

I like this approach. I'll give it a try.

Collaborator Author

Unfortunately, the implementation is a bit more intricately tied to that design than I expected. In particular, there's a notion of "recycled failing assertions" that I don't fully understand but that depends on counterexamples that aren't tied to our current notion of splits. Houdini also has some complex interactions with this list of counterexamples.

So, although I'd love to get rid of that duplicated list of counterexamples, I think it'll take a good bit more investigation to figure out how to do that without breaking things.

When using /vcsStabilityIterations, this does mean that errors (including counterexamples) will be reported more than once. However, after thinking about it a bit more, I think that's the right thing to do. Different iterations might produce different counterexamples, and those differences could be informative.

It is a bit unfortunate that all of these counterexamples are aggregated into a single list. However, it was already the case that counterexamples from different splits were aggregated into that list. So, although this PR doesn't fix that issue, I'd argue that it doesn't really make it worse, either.

@keyboardDrummer (Collaborator) commented Apr 21, 2022

When using /vcsStabilityIterations, this does mean that errors (including counterexamples) will be reported more than once. However, after thinking about it a bit more, I think that's the right thing to do.

When we previously looked at Excel reports to analyse Dafny verification stability, one of the key indicators we looked at was % of iterations that had errors. I think it would be good to be able to report the # of errors per iteration. Wouldn't concatenating errors from all iterations in one list prevent that?

Also, I don't see a use case for showing errors, including the detailed error messages, when using /vcsStabilityIterations. However, if you do show them, wouldn't you want to remove duplicates?

How about we leave the list of counterexamples in VerificationResult that is operated on by Houdini and RecycledFailedAssertions, but update the list that's only tied to Splits?

record VerificationResult(List<Counterexample> Counterexamples, List<SplitResult> SplitResults); // here we leave in the existing list.
record SplitResult(List<AssertCmd> Asserts, List<IterationResult> IterationResult);
record IterationResult(List<Counterexample> Counterexamples, TimeSpan RunTime);

Collaborator Author

Each VCResult has an outcome, so it'll still be easy to determine what % have errors. Any client that doesn't want the aggregate list of counterexamples can ignore it, and just use the shorter, nested lists. That aggregate list exists only for backward compatibility, and I hope we can eventually remove it, I just don't yet fully understand how to do so without breaking some existing functionality.
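
For example, something like this would be enough (a sketch; it assumes the VCResult outcome field and the ProverInterface.Outcome enum are accessible here, and that System.Linq is in scope):

// Percentage of proof attempts for this split that did not verify.
int total = vcResults.Count();
int failing = vcResults.Count(r => r.outcome != ProverInterface.Outcome.Valid);
double percentWithErrors = 100.0 * failing / total;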

When it comes to what Boogie prints, my mental model of the feature, and the one I can imagine most easily communicating, is that it's like running Boogie multiple times, but interleaved. And with that model, seeing the same error multiple times makes sense.

I'm not 100% sure I understand your last comment. If what you're suggesting is that we package the aggregate list of counterexamples in the top-level VerificationResult, so it's not hanging out on its own, I think that makes sense.

Collaborator Author

More specifically, I'm not quite sure what you mean by "update the list that's only tied to Splits".

Collaborator Author

Oh, VerificationResult already includes the list of counterexamples, so there's nothing to do there, unless you had something different in mind with that last comment.

I haven't created a distinction between VCResult/SplitResult and IterationResult. Although doing so would make the data type slightly more intuitive, it would make most of the clients of that data type more complex without a clear benefit. Grouping a list of VCResult by the vcNum field can achieve the same outcome in cases where it's useful. I did just make a change to avoid duplicating identical lists of AssertCmd between VCResults, however.
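
For example (a sketch, assuming the vcNum field is reachable on VCResult and System.Linq is in scope):

// Recover a per-split grouping from the flat list of per-iteration results.
var resultsBySplit = vcResults
  .GroupBy(r => r.vcNum)
  .ToDictionary(g => g.Key, g => g.ToList());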

Source/VCGeneration/SplitAndVerifyWorker.cs (outdated review thread, resolved)
@atomb (Collaborator, Author) commented Apr 20, 2022

It's most useful in conjunction with /randomSeed

Is it any use without /randomSeed?

For debugging, it can be somewhat useful to remove the randomness factor. It is somewhat unfortunate that the current command-line option structure makes that the default, though.

in which case it'll use a different random seed for each iteration of Boogie -> SMT translation

Shouldn't it always do that, even if no /randomSeed is specified?

Arguably, yes. Boogie would have to invent a random seed in that case, but either a fixed or random value could be reasonable.

This is in preparation for ultimately having all counterexamples live
in the `VCResult` record, rather than in a separate list in a
`VerificationResultCollector` instance. It’ll take some more serious
refactoring to make that work, though.
@atomb (Collaborator, Author) commented Apr 20, 2022

Shouldn't it always do that, even if no /randomSeed is specified?

Arguably, yes. Boogie would have to invent a random seed in that case, but either a fixed or random value could be reasonable.

I've now adapted it to assume /randomSeed:0 if there was no /randomSeed option. That way we always get some variation but deterministically so. The debugging use case probably isn't worth focusing on.
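
Concretely, the fallback amounts to something like this (a sketch reusing names from the diff earlier in the thread; the per-iteration derivation shown is hypothetical and the actual code may differ):

// Fall back to seed 0 when no /randomSeed was given, so iterations still
// differ from one another, but deterministically across reruns.
int baseSeed = split?.RandomSeed ?? Options.RandomSeed ?? 0;

// One possible way to derive a distinct, reproducible seed per iteration:
int SeedForIteration(int iteration) => new Random(baseSeed + iteration).Next();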

@robin-aws (Contributor)

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?

I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

@atomb (Collaborator, Author) commented Apr 20, 2022

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?

I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

That'd be a reasonable name, IMO. Just to be clear, though, my understanding is that other transformations like that would likely exist for stability measurement, as well. Is that what you're thinking, too?

@robin-aws (Contributor)

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?
I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

That'd be a reasonable name, IMO. Just to be clear, though, my understanding is that other transformations like that would likely exist for stability measurement, as well. Is that what you're thinking, too?

Definitely, I think there's tons of different ways of perturbing the verification across different runs. If the option is named something more general like /vcsStabilityIterations:<n> it implies that we'll need a lot more knobs in the future for exactly how we get those n variants. I'm a little worried that we'll want more precise other options like (don't take these too literally) /vcsArithRewriteDepth:<m> and /vcsRenamings:<k>, at which point that top-level n might be redundant because you'll end up with m * k variants based on the other options.

I could see this getting complicated enough that we start to want a configuration file of some kind. If we think that it's worth the efficiency gain to have a single Boogie process do multiple verification runs, the downside is we almost need a DSL to describe what an external script would have done with multiple calls to Boogie instead.

@atomb (Collaborator, Author) commented Apr 20, 2022

Do we expect to automatically vary any other parameters in the future across multiple runs? If we started adding semantics-preserving transformations, for example, how would a user configure how to apply exactly N different versions in an easy-to-understand way?
I'm wondering if it would be clearer and more accurate to just call this option /randomSeedIterations:<n>.

That'd be a reasonable name, IMO. Just to be clear, though, my understanding is that other transformations like that would likely exist for stability measurement, as well. Is that what you're thinking, too?

Definitely, I think there's tons of different ways of perturbing the verification across different runs. If the option is named something more general like /vcsStabilityIterations:<n> it implies that we'll need a lot more knobs in the future for exactly how we get those n variants. I'm a little worried that we'll want more precise other options like (don't take these too literally) /vcsArithRewriteDepth:<m> and /vcsRenamings:<k>, at which point that top-level n might be redundant because you'll end up with m * k variants based on the other options.

Ah, right! At first I thought you were concerned that the original name was too narrow, but you were concerned that it was too broad. I agree, and I've changed it to /randomSeedIterations.

@atomb changed the title from "Add /vcsStabilityIterations flag for verification stability analysis" to "Add /randomSeedIterations flag for verification stability analysis" on Apr 21, 2022
Attempt to prove each VC n times with n random seeds. If
/randomSeed has been provided, each proof attempt will use
a new random seed derived from this original seed. If not,
it will implicitly assume /randomSeed:0 to ensure difference
Collaborator

Consider saying use instead of assume

@robin-aws merged commit ca37298 into boogie-org:master on May 2, 2022
@atomb deleted the stability-analysis branch on January 4, 2024 at 17:28
Development

Successfully merging this pull request may close these issues:

Support verifying a single goal with multiple checkers to measure stability