
measure-complexity non-deterministic? #5126

Open
hmijail opened this issue Feb 24, 2024 · 17 comments
Labels
- `during 1: program development` (Bad error message or documentation; IDE bug; crash compiling invalid program)
- `kind: bug` (Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label)
- `part: tools` (Tools built on top of Dafny)
- `priority: next` (Will consider working on this after in progress work is done)

Comments

@hmijail

hmijail commented Feb 24, 2024

Dafny version

4.4

Code to produce this issue

No response

Command to run and resulting output

No response

What happened?

I ran the exact same command twice in a row, with no changes to source code:
dafny measure-complexity --random-seed 1 --iterations 100 --log-format csv int.dfy

One of the runs finished correctly, the other reported a verification error:
int.dfy(738,40): Error: result of operation might violate newtype constraint for 'i256'

I guess the file doesn't matter; I'm just surprised because I expected that, given a random seed, the iterations' result would be the same. And this seemed to hold when I did small test runs with few iterations: the TestResult.ResourceCount values in the CSV were the same in each run.

So, are different runs with the same random seed expected to yield the same results?

By the way, I found this while trying to make sense of the verification coverage reports versus the resource usage variability. Given that measure-complexity is experimental, that dafny-reportgenerator seems disabled and untouched since Dafny 3.7, and given projects like Dafny64, can you share whether there are any concrete plans for brittleness debugging?

What type of operating system are you experiencing the problem on?

Mac

@hmijail hmijail added the kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label label Feb 24, 2024
@keyboardDrummer
Member

I'm just surprised because I expected that, given a random seed, the iterations' result would be the same.

Me too. This shouldn't happen, even if you do not specify --random-seed, since it should default to 0 then. @atomb thoughts?

Given that measure-complexity is experimental, that dafny-reportgenerator seems disabled and untouched since Dafny 3.7, and projects like Dafny64, can you share whether there are any concrete plans for brittleness debugging?

My intention is to move dafny-reportgenerator into measure-complexity. I would like it, when passed a Dafny project and no further arguments, to point out which Dafny verifiable symbols are brittle, according to the default values of the brittleness threshold parameters.

However, I can't comment on when exactly this change will be made.

@keyboardDrummer keyboardDrummer added part: verifier Translation from Dafny to Boogie (translator) during 3: execution of incorrect program A bug in the verifier that allows Dafny to run a program that does not correctly implement its spec priority: next Will consider working on this after in progress work is done labels Feb 26, 2024
@atomb
Member

atomb commented Feb 26, 2024

I'm just surprised because I expected that, given a random seed, the iterations' result would be the same.

Me too. This shouldn't happen, even if you do not specify --random-seed, since it should default to 0 then. @atomb thoughts?

I also find it surprising. I've heard the hypothesis that Z3 treats the seed 0 specially, with the interpretation that the seed should really be derived from the clock. However, in looking at the source code of Z3 and experimenting with a seed of 0 on various real programs, I haven't been able to confirm that hypothesis. And, even if that were true, any other seed should result in truly deterministic behavior.

My first thought for debugging this is to run something like the command @hmijail posted above, twice, with the --solver-log flag, and then compare the (:random-seed ...) entries in the file. They should all be the same. If they're not, the non-determinism is coming from Dafny. If they are (and the files are otherwise the same, too), then it's coming from Z3.
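That log comparison could be scripted along these lines. This is only a sketch: the log excerpts below are made up, and a real --solver-log file may spell the seed options slightly differently (the regex tries to cover both the `:random-seed` and `:smt.random_seed` styles seen in this thread):

```python
import re

def extract_seeds(log_text):
    # Collect every random-seed value from an SMT-LIB log, in order of
    # appearance; matches both ":random-seed N" and ":smt.random_seed N".
    return [int(m) for m in re.findall(r"random[_-]seed\s+(\d+)", log_text)]

# Hypothetical excerpts from two solver logs of the same run:
log_a = "(set-option :smt.random_seed 1)\n(set-option :sat.random_seed 1)"
log_b = "(set-option :smt.random_seed 1)\n(set-option :sat.random_seed 7)"

if extract_seeds(log_a) == extract_seeds(log_b):
    print("Seeds match; any divergence likely comes from Z3")
else:
    print("Seeds differ; the non-determinism comes from Dafny/Boogie")
```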

One other thing that occurred to me is that it's possible, when specifying a time limit, that a given query could succeed or fail non-deterministically based on the load of other processes on the system. But the example above doesn't give a time limit.

@hmijail
Author

hmijail commented Feb 27, 2024

A while ago I tried multiple batches of --random-seed 0 --iterations 5 because, as @atomb mentioned and as happens elsewhere, I thought that seed 0 might mean a randomly initialized seed. However, the resource usage results were always the same across batches, so it clearly is an actual 0 seed. (Maybe this could be mentioned in the docs.)

Additionally, I instrumented Dafny and its "internal" Boogie to print out the sequence of random numbers generated and sent to the solver. In my tests of small batches of 5 iterations the generated numbers seemed to always be the same, even when setting --cores to 1 or 4.

Also, I tried using the equivalent --boogie options instead of Dafny's random seed and iterations. I could still get the rare verification failure that way.

@keyboardDrummer
Member

@hmijail could you use --solver-log to see if the logs are different for two differently behaving executions?

Or otherwise, could you provide a reproduction, so I can try to do that?

@hmijail
Author

hmijail commented Feb 28, 2024

The solver logs are different even in non-failing verifications, so I have no idea what exactly I should be looking for.

The file that causes the problem is this: https://github.com/ConsenSys/evm-dafny/blob/1a89c03a5932441bba4f90aa1df669446fc7ae6d/src/dafny/util/int.dfy . I'm currently using this call; with about 20 repetitions you should get the error.

dafny measure-complexity --log-format csv --random-seed 1 --iterations 50 --cores 4 src/dafny/util/int.dfy

In case it helps, in a 200-iterations batch I got an even rarer error:

Unhandled exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Boogie.ReadOnlyVisitor.VisitVariable(Variable node)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitVariableSeq(List`1 variableSeq)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitImplementation(Implementation node)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitDeclarationList(List`1 declarationList)
   at Microsoft.Boogie.ReadOnlyVisitor.VisitProgram(Program node)
   at Microsoft.Boogie.BasicTypeVisitor..ctor(Absy absy)
   at Microsoft.Boogie.Checker.Target(Program prog, ProverContext ctx, Split split)
   at VC.CheckerPool.PrepareChecker(Program program, Split split, Checker checker)
   at VC.CheckerPool.FindCheckerFor(ConditionGeneration vcgen, Split split, CancellationToken cancellationToken)
   at VC.SplitAndVerifyWorker.DoWork(Int32 iteration, Split split, CancellationToken cancellationToken)
   at VC.SplitAndVerifyWorker.DoWorkForMultipleIterations(Split split, CancellationToken cancellationToken)
   at VC.SplitAndVerifyWorker.WorkUntilDone(CancellationToken cancellationToken)
   at VC.VCGen.VerifyImplementation(ImplementationRun run, VerifierCallback callback, CancellationToken cancellationToken, IObserver`1 batchCompletedObserver)
   at VC.ConditionGeneration.VerifyImplementation(ImplementationRun run, IObserver`1 batchCompletedObserver, CancellationToken cancellationToken)
   at Microsoft.Boogie.ExecutionEngine.<>c__DisplayClass50_0.<<VerifyImplementationWithoutCaching>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Boogie.ExecutionEngine.<>c__DisplayClass41_0.<<VerifyEachImplementation>b__1>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Boogie.ExecutionEngine.<>c__DisplayClass41_0.<<VerifyEachImplementation>b__1>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Boogie.ExecutionEngine.VerifyEachImplementation(TextWriter output, ProcessedProgram processedProgram, PipelineStatistics stats, String programId, ErrorReporterDelegate er, String requestId, Implementation[] stablePrioritizedImpls)
   at Microsoft.Boogie.ExecutionEngine.InferAndVerify(TextWriter output, Program program, PipelineStatistics stats, String programId, ErrorReporterDelegate er, String requestId)
   at Microsoft.Dafny.DafnyMain.BoogiePipelineWithRerun(DafnyOptions options, TextWriter output, ExecutionEngine engine, Program program, String bplFileName, String programId)
   at Microsoft.Dafny.DafnyMain.BoogieOnce(DafnyOptions options, TextWriter output, ExecutionEngine engine, String baseFile, String moduleName, Program boogieProgram, String programId)
   at Microsoft.Dafny.CompilerDriver.BoogieOnceWithTimerAsync(TextWriter output, DafnyOptions options, String baseName, String programId, String moduleName, Program program)
   at Microsoft.Dafny.CompilerDriver.<>c__DisplayClass8_0.<<BoogieAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Dafny.CompilerDriver.<>c__DisplayClass8_0.<<BoogieAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Dafny.CompilerDriver.BoogieAsync(DafnyOptions options, String baseName, IEnumerable`1 boogiePrograms, String programId)
   at Microsoft.Dafny.CompilerDriver.ProcessFilesAsync(IReadOnlyList`1 dafnyFiles, ReadOnlyCollection`1 otherFileNames, DafnyOptions options, ProofDependencyManager depManager, Boolean lookForSnapshots, String programId)
   at Microsoft.Dafny.CompilerDriver.RunCompiler(DafnyOptions options)
   at Microsoft.Dafny.DafnyCli.<>c__DisplayClass9_0.<<SetHandlerUsingDafnyOptionsContinuation>g__Handle|0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Invocation.AnonymousCommandHandler.InvokeAsync(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass17_0.<<UseParseErrorReporting>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass12_0.<<UseHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.Dafny.DafnyCli.<>c__DisplayClass20_0.<<AddDeveloperHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<<UseVersionOption>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass19_0.<<UseTypoCorrections>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__18_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseParseDirective>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__5_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass8_0.<<UseExceptionHandler>b__0>d.MoveNext()

@keyboardDrummer
Member

Thanks, having a reproduction is great. We'll schedule diving into this.

In case it helps, in a 200-iterations batch I got an even rarer error:

That's a completely different bug that we've left unsolved for too long. We'll also schedule solving that.

@atomb atomb self-assigned this Feb 28, 2024
@hmijail
Author

hmijail commented Feb 29, 2024

While we're at it, what would you think of adding some more direct control of the random number being used, so that one can jump back into a given position of the random stream?

My use case: currently I am generating big batches of verification results (e.g. 100 iterations) to look for the extreme values of Resource Usage. However, if a given batch does contain an interesting extreme, I have no way to reproduce the particular random number that created that extreme: even though I currently extract that number, I can't inject it back in, because from a PRNG result I can't get a seed that would generate it.

So the only way I have to find interesting cases is by running single iterations with new random seeds. But running 1 iteration, with ancillary startup costs, is about 5-10x slower than running 10 iterations.

This could be solved by e.g. making the seed the first "random" number to be used, and generating the next random number by hashing the previous one.

(please let me know if this should be a different issue)
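The hash-chaining idea above could look roughly like this. This is only a sketch of the proposal, not Dafny's actual PRNG, and the choice of SHA-256 as the hash is arbitrary:

```python
import hashlib

def seed_stream(seed, count):
    # Yield `count` seed values; each subsequent value is a hash of the
    # previous one, so any emitted value can itself be passed back in as
    # `seed` to resume the stream from exactly that position.
    value = seed & 0xFFFFFFFF
    for _ in range(count):
        yield value
        digest = hashlib.sha256(value.to_bytes(4, "little")).digest()
        value = int.from_bytes(digest[:4], "little")

first_run = list(seed_stream(1, 5))
# Re-entering the stream at the third value reproduces its tail exactly,
# which is what makes an interesting iteration reproducible in isolation.
print(first_run[2:] == list(seed_stream(first_run[2], 3)))  # True
```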

@hmijail
Author

hmijail commented Feb 29, 2024

(If this change would be welcome I think I could contribute it myself in a PR)

@keyboardDrummer
Member

I think a user-friendly way of reproducing a certain iteration would be for each iteration to emit the --random-seed argument that should be used to reproduce it.

In Dafny master there's currently a bug with --iterations, and I think we need to fix that before it's possible to add the above feature. I've created an issue to do both: #5134

@atomb atomb removed the during 3: execution of incorrect program A bug in the verifier that allows Dafny to run a program that does not correctly implement its spec label Mar 1, 2024
@hmijail
Author

hmijail commented Mar 2, 2024

Maybe connected: I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I had a 1000-iteration CSV log showing that a given DisplayName reached a range of Resource Counts between 8M and 24M. So I ran 5k single iterations (with randomized seeds) to find an 8M case. The Resource Count never went under 19M.
Then I tried running with --iterations 2, and now the Resource Count dipped under 10M in about 10 runs.

The file was this one, function Precompiled.CallModExp.
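A small script along these lines can summarize that spread per symbol. It assumes the CSV columns are named TestResult.DisplayName and TestResult.ResourceCount, as referenced earlier in this thread; the sample rows are made up:

```python
import csv
import io
from collections import defaultdict

def resource_ranges(csv_text):
    # Return {display_name: (min, max)} of resource counts across iterations.
    counts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["TestResult.DisplayName"]].append(
            int(row["TestResult.ResourceCount"]))
    return {name: (min(v), max(v)) for name, v in counts.items()}

sample = """TestResult.DisplayName,TestResult.ResourceCount
Precompiled.CallModExp,8000000
Precompiled.CallModExp,24000000
"""
print(resource_ranges(sample))  # {'Precompiled.CallModExp': (8000000, 24000000)}
```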

@keyboardDrummer
Member

Maybe connected: I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I had a 1000-iteration CSV log showing that a given DisplayName reached a range of Resource Counts between 8M and 24M. So I ran 5k single iterations (with randomized seeds) to find an 8M case. The Resource Count never went under 19M. Then I tried running with --iterations 2, and now the Resource Count dipped under 10M in about 10 runs.

The file was this one, function Precompiled.CallModExp.

I will update the iteration and random seed mechanism in this PR: #5138

Let's wait for that one to land before we gather more data.

@atomb atomb assigned keyboardDrummer and unassigned atomb Mar 5, 2024
@keyboardDrummer
Member

keyboardDrummer commented Mar 7, 2024

While we're at it, what would you think of adding some more direct control of the random number being used, so that one can jump back into a given position of the random stream?

This should be implemented now through #5138

Here's what it looks like:

rwillems@bcd0745419f2 evm-dafny % localDafny measure-complexity --random-seed 1 --iterations 100 --log-format csv src/dafny/util/int.dfy 
Starting verification of iteration 0 with seed 1
Starting verification of iteration 1 with seed 534011718
Starting verification of iteration 2 with seed 237820880
Starting verification of iteration 3 with seed 1002897798
Starting verification of iteration 4 with seed 1657007234

The random seed is also shown in the CSV file for each line.

I'm currently using this call; with about 20 repetitions you should get the error.

I tried that, but I don't get the error on Dafny 4.4. However, it could be that I just need to try for longer. Would you mind if I do not debug this issue further, since the upcoming Dafny release reworks how --iterations is processed in dafny measure-complexity?

Additionally, I instrumented Dafny and its "internal" Boogie to print out the sequence of random numbers generated and sent to the solver. In my tests of small batches of 5 iterations the generated numbers seemed to always be the same, even when setting --cores to 1 or 4.

Note that --solver-log doesn't work well with measure-complexity, in the sense that it only outputs log files for one iteration (I'm guessing the last).

If I look at the logs of solver inputs, then the --random-seed value significantly changes what is sent to the solver, including the values of :smt.random_seed and :sat.random_seed

I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I see the same. @atomb would you mind helping debug that? It would be quite bad if the resource count values cannot be relied on.

@hmijail
Author

hmijail commented Mar 11, 2024

Would you mind if I do not debug this issue further

Yes, of course. Surely you know the general status and plan better than I do.

Regarding the different RUs in single-iteration vs. multiple-iteration runs: actually, beware, I might have been mis-measuring that. (I have been writing scripts to deal with the CSVs, and I suspect the CSVs I was processing at the time had a different row cardinality than I expected.)

On the other hand, I just found yet another related thing: --random-seed X --iterations 1 causes a weird error that disappears with --random-seed X --iterations 2.

❯ dafny44 measure-complexity  --random-seed 2597730344 --iterations 1 /Users/mija/programming/ConsenSys/evm-dafny/src/dafny/core/precompiled.dfy
Prover error: line 120 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 313 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 314 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 121 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 291 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 292 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 958 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

Prover error: line 959 column 29: Expected values for parameter random_seed is an unsigned integer. It was given argument '-1697236952'

^C
❯ dafny44 measure-complexity  --random-seed 2597730344 --iterations 2 /Users/mija/programming/ConsenSys/evm-dafny/src/dafny/core/precompiled.dfy

Dafny program verifier finished with 17 verified, 0 errors
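For what it's worth, the negative value in those prover errors is exactly what you get by reinterpreting the low 32 bits of 2597730344 as a signed 32-bit integer, which suggests the seed overflows somewhere on its way to Z3. This is just a hypothesis, not confirmed against the Dafny source:

```python
def to_int32(n):
    # Reinterpret the low 32 bits of n as a signed 32-bit integer,
    # mimicking an unchecked (int) cast in C#.
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

# 2597730344 exceeds Int32.MaxValue (2147483647), so it wraps negative:
print(to_int32(2597730344))  # -1697236952, the value in the prover errors
```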

@keyboardDrummer
Member

causes a weird error that disappears with

Is that with the released Dafny 4.4, not with the nightly?

@hmijail
Author

hmijail commented Mar 11, 2024

(my last message was sent by email and got corrupted, so I deleted it)

This is with released 4.4.

@keyboardDrummer keyboardDrummer removed their assignment Mar 19, 2024
@atomb
Member

atomb commented Mar 27, 2024

Maybe connected: I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file.

I had a 1000-iteration CSV log showing that a given DisplayName reached a range of Resource Counts between 8M and 24M. So I ran 5k single iterations (with randomized seeds) to find an 8M case. The Resource Count never went under 19M. Then I tried running with --iterations 2, and now the Resource Count dipped under 10M in about 10 runs.

The file was this one, function Precompiled.CallModExp.

I've been experimenting with this using today's latest commit, and I'm finding that verify and measure-complexity --iterations 1 always give the same resource counts, but that measure-complexity --iterations 2 (or any higher number) give a slightly different resource count.

I looked at the generated SMT files, and the difference comes down to naming. With --iterations 1 the SMT query has names of the form $generated@@<n> but with --iterations 2 the query has names of the form random<n1>@@<n2>. I expect that making those names consistent across all usage modes should solve the problem.

@keyboardDrummer
Member

I expect that making those names consistent across all usage modes should solve the problem.

I agree, it would be good to do that to prevent confusion.

atomb added a commit to atomb/boogie that referenced this issue Apr 17, 2024
This change ensures that, when specifying `/randomizeVcIterations:n`,
the first iteration is equivalent to that achieved by
`/normalizeNames:1`. One consequence of this is that normalized names
are always chosen pseudo-randomly (though deterministically, since it
always starts with a fixed seed, which is 0 if not otherwise specified).

Fixes dafny-lang/dafny#5126
@keyboardDrummer keyboardDrummer added during 1: program development Bad error message or documentation; IDE bug; crash compiling invalid program part: tools Tools built on top of Dafny and removed part: verifier Translation from Dafny to Boogie (translator) labels Apr 29, 2024
3 participants