-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
measure-complexity non-deterministic? #5126
Comments
Me too. This shouldn't happen, even if you do not specify
My intention is to move However, I can't comment on when exactly this change will be made. |
I also find it surprising. I've heard the hypothesis that Z3 treats the seed 0 specially, with the interpretation that the seed should really be derived from the clock. However, in looking at the source code of Z3 and experimenting with a seed of 0 on various real programs, I haven't been able to confirm that hypothesis. And, even if that were true, any other seed should result in truly deterministic behavior. My first thought for debugging this is to run something like the command @hmijail posted above, twice, with the One other thing that occurred to me is that it's possible, when specifying a time limit, that a given query could succeed or fail non-deterministically based on the load of other processes on the system. But the example above doesn't give a time limit. |
A while ago I tried multiple batches of Additionally, I instrumented Dafny and its "internal" Boogie to print out the sequence of random numbers generated and sent to the solver. In my tests of small batches of 5 iterations the generated numbers seemed to always be the same, even when setting Also, I tried using the equivalent |
@hmijail could you use Or otherwise, could you provide a reproduction, so I can try to do that? |
The solver logs are different even in non-failing verifications, so no idea of what exactly I am looking for. The file that causes the problem is this: https://github.com/ConsenSys/evm-dafny/blob/1a89c03a5932441bba4f90aa1df669446fc7ae6d/src/dafny/util/int.dfy . I'm currently using this call; with about 20 repetitions you should get the error.
In case it helps, in a 200-iterations batch I got an even rarer error:
|
Thanks, having a reproduction is great. We'll schedule diving into this.
That's a completely different bug that we've left unsolved for too long. We'll also schedule solving that. |
While we're at it, what would you think of adding some more direct control of the random number being used, so that one can jump back into a given position of the random stream? My use case: currently I am generating big batches of verification results (e.g. 100 iterations) to look for the extreme values of Resource Usage. However, if a given batch does contain an interesting extreme, I have no way to reproduce the particular random number that created that extreme: even though I currently extract that number, I can't inject it back in, because from a PRNG result I can't get a seed that would generate it. So the only way I have to find interesting cases is by running single iterations with new random seeds. But running 1 iteration, with ancillary startup costs, is about 5-10x slower than running 10 iterations. This could be solved by e.g. making the seed the first "random" number to be used, and generating the next random number by hashing the previous one. (please let me know if this should be a different issue) |
(If this change would be welcome I think I could contribute it myself in a PR) |
I think a user friendly way of reproducing a certain iteration, would be for each iteration to emit which --random-seed argument should be used to reproduce that iteration. In Dafny master there's currently a bug with --iteration, and I think we need to fix that before it's possible to add the above feature. I've created an issue to do both: #5134 |
Maybe connected: I just noticed that running single iterations vs multiple iterations yields very different Resource Count values in the CSV file. I had a 1000-iteration CSV log showing that a given DisplayName reached a range of Resource Counts between 8M and 24M. So I ran 5k single iterations (with randomized seed) to find a 8M case. The Resource Count never went under 19M. The file was this one, function Precompiled.CallModExp. |
I will update the iteration and random seed mechanism in this PR: #5138 Let's wait for that one to land before we gather more data. |
This should be implemented now through #5138 Here's what it looks like:
The random seed is also shown in the CSV file for each line.
I tried that but I don't get the error on Dafny 4.4. However, it could be that I just need to try for longer. Would you mind if I do not debug this issue further, since the upcoming Dafny reworks how
Note that If I look at the logs of solver inputs, then the
I see the same. @atomb would you mind helping debug that? It would be quite bad if the resource count values can not be relied on. |
Yes, of course. Surely you know better what's the general status and plan. Regarding the different RUs in single-iterations vs multiple-iterations: actually beware, I might have been mis-measuring that. (I have been writing scripts to deal with the CSVs and I suspect the CSVs I was processing by then had a different row cardinality than I expected at the moment) On the other hand, I just found yet another related thing:
|
Is that with the released Dafny 4.4 ? Not with the nightly |
(my last message was sent by email and got corrupted, so I deleted it) This is with released 4.4. |
I've been experimenting with this using today's latest commit, and I'm finding that I looked at the generated SMT files, and the difference comes down to naming. With |
I agree, it would be good to do that to prevent confusion. |
This change ensures that, when specifying `/randomizeVcIterations:n`, the first iteration is equivalent to that achieved by `/normalizeNames:1`. One consequence of this is that normalized names are always chosen pseudo-randomly (though deterministically, since it always starts with a fixed seed, which is 0 if not otherwise specified). Fixes dafny-lang/dafny#5126
Dafny version
4.4
Code to produce this issue
No response
Command to run and resulting output
No response
What happened?
I ran the exact same command twice in a row, with no changes to source code:
dafny measure-complexity --random-seed 1 --iterations 100 --log-format csv int.dfy
.One of the runs finished correctly, the other reported a verification error:
int.dfy(738,40): Error: result of operation might violate newtype constraint for 'i256'
I guess the file doesn't matter; I'm just surprised because I expected that, given a random seed, the iterations' result would be the same. And this seemed to hold when I did small test runs with few iterations: the
TestResult.ResourceCount
values in the CSV were the same in each run.So, are different runs with the same random seed expected to yield the same results?
By the way, I found this while trying to make sense of the verification coverage reports vs the resource usage variability. Given that measure-complexity is experimental, that dafny-reportgenerator seems disabled and untouched since Dafny 3.7, and projects like Dafny64, can you share whether there are any concrete plans for brittleness debugging?
What type of operating system are you experiencing the problem on?
Mac
The text was updated successfully, but these errors were encountered: