[draft] Automated differential testing for the LSP server #1684

cpitclaudel · 2022-01-04T22:22:19Z

This PR implements a test driver that runs multiple Dafny frontends on an input and compares their outputs.

This program takes a list of Dafny frontends (either the CLI or the LSP server) and a list of inputs (snapshots or LSP traces) and runs them through each frontend. Verification results are then compared to ensure that they match.

A frontend is an executable (either the Dafny CLI Dafny.exe or the Dafny LSP Server DafnyLanguageServer.exe) plus the arguments to pass to it.

An input is either a Dafny file (.dfy) or an LSP trace (.lsp). If given a file xyz.dfy, this program first looks for xyz.v0.dfy, xyz.v1.dfy, etc. before falling back to xyz.dfy.

Limitations:

This only checks error messages, not return codes
This only checks for errors in the main file, not in all files reported by the LSP server.

Example usage:

 ./difftester.py --frontend Dafny /printTooltips \
                 --frontend DafnyLanguageServer /--verifier:verifysnapshots=0 \
                 --frontend DafnyLanguageServer /--verifier:verifysnapshots=3 \
                 --input trace.lsp

This steps through all edits in the LSP trace trace.lsp and runs each of them three times, once in a fresh Dafny CLI, once by sending them to a server that doesn't use caching, and once to a server that uses caching. If results match, nothing is printed.

Command line flags:

usage: difftester.py [-h] [--verbose] [-j N] [--timeout N] --frontend
                     FRONTEND_NAME [ARGUMENTS ...] --input INPUT

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Increase verbosity (repeat for more).
  -j N                  Run command line tests in N concurrent threads.
  --timeout N           Limit execution time to N seconds for individual CLI
                        runs.
  --frontend FRONTEND_NAME [ARGUMENTS ...]
                        Register a frontend
  --input INPUT         Register an input file

This is not integrated into the CI yet. Additionally, since many tests require custom Dafny flags that the server does not support, this won't be able to run on all of Dafny's test suite.

I'm not sure how to go about the review, since it's a lot of code at once.

MikaelMayer · 2022-01-05T16:55:44Z

That's a great script to have !
I think it would be great to have it as a tool easy to use down the road.
Since it's in the Test/ directory, I don't see how it's looking up for the Binaries directory. Do you have to execute it from the Binaries/ directory?

MikaelMayer · 2022-01-05T16:58:08Z

Test/diffs/difftester.py

+
+Example usage::
+
+ ./difftester.py --frontend Dafny \


Can you ensure you list this example when you run the command-line without providing the correct arguments?

It does that already, I think

Test/diffs/difftester.py

cpitclaudel · 2022-01-05T23:05:02Z

Since it's in the Test/ directory, I don't see how it's looking up for the Binaries directory. Do you have to execute it from the Binaries/ directory?

It uses whichever binaries you give it. The example ~~assumes~~ assumed that you had Dafny on your path, but typically I just pass it a relative path.

keyboardDrummer · 2022-01-06T10:23:32Z

Test/diffs/difftester.py

@@ -0,0 +1,955 @@
+#!/usr/bin/env python3
+r"""A differential tester for Dafny.


Why is Python the right language for this? I don't expect Dafny engineers to be experts at any language other than the main language Dafny is written in, so I think not having it in C# severely decreases the maintainability. That isn't a deal breaker but there needs to be a good reason.

There is no good reason, except that it made it easier to prototype. I'm happy to port it to C# if we find that better.

I think having it in C# is much more maintainable. I worked on a product defined in many languages before and there were significant differences in code quality between the most used language and others.

I agree that c# is more maintainable. However, it's possible to immediately run a python file, whereas it's not with a c#.
If we just use this script for smoking tests down the road or like a tool in our tool belt, then we are fine keeping it in Python. If we plan to integrate it in the CI or to automate the process, then C# might be a better option.

However, it's possible to immediately run a python file, whereas it's not with a c#.

Can you elaborate? What's the difference in UX between python file.py and dotnet run file.csproj ?

I know it matters for Java (a JVM takes time to start), I'm not sure for .NET

Still, two commands, and moreover you need to install a tool(?)

The dotnet-script tool would be installed by placing it in the dotnet-tools.json file in our repository and then running dotnet tool install which we already do in various places. I think if you want to do the Python path properly, you would also provide a reliable way of ensuring the right Python version is used. Maybe we would place Python installation instructions in a README or add something to our makefile. I'm not sure what's idiomatic there.

I'm a bit confused by this discussion. Doesn't the benefit of having the repository be (more) in a single language far outweigh these differences in how to run the script?

Sorry for the confusion. Thanks for digging deeper.
Having the repository to be (more) in a single language outweighs a little - in my opinion - these differences in how to run the script. If we had the c# script, we could always make an extra shell wrapper around it to run it, so that wouldn't be a problem.

Now, the question of this thread is: do we want @cpitclaudel to actually rewrite this in C#? I argue that we don't need to spend time doing that. This script is not integrated with the Dafny source files, it's only running processes. We would gain nothing porting a program of this size to a typed language. This script is not going to evolve substantially, is very likely to not going to have unit tests.

Each language has its own idiomatic use. Python has been designed for scripting and quick prototyping, file and process handling and so on, C# has been designed for building applications and robust shared libraries. Hence, I understand the current choice, although I would have never pushed anyone to use Python instead of C# if they preferred to start their project with the latter.

@MikaelMayer So, disregarding what language it's currently written in, what language do you prefer it to be written in, or are you ambivalent about that?

Disregarding the language our codebase is currently written in, I would prefer it to be written in Python.

keyboardDrummer · 2022-01-06T10:28:51Z

Test/diffs/difftester.py

@@ -0,0 +1,955 @@
+#!/usr/bin/env python3


Could you structure the code into multiple files, to make it easier to digest? For example, a separate file for each supported front-end might help, or for each supported type of input.

I have mixed feelings about this: there's usefulness to having everything in one file. I agree that 1000 lines is getting close to the manageable limit. Happy to refactor if you think the benefits overweight the downsides.

If everything is in one file, how do I see the hierarchical structure of the code? Maybe an IDE with an outline view can show the classes in the file, but that's not as accessible as having the files define that structure. What's the benefit of a single file?

If everything is in one file, how do I see the hierarchical structure of the code?

You search for classes; there's 10 to 20 of them in here, so that remains manageable. The advantage of a single file is that you don't need an IDE to navigate around; all relevant definitions are in the same place, so you can just search around.

I personally prefer a big file than many small ones, because indeed it makes it easier to search around for the concepts, especially in a script that is doing only one thing and where we don't need to share code outside.

However, to keep it readable, since the classes might be for different purposes, I would advise to add a comment on the top of the file explaining what are all the classes that were defined and what purpose they have, so that it will make it easier to maintain down the road.

FWIW, I have a slightly different perspective: I think the cost of rewriting in C# is not huge, so I can do it if we find it useful. It's probably a day of two at most. But how about we first give this script a bit of a try, and if it turns out to be actually useful, we can decide whether to invest more in it.

I agree with this.

But how about we first give this script a bit of a try, and if it turns out to be actually useful, we can decide whether to invest more in it.

What do you mean by giving it a try? If we turn it on in the build-server then we're using it right? What would be a reason to undo that then?

I think normally we assume checked in code is there to stay, and the time of merging is it also the right time to optimise it for readability. In my opinion, having code in anything other than the primary language of a codebase makes it much less readable.

Already for this PR I don't feel as capable as a reviewer as I would for a C# PR.

MikaelMayer · 2022-01-07T15:49:38Z

Since it's in the Test/ directory, I don't see how it's looking up for the Binaries directory. Do you have to execute it from the Binaries/ directory?

It uses whichever binaries you give it. The example ~~assumes~~ assumed that you had Dafny on your path, but typically I just pass it a relative path.

I don't have Dafny on my path. Could you have your script go to ../Binaries to lookup for the applications if it's not in the path? And also, please add this information to the doc. so that we don't need to specify relative paths.

cpitclaudel · 2022-12-22T00:43:52Z

Closing this since the implementation had bit-rotted.

cpitclaudel added 24 commits January 4, 2022 14:05

fix(ast): Correct a comment in DafnyAst.cs

44ee1a9

difftests: Check in diff tester

1e94c7e

difftests: Add support for replaying sequences of snapshots with LSP

36e951c

difftests: Implement a simple function to compare LSP and CLI outputs

9290933

difftests: Add a simple CLI to difftester.py

aa9ed2a

difftests: Change to a streaming API for differential test results

ae5b143

difftests: Run CLI tests concurrently in differential tester

f0d0842

difftests: Add a timeout flag

04ece3d

difftests: Improve handling of C-c on Windows

5ac2a05

difftests: Allow users to pass -- args to drivers by /-prefixing them

d6f783c

difftests: Add more type annotations

61a1250

difftests: Add more type annotations

8128c75

difftests: Improve error reporting

43f0429

difftests: Future-proof against incremental diagnostics

8f2553d

difftests: Support verifying a single snapshot

0c13e7d

difftests: Log outputs

1061217

difftests: Reformat whole file

a775a81

difftests: Finer-granularity logging

9426e93

difftester: Add support for replaying LSP traces

395b94c

difftests: Reconstruct structure in CLI output to make comparison easier

fafb8ac

difftests: Return a non-zero exit code if mismatches occur

da79bb3

difftests: Adjust PID in LSP initialize request

c49aeae

difftests: Improve error reporting

87b47e7

difftests: Improve documentation and rename "driver" to "frontend"

7d1ad7b

cpitclaudel self-assigned this Jan 4, 2022

cpitclaudel added area: error-reporting Clarity of the error reporting part: language server Support for LSP in Dafny (server part; client is in ide-vscode repo) labels Jan 4, 2022

cpitclaudel marked this pull request as draft January 4, 2022 22:23

MikaelMayer requested changes Jan 5, 2022

View reviewed changes

difftests: Improve documentation

cce8616

keyboardDrummer reviewed Jan 6, 2022

View reviewed changes

difftests: Improve diffs and set default error level on CLI

f1b007f

cpitclaudel closed this Dec 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[draft] Automated differential testing for the LSP server #1684

[draft] Automated differential testing for the LSP server #1684

cpitclaudel commented Jan 4, 2022 •

edited by MikaelMayer

Loading

MikaelMayer commented Jan 5, 2022

MikaelMayer Jan 5, 2022

cpitclaudel Jan 5, 2022

cpitclaudel commented Jan 5, 2022

keyboardDrummer Jan 6, 2022

cpitclaudel Jan 6, 2022

keyboardDrummer Jan 7, 2022

MikaelMayer Jan 7, 2022

keyboardDrummer Jan 10, 2022

MikaelMayer Jan 10, 2022

keyboardDrummer Jan 11, 2022 •

edited

Loading

MikaelMayer Jan 11, 2022

keyboardDrummer Jan 11, 2022

MikaelMayer Jan 11, 2022

keyboardDrummer Jan 6, 2022

cpitclaudel Jan 6, 2022

keyboardDrummer Jan 6, 2022

cpitclaudel Jan 6, 2022

MikaelMayer Jan 7, 2022

cpitclaudel Jan 11, 2022

MikaelMayer Jan 11, 2022

keyboardDrummer Jan 11, 2022

MikaelMayer commented Jan 7, 2022

cpitclaudel commented Dec 22, 2022

		@@ -0,0 +1,955 @@
		#!/usr/bin/env python3
		r"""A differential tester for Dafny.

[draft] Automated differential testing for the LSP server #1684

[draft] Automated differential testing for the LSP server #1684

Conversation

cpitclaudel commented Jan 4, 2022 • edited by MikaelMayer Loading

MikaelMayer commented Jan 5, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cpitclaudel commented Jan 5, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keyboardDrummer Jan 11, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MikaelMayer commented Jan 7, 2022

cpitclaudel commented Dec 22, 2022

cpitclaudel commented Jan 4, 2022 •

edited by MikaelMayer

Loading

keyboardDrummer Jan 11, 2022 •

edited

Loading