Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Trivia stored in the AST #1801

Merged
merged 60 commits into from
Jul 19, 2022
Merged

feat: Trivia stored in the AST #1801

merged 60 commits into from
Jul 19, 2022

Conversation

MikaelMayer
Copy link
Member

@MikaelMayer MikaelMayer commented Feb 8, 2022

This PR brings the following changes to Dafny

  • Replace the use of Boogie.IToken by a derivative Dafny.IToken, and same for Token (no need to change Boogie like in a previous attempt)
  • Adds and populate two new fields for every token leadingTrivia and trailingTrivia
  • Makes the first token of every declaration available through FirstDeclarationToken, so that it's possible to recover the trivia before it
  • Makes the token associated to docstring (new !) available through TokenBeforeDocstring, so that it's possible to recover the trivia after it.

This fixes #1528 for all trivia technically accessible a token stored in the AST.

Implementation details:
I ignored the following very relevant advice here:
https://github.com/terrajobst/minsk/blob/master/docs/episode-24.md
because we will visibly put comments after definitions, except when using /** */ before a top-level definition and separated by at least an empty newline

Work in Progress

This PR is the first step towards having a full Dafny formatter.

What to review

This PR seems huge, but there isn't as much to review. Here is what to look for, in the same order as when you click on the "files" tab. In bold, files that are important reviewing. Other files are just automated refactorings.

  • RELEASE_NOTES.md
  • Source/Dafny/AST/DafnyAst.cs Added classes IToken and Token, and field FirstDeclarationToken to declarations
  • Source/Dafny/AST/Printer.cs Just updated IToken and Token
  • Source/Dafny/Compilers/Compiler-Csharp.cs Bpl.Token => Token and same for IToken
  • Source/Dafny/Compilers/Compiler-cpp.cs Same boring stuff
  • Source/Dafny/Compilers/Compiler-go.cs Same
  • Source/Dafny/Compilers/Compiler-java.cs Same
  • Source/Dafny/Compilers/Compiler-js.cs Same
  • Source/Dafny/Compilers/Compiler-python.cs Same
  • Source/Dafny/Compilers/SinglePassCompiler.cs Same
  • Source/Dafny/Dafny.atg I used the declaration modifiers to set a FirstToken once that is then used by various fields FirstDeclarationToken
  • Source/Dafny/DafnyOptions.cs Bpl.Token => Token
  • Source/Dafny/RefinementTransformer.cs Same
  • Source/Dafny/Rewriter.cs Same
  • Source/Dafny/Verifier/BoogieStmtListBuilder.cs Explicited the use of Boogie token
  • Source/Dafny/Verifier/CaptureStateExtensions.cs Bpl.Token => Token
  • Source/Dafny/Verifier/Translator.BoogieFactory.cs Definition of the method Translator.ToDafnyToken used when doing roundtrips with translating to Boogie
  • Source/Dafny/Verifier/Translator.ClassMembers.cs Just three uses of splitExpr.Tok (defined later)
  • Source/Dafny/Verifier/Translator.ExpressionTranslator.cs Refactoring to use splitExpr.Tok, ToDafnyToken and use of Dafny Token.
  • Source/Dafny/Verifier/Translator.TrStatement.cs Same
  • Source/Dafny/Verifier/Translator.cs Same
  • Source/DafnyLanguageServer/Language/AssertionBatchCompletedObserver.cs One IToken migrated
  • Source/DafnyLanguageServer/Language/BoogieExtensions.cs IToken migrated
  • Source/DafnyLanguageServer/Language/DiagnosticErrorReporter.cs Same
  • Source/DafnyLanguageServer/Language/GhostStateDiagnosticCollector.cs Same
  • Source/DafnyLanguageServer/Language/Symbols/DafnyLangSymbolResolver.cs Migrated a token.
  • Source/DafnyLanguageServer/Language/SyntaxTreeVisitor.cs One token migration
  • Source/DafnyLanguageServer/Workspace/VerificationProgressReporter.cs A few migrations
  • Source/DafnyServer/SymbolTable.cs Easy migration
  • Source/DafnyTestGeneration/ProgramModifier.cs Token disambiguation
  • third_party/Coco/src/Scanner.frame Change to the scanner so that it can attach trivia to tokens. Note that the first token takes all the previous trivia.

@MikaelMayer MikaelMayer self-assigned this Feb 8, 2022
@MikaelMayer MikaelMayer changed the title feat: Add trivia to tokens feat: Make documentation available on hover Jun 22, 2022
@MikaelMayer MikaelMayer marked this pull request as ready for review June 22, 2022 21:04
) {
if(triviaTrailing && (ch == '\r' || ch == '\n')) {
if(ch == '\r' || (ch == '\n' && !lastWasCr)) { // Don't count \r\n as two newlines.
triviaNewlines++;
Copy link
Member

@keyboardDrummer keyboardDrummer Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the definition of trailing trivia in this PR? Shouldn't any newlines set triviaTrailing = false; ?

For example, if I have a statement block:

var y := 4; // comment about y;
var x := 3; // comment about x;
if (x == 3) {
  // only usage of y;
}

And I do a refactoring on y called "move to innermost scope" that moves the declaration inside the if, then we should end up with:

var x := 3; // comment about x;
if (x == 3) {
  var y := 4; // comment about y;
  // only usage of y;
}

How does the trivia parsing in this PR achieve that?

I get the impression that you changed the definition of trailing trivia because of what we want for docstrings, but I don't understand why that was necessary.

I thought a docstring is any block comment that occurs after the signature of the function/method (callable?), which according to our original definition of leading and trailing trivia, would be leading trivia of one of the tokens belonging to the callable, like ensures, requires or {.

@keyboardDrummer
Copy link
Member

Mmmmh in our case that would me we would need a new concrete syntax tree node called MultipleFieldDecl. I would prefer to have the AST be aware of how it was built so that we don't need an extra rewrite step.

Is there a downside to having that extra rewrite step?

///
/// ```
/// const /*for const*/ x /*for x*/ := /* for := */ 1/* for 1 */
/// // for 1 again
Copy link
Member

@keyboardDrummer keyboardDrummer Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example makes me doubtful about our choices.

Since we decided docstrings on const declarations should occur after the const, // for 1 again is a docstring for the const right? So the implementation for finding the docstrings for constants is then:

Const.Docstring => Const.DeepChildren.Last().TrailingTrivia

It seems inconsistent to me to have the docstring for a node not be attached to that node. Suppose you have

const x := 3 + 42 // docstring for x

and then I use an "extract variable" refactoring on 42 and name the new variable magicNumber, we get:

const magicNumber := 42 // docstring for x
const x := 3 + magicNumber

That's not intuitive right?

What if we let docstrings occur after the signature but before the body, also for consts, in which case it would occur right before the :=. If we do that, then we could also use our original definition of leading and trailing trivia, where comments generally attach to the current or next node, but not the previous one, which I feel is more intuitive.

Then we get

const x /* docstring for x */ := 3 + 42

or

const x 
// really elaborate 
// docstring for x
  := 3 + 42

and refactoring anything in the body will keep the docstring in the same place.

@cpitclaudel could you chime in what behavior we want regarding what nodes trivia attach to?

For example, for

var x := 1;
// which statement does this comment belong to?
var y := 2;

/// All the remaining trivia is for the LeadingTrivia of the next token
///
/// ```
/// const /*for const*/ x /*for x*/ := /* for := */ 1/* for 1 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more import to know which AST nodes the trivia attach to than which tokens, so I would replace /* for := */ with `/* for const */

@@ -24,32 +24,56 @@ public class FilledInDuringResolutionAttribute : System.Attribute { }
public interface IToken : Microsoft.Boogie.IToken {
// Inherited for now.
Copy link
Member

@keyboardDrummer keyboardDrummer Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop this comment and the subsequent block comment.

int col { get; set; }
int line { get; set; }
string val { get; set; }
int kind { get; set; } // Used by coco, so we can't rename it to Kind
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move these comments so they occur on the lowercase properties?

@MikaelMayer MikaelMayer enabled auto-merge (squash) July 14, 2022 19:37
@MikaelMayer MikaelMayer merged commit f0b25e9 into master Jul 19, 2022
@MikaelMayer MikaelMayer deleted the feat-add-trivia branch July 19, 2022 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add trivia (whitespace, comments, and other non-semantic data) to the AST
2 participants