proposal: go/ast: add CommentGroup.Directives iterator #68021

jimmyfrasche · 2024-06-15T18:55:14Z

Proposal Details

*ast.CommentGroup has a helpful Text method to extract the text of comments that are not directives, but there is no simple means to get the directives of the comment other than by parsing the raw comments.

Now that directives are standardized and third-party directives are allowed, it would be nice to make them easier to access, even though the format is simple.

The most common case would be a third-party tool wanting to see only the directives in its own namespace, so this use case should be made the simplest.

I imagine something like:

// Directives iterates over all
//
//    //namespace:directive arguments
//
// directives in the comment group,
// yielding namespace:directive then arguments (which may be empty).
//
// If a namespace argument is provided, only directives that match
// that namespace are iterated over.
//
// If the namespace argument is go, the directives iterated over
// include directives like "//line" which predate the standardized format.
//
// If namespace is the empty string, all directives are iterated over.
func (g *CommentGroup) Directives(namespace string) iter.Seq2[string, string]


jimmyfrasche · 2024-06-15T18:55:58Z

cc @griesemer per owners

jimmyfrasche · 2024-06-16T19:30:08Z

Perhaps it should be Directives(namespaces ...string). That would make it cleaner in the rare case you want all directives and, more importantly, simpler in the case where you want your namespace but also need to, for example, account for directives from another tool that you're replacing or just need to interoperate with.

griesemer · 2024-06-17T16:22:25Z

cc @findleyr @adonovan for visibility.

findleyr · 2024-06-17T16:38:20Z

CC @lfolger, since we were just discussing the lack of an API like this one.

At a high level, I do think we should expose this in an API. The proposal looks reasonable to me. I suspect the variadic form is overkill and prefer the original proposal, but I could be convinced otherwise.

adonovan · 2024-06-17T17:04:30Z

I'm not convinced the namespaces argument is necessary: the iterator must always visit every node, so there's no efficiency gain by having the iterator perform the filtering, and the non-monotonic behavior for len(namespaces)=0 seems undesirable. Make the iterator yield them all, and let the caller filter.

Perhaps we should define a new type, type Directive struct { Namespace, Name, Arguments string }, and just return a plain Seq[Directive]. That relieves the caller of parsing the first element and avoids confusion about how the three values are split into the two parts of a Seq2.

jimmyfrasche · 2024-06-17T17:34:50Z

👍 on a new type. Though maybe it should just be type Directive struct { Namespace, Name string } with iter.Seq2[Directive, string] for the arguments? That would make it simpler to use the directive as a map key.

The variadic proposal may be overkill; I'm not entirely confident either way. The main argument is performance, as I imagine the iterator would parse the comments on each invocation. If you need to check multiple namespaces, you'd either have to parse the comments n times or iterate over all the directives and implement your own filtering logic. I based it on #67795, so I figured the non-monotonic behavior would be acceptable. OTOH a multi-namespace filter may need to do some allocations or preprocessing, and those would likely be the same for the entire program, so having it as a separate filter would

I do think the most common case is going to be a tool looking for its own directives, so having some simple namespace filtering built in would simplify the majority of callers, who would otherwise all have to implement their own filtering. For example, the code I was writing that led me to file this only cares about two directives in its custom namespace, so it would be a lot simpler to just write g.Directives("mystuff") than:

for dir := range g.Directives() {
	if dir.Namespace != "mystuff" {
		continue
	}
	// ...
}

Built-in filtering can also handle the special cases of extern, export, and line, which have no namespace but which you'd want to show up in a query for the "go" namespace. Of course, if there's a type, it could export a method to handle this.

adonovan · 2024-06-17T17:41:44Z

That would make it simpler to use the directive as a map key.

But it's not a map, it's a sequence of pairs whose keys may be duplicates.

jimmyfrasche · 2024-06-17T17:52:47Z

I mean that if I wanted to collect results as map[Directive][]T, where type T struct{ Node; args string }, and Arguments were part of Directive, I'd need to shuffle things around; if they're separated, it's more straightforward.

adonovan · 2024-06-17T17:57:54Z

I mean that if I wanted to collect results as map[Directive][]T where type T struct{ Node; args string} and Arguments is part of Directive I'd need to shuffle things around but if they're separated it's more straightforward.

That's true, but most clients will discard most directives (at least those in the wrong namespace), so a map[string][]T will do, and a three-field Directive struct gives us room for future improvements (e.g. a method to parse arguments in some standard way that may emerge in the future).

jimmyfrasche · 2024-06-17T18:28:48Z

For export, extern, and line, I think the Namespace field of Directive should be set to "go". Fudging that removes an edge case. There could be a String method that renders them without the "go:" prefix.

jimmyfrasche · 2024-06-18T22:27:03Z

The updated proposal so far is:

// Directives yields each [directive comment] in g as a [Directive].
//
// [directive comment]: https://go.dev/doc/comment#syntax
func (g *CommentGroup) Directives() iter.Seq[Directive] {
	// ...
}

type Directive struct {
	Namespace, Name, Arguments string
}

For line/export/extern directives, there are two choices:

1. record Namespace as ""
2. record Namespace as "go"

1 is syntactically correct. There is no namespace for such directives as they are written.

2 is semantically correct. Those directives belong to Go and, as such, implicitly belong to the "go" namespace.

These directives need to be special-cased in parsing, so manually setting the namespace to "go" is trivial.

For rendering to a string, these need to be special-cased either way (to avoid rendering ":line" or "go:line" instead of "line"), so Directive should be a fmt.Stringer.

So for std, these need to be special-cased and documented either way.

For user code, 2 removes a special case as Namespace != "" and everything that belongs to Go is labeled "go".

However, it's unlikely that anyone would care about anything other than a small fixed set of namespaces, often a singleton, and it's likely that this set contains neither "go" nor "", so few users would even be in a situation where they could run into this.

These are both right. 2 seems more right to me. But ultimately it doesn't seem likely to matter much in practice and should be documented whatever the decision.
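If option 2 were chosen, the special case moves into rendering. A minimal sketch of such a String method, assuming the three-field Directive and that no real directive is spelled //go:line, //go:extern, or //go:export (those would be rendered in their legacy form and become indistinguishable).

```go
package main

import "fmt"

// Directive follows option 2: legacy directives get Namespace "go".
type Directive struct {
	Namespace, Name, Arguments string
}

// legacy lists the Go directives that are written without a namespace.
var legacy = map[string]bool{"line": true, "extern": true, "export": true}

// String renders d without the leading "//". Legacy Go directives are
// rendered as written (e.g. "line x.go:1"), not as "go:line x.go:1".
func (d Directive) String() string {
	s := d.Namespace + ":" + d.Name
	if d.Namespace == "go" && legacy[d.Name] {
		s = d.Name
	}
	if d.Arguments != "" {
		s += " " + d.Arguments
	}
	return s
}

func main() {
	fmt.Println(Directive{"go", "line", "foo.go:10"})
	fmt.Println(Directive{"go", "generate", "stringer -type Op"})
	fmt.Println(Directive{"mytool", "skip", ""})
}
```

The method also omits the separating space when Arguments is empty; a plain fmt.Sprintf("%s:%s %s", ...) would leave a trailing space in that case.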

jimmyfrasche · 2024-06-19T01:38:34Z

It looks like Arguments is pretty free-form. Are the following correct, or did I misunderstand something?

Directive → Arguments (notes):

"//0:0 a \n" → " a " (two spaces on either side)

"//a:aA\n" → "A"

"//line X\n" → "X" (first space is part of name)

If possible, it would be nice to further standardize that leading and trailing white space is ignored (and have gofmt normalize it to a single space). I can open another issue, if that's a possibility.

Edit: the respective hypothetical normalizations would be:

"//0:0 a \n" → "//0:0 a\n"

"//a:aA\n" → "//a:a A\n"

Edit 2: corrected the //a:A example to //a:aA.
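To make the examples above concrete, here is a sketch of the hypothetical normalization. It is not current gofmt behavior; `normalize` is an illustrative name, and it assumes, as the examples do, that the name ends at the first byte outside `[a-z0-9]`. It operates on ast.Comment.Text, which excludes the trailing newline for //-style comments. Legacy //line, //extern, and //export directives are left alone (ok is false).

```go
package main

import (
	"fmt"
	"strings"
)

func isLowerAlnum(b byte) bool {
	return 'a' <= b && b <= 'z' || '0' <= b && b <= '9'
}

// normalize rewrites a //tool:name directive so that exactly one space
// separates the name from its arguments and no trailing space remains.
// Non-directives and legacy directives are reported with ok=false.
func normalize(text string) (string, bool) {
	body, ok := strings.CutPrefix(text, "//")
	if !ok {
		return "", false
	}
	colon := strings.IndexByte(body, ':')
	if colon <= 0 {
		return "", false
	}
	for i := 0; i < colon; i++ {
		if !isLowerAlnum(body[i]) {
			return "", false
		}
	}
	end := colon + 1
	for end < len(body) && isLowerAlnum(body[end]) {
		end++
	}
	if end == colon+1 {
		return "", false
	}
	// Everything after the name (e.g. the "A" in "//a:aA") is arguments.
	out := "//" + body[:end]
	if args := strings.TrimSpace(body[end:]); args != "" {
		out += " " + args
	}
	return out, true
}

func main() {
	for _, text := range []string{"//0:0  a  ", "//a:aA", "//go:build linux"} {
		out, _ := normalize(text)
		fmt.Printf("%q -> %q\n", text, out)
	}
}
```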

jimmyfrasche · 2024-06-19T02:09:34Z

I filed #68061 to simplify the namespace question I posted earlier today by litigating it out of existence.

lfolger · 2024-06-19T05:45:18Z

I'm not sure I understand

"//a:A\n" → "A"

I would expect it to lead to

Directive{Namespace: "a", Name: "A", Arguments: ""}

Why is "A" an argument and not the name?
Is this to make the normalization easier?

jimmyfrasche · 2024-06-19T17:29:05Z

That was a bug in the comment. Updated it to:

"//a:aA\n" → "//a:a A\n"

The A isn't lowercase, so it's not part of the directive name; it's part of the arguments. Allowing and normalizing leading whitespace would make that clearer.

jimmyfrasche · 2024-06-20T19:46:18Z

I went ahead and filed #68092 for the argument whitespace, if it's possible to change that.

jimmyfrasche · 2024-06-20T21:42:32Z

Summary of how these proposals fit together.

This proposal is for:

func (g *CommentGroup) Directives() iter.Seq[Directive] {
	// ...
}

type Directive struct {
	Namespace, Name, Arguments string
}

If #68061 is accepted, then it's invariant that Namespace != "" and all the Go directives get Namespace == "go".

If #68092 is accepted, then it's invariant that Arguments == strings.TrimSpace(Arguments)

With both accepted, any Directive can be turned back into a string with fmt.Sprintf("%s:%s %s", d.Namespace, d.Name, d.Arguments); otherwise, Directive will require a String method to handle the various special cases (though it could certainly still have a String method).

jimmyfrasche · 2024-06-20T22:24:09Z

Should it be Argument singular since there's only one of them?

rsc · 2024-07-25T09:26:40Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2024-08-07T17:50:09Z

I don't think we need the filter as an argument. We also need to add file:line position information in some form (token.Pos or whatever is natural), which basically requires a struct at that point.

https://go.dev/doc/comment#Syntax says "//toolname:directive".

So something like:

type Directive struct {
    Pos token.Pos
    Tool string
    Name string
    Args string
}

(type Directive should not have field Directive, hence Name)

gopherbot · 2024-08-14T18:54:28Z

Change https://go.dev/cl/605517 mentions this issue: go/ast: add (*CommentGroup).Directives() iterator

adonovan · 2024-09-12T16:37:56Z

I found (only) one place in x/tools that would want to use this new API: extractMagicComments in gopls/internal/cache/snapshot.go, which would change from:

var buildConstraintOrEmbedRe = regexp.MustCompile(`^//(go:embed|go:build|\s*\+build).*`)

func extractMagicComments(f *ast.File) []string {
	...
	for _, cg := range f.Comments {
		for _, c := range cg.List {
			if buildConstraintOrEmbedRe.MatchString(c.Text) {
				results = append(results, c.Text)

to:

func extractMagicComments(f *ast.File) []string {
	...
	for _, cg := range f.Comments {
		for dir := range cg.Directives() {
			if dir.Tool == "go" && (dir.Name == "embed" || dir.Name == "build") {
				result := fmt.Sprintf("%s:%s %s", dir.Tool, dir.Name, dir.Args)
				results = append(results, result)

The resulting code is unfortunately not shorter, clearer, or more efficient (indeed, the converse in all three dimensions). It could be improved with some non-local rethinking.

I'm not opposed to this proposal as it does provide canonical parsing for a standard data structure, but I don't think it helps much in practice.

jimmyfrasche · 2024-09-12T18:12:40Z

Personally, I wanted it for my own tools that need to filter to declarations with my directives and that's largely straightforward except for figuring out what I needed to do to match directives properly. If this iterator had existed I would have just used it and not had to do a whole side quest.

adonovan · 2024-09-16T16:35:50Z

On further reflection, I don't think there's a compelling need for an iterator here, as the sequence is typically very short (usually zero, occasionally one) and generally the client will not break out of the loop. A simple []Directive will do.

jimmyfrasche · 2024-09-16T23:25:21Z

Even then, most won't be the particular directive(s) a given tool is looking for and will end up discarded, so an iterator would avoid some small allocations. I don't really see either an iterator or a slice as a clear winner. My instinct is to default to an iterator when it's a draw, but I'd be fine with a slice.

adonovan · 2024-09-17T18:45:46Z

OK. In that case, the current proposal is:

package ast // "go/ast"

// Directives returns a slice of directives in the comment group.
func (g *CommentGroup) Directives() []Directive

// A Directive is a comment of this form:
//
//    //tool:name args
//
// For example, this directive:
//
//     //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
    Pos  token.Pos // position of start of line comment
    Tool string // may be "" if Name is "line", "extern", "export"
    Name string
    Args string // may contain whitespace
}

aclements · 2024-09-18T17:41:00Z

This seems like the right API, but there's still a question of whether this is well-motivated. @adonovan is going to look into the standard library to find potential use sites for this and make sure this simplifies those uses.

jimmyfrasche · 2024-09-18T18:11:48Z

@aclements my primary motivation was to make it simpler for third parties to use directives. Even if it can't be used at all internally for whatever reason it's still sufficiently useful. (If it can be used internally that's a nice bonus, of course)

aclements · 2024-09-18T18:41:09Z

@jimmyfrasche I basically agree. I mostly want to see some concrete evidence that this API actually simplifies code that needs to read directives. I don't expect there to be many examples from std, which is totally fine, it's just a handy "unbiased" source of test cases. The one example @adonovan posted above seems like kind of a wash.

jimmyfrasche · 2024-09-18T19:12:05Z

One of the problems with comparing it to existing code is all existing code is going to be doing the parsing and filtering in one step: it parses only one particular directive instead of parsing all directives then selecting the one of interest. That's why the initial proposal had a built in filtering mechanism.

With the filtering mechanism back in place, the code posted above would be somewhat simpler:

func extractMagicComments(f *ast.File) []string {
	...
	for _, cg := range f.Comments {
		for dir := range cg.Directives("go:embed", "go:build") {
			result := fmt.Sprintf("%s:%s %s", dir.Tool, dir.Name, dir.Args)
			results = append(results, result)

(and simpler still if Directive had a String() string method)

Both rewrites are incorrect, though, as the original also matched +build lines and included the leading // in the output.

That said, maybe the solution is to go lower-level and just have:

func ParseDirective(text string) (Directive, bool)
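One possible reading of ParseDirective, as a sketch with a hypothetical lowercase name. It accepts only //-style text, so /*line ...*/ directives are rejected, and, as in any sketch here, where the name ends (the first byte outside `[a-z0-9]`) and the trimming of Args are assumptions, not specified behavior. Note that //+build is not a directive under these rules, so a rewrite of extractMagicComments on top of it would still need to handle +build lines separately.

```go
package main

import (
	"fmt"
	"strings"
)

// Directive holds the parts of one directive comment.
type Directive struct {
	Tool, Name, Args string
}

func isLowerAlnum(b byte) bool {
	return 'a' <= b && b <= 'z' || '0' <= b && b <= '9'
}

// parseDirective is a hypothetical sketch of the suggested ParseDirective.
// text is one raw comment, e.g. "//go:build linux".
func parseDirective(text string) (Directive, bool) {
	body, ok := strings.CutPrefix(text, "//")
	if !ok {
		return Directive{}, false // /*-style comments are rejected
	}
	for _, name := range []string{"line", "extern", "export"} {
		if rest, ok := strings.CutPrefix(body, name+" "); ok {
			return Directive{Name: name, Args: strings.TrimSpace(rest)}, true
		}
	}
	colon := strings.IndexByte(body, ':')
	if colon <= 0 {
		return Directive{}, false
	}
	for i := 0; i < colon; i++ {
		if !isLowerAlnum(body[i]) {
			return Directive{}, false
		}
	}
	end := colon + 1
	for end < len(body) && isLowerAlnum(body[end]) {
		end++
	}
	if end == colon+1 {
		return Directive{}, false
	}
	return Directive{Tool: body[:colon], Name: body[colon+1 : end], Args: strings.TrimSpace(body[end:])}, true
}

func main() {
	for _, text := range []string{"//go:build linux", "//+build linux", "//line x.go:1", "/*line x.go:1*/"} {
		d, ok := parseDirective(text)
		fmt.Printf("%q %v %+v\n", text, ok, d)
	}
}
```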

aclements · 2024-09-25T17:59:02Z

@gri points out that ParseDirective has the advantage that it can work on any text, while CommentGroup.Directives requires you to have a CommentGroup, which makes ParseDirective more composable.

CommentGroup already splits //-style comments into lines, so the fact that ParseDirective only returns a single directive doesn't seem to add any burden to the caller. And /*-style comments aren't allowed to contain directives.

Can we specify exactly what ParseDirective accepts?

jimmyfrasche · 2024-09-25T18:29:04Z

I'd been working on the assumption that they only go in // comments, but can't line directives be in /**/ comments? If so, it might be fine to just say "does not work with (all) line directives". Since its purpose is to work with custom directives, maybe it'd be fine to further limit it to only parsing the new x:y directives? That would simplify a lot of things for little loss.

After looking back, I'm not sure the new directive format is entirely specified. For the x:y family what's specified is only how to tell if a line is a directive but nothing past that point. In x:yz A the format stops at y and strictly speaking doesn't say whether z is part of the name or A is an argument: it just says, yup, what we have here is a directive.

jimmyfrasche · 2024-09-25T18:46:30Z

Ignoring everything else, if the solution is ParseDirective, there should be an example using it with CommentGroup.

jimmyfrasche added the Proposal label Jun 15, 2024

jimmyfrasche added this to the Proposal milestone Jun 15, 2024

jimmyfrasche mentioned this issue Jun 19, 2024

proposal: go/format: simplify old directive comments #68061

Closed

jimmyfrasche mentioned this issue Jun 20, 2024

proposal: go/doc: ignore leading and trailing whitespace in the directive arguments #68092

Open

griesemer assigned adonovan and griesemer Aug 14, 2024

adonovan unassigned griesemer Aug 14, 2024
