docast

Docblock Abstract Syntax Tree.


docast is a specification for representing docblock comments as abstract syntax trees.

It implements the unist spec.

Introduction

This document defines a format for representing docblock comments as abstract syntax trees. Development of docast started in October 2022. This specification is written in a TypeScript-like grammar.

Where this specification fits

docast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities. It also integrates with mdast, a specification for representing markdown.

docast relates to JavaScript and TypeScript in that both languages support docblock comments. docast is language-agnostic, however, and can be used with any programming language that supports docblock comments.

docast relates to JSDoc, TSDoc, and typedoc in that these tools parse docblock comments. These tools also have a limited set of tags that developers are allowed to use. If developers already have a set of tags they're using, they must spend additional time re-configuring those tags for their chosen tool. docast does not enforce any tag semantics — the user does. Tag specifications can be left to an ESLint rule or setting akin to jsdoc/check-tag-names or jsdoc.structuredTags.

Types

TypeScript users can integrate docast type definitions into their project by installing the appropriate packages:

yarn add @flex-development/docast @types/mdast @types/unist

Nodes (abstract)

Node

interface Node extends unist.Node {
  position?: Position | undefined
}

Node (unist.Node) is a syntactic unit in docast syntax trees.

The position field represents the location of a node in a source document. The value of the position field implements the Position interface. The position field must not be present if a node is generated.

Position

interface Position {
  end: Point
  start: Point
}

Position represents the location of a node in a source file.

The start field of Position represents the index of the first character of the parsed source region. The end field represents the index of the first character after the parsed source region, whether it exists or not. The values of the start and end fields implement the Point interface.

If the syntactic unit represented by a node is not present in the source file at the time of parsing, the node is said to be generated and it must not have positional information.
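
The rule above can be checked mechanically. The following is a minimal sketch, not part of the specification: the interfaces are local copies of the ones defined in this document, and `isGenerated` is a hypothetical helper name.

```typescript
// Local copies of the interfaces from this document, so the sketch is self-contained.
interface Point {
  column: number // >= 1
  line: number // >= 1
  offset: number // >= 0
}

interface Position {
  end: Point
  start: Point
}

interface Node {
  type: string
  position?: Position | undefined
}

// Hypothetical helper: a node without positional information is treated as generated.
function isGenerated(node: Node): boolean {
  return node.position === undefined
}

// A node parsed from the 19-character source '/**\n * Example.\n */'.
const parsed: Node = {
  type: 'comment',
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 3, column: 4, offset: 19 } // first character after the region
  }
}

// A node created programmatically has no place in a source file.
const generated: Node = { type: 'comment' }

console.log(isGenerated(parsed), isGenerated(generated))
```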

Point

interface Point {
  column: number // >= 1
  line: number // >= 1
  offset: number // >= 0
}

Point represents one place in a source file.

The line and column fields are 1-indexed integers representing a line and column in a source file. The offset field (0-indexed integer) represents a character in a source file.

The term character refers to a (UTF-16) code unit as defined by the Web IDL specification.
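
To illustrate how the three fields relate, here is a minimal sketch that derives a Point from an offset. `pointFromOffset` is a hypothetical helper, not part of docast, and it assumes lines are separated by `\n` only. JavaScript strings are indexed by UTF-16 code unit, which matches the definition of character above.

```typescript
// Local copy of the Point interface from this document.
interface Point {
  column: number // >= 1
  line: number // >= 1
  offset: number // >= 0
}

// Hypothetical helper: convert a 0-indexed offset into a Point.
// Assumption: '\n' is the only line terminator.
function pointFromOffset(source: string, offset: number): Point {
  if (!Number.isInteger(offset) || offset < 0 || offset > source.length) {
    throw new RangeError(`offset out of range: ${offset}`)
  }

  let line = 1
  let lineStart = 0 // offset of the first character of the current line

  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line++
      lineStart = i + 1
    }
  }

  return { column: offset - lineStart + 1, line, offset }
}

const source = '/**\n * Example.\n */'
const start = pointFromOffset(source, 0)
const end = pointFromOffset(source, source.length)

console.log(start, end)
```

Passing `source.length` yields the place of the first character after the source, which is what the end field of a Position spanning the whole comment would hold.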

Literal

interface Literal extends Node {
  value: string
}

Literal represents an abstract interface in docast containing a value.

Its value field is a string.

Parent

interface Parent extends unist.Parent {
  children: (Comment | Content)[]
}

Parent (unist.Parent) represents an abstract interface in docast containing other nodes (said to be children).

Its content is limited to comment nodes and docast content.

Nodes

BlockTag

interface BlockTag extends Parent, Tag {
  children: BlockTagContent[]
  data?: BlockTagData | undefined
  type: 'block-tag'
}

BlockTag (Parent) represents top-level metadata.

BlockTag can be used in comment nodes. Its content model is block tag content.

Comment

interface Comment extends Parent {
  children: FlowContent[]
  code?: CodeSegment | null | undefined
  data?: CommentData | undefined
  type: 'comment'
}

Comment (Parent) represents a docblock comment in a source file.

The code field represents the segment of code documented by a comment. The value of the code field may be null, undefined, or implement the CodeSegment interface. The code field must not be present if a comment is used only to provide additional information.

Comment can be used in root nodes. Its content model is flow content.

CodeSegment

interface CodeSegment {
  identifier: string
  kind: number | string
  parent?: CodeSegment | null | undefined
  position: Position
}

CodeSegment represents the code segment in a file that is documented by a comment.

The identifier field represents the name of the documented code segment. The value of the identifier field is a non-empty string that matches the identifier found in the respective programming language's AST.

The kind field represents the syntax kind of the code segment. The value of the kind field is an enumerated value.

The parent field represents the code segment the current segment is nested under. The value of the parent field may be null or undefined for top-level code segments. For nested code segments, it implements the CodeSegment interface.
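
As a sketch of how nested segments link together, the example below models a method documented inside a class. The `kind` strings, the position values, and the `qualifiedName` helper are all illustrative assumptions. The specification only requires that `kind` be an enumerated value.

```typescript
// Local copies of the interfaces from this document.
interface Point {
  column: number
  line: number
  offset: number
}

interface Position {
  end: Point
  start: Point
}

interface CodeSegment {
  identifier: string
  kind: number | string
  parent?: CodeSegment | null | undefined
  position: Position
}

// Hypothetical helper: join identifiers from the outermost segment to the innermost.
function qualifiedName(segment: CodeSegment): string {
  const names: string[] = []
  let current: CodeSegment | null | undefined = segment

  while (current) {
    names.unshift(current.identifier)
    current = current.parent
  }

  return names.join('.')
}

// Illustrative values only; a real parser would take them from the language's AST.
const classSegment: CodeSegment = {
  identifier: 'Calculator',
  kind: 'ClassDeclaration', // assumed kind value
  parent: null, // top-level segment
  position: {
    start: { line: 4, column: 1, offset: 30 },
    end: { line: 12, column: 2, offset: 180 }
  }
}

const methodSegment: CodeSegment = {
  identifier: 'add',
  kind: 'MethodDeclaration', // assumed kind value
  parent: classSegment, // nested under the class
  position: {
    start: { line: 8, column: 3, offset: 90 },
    end: { line: 10, column: 4, offset: 150 }
  }
}

console.log(qualifiedName(methodSegment))
```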

Description

interface Description extends Parent {
  children: DescriptionContent[]
  data?: DescriptionData | undefined
  type: 'description'
}

Description (Parent) represents the text of a comment. It is located at the start of a comment, before any block tags, and may contain Markdown content.

Description can be used in comment nodes. Its content model is description content.

InlineTag

interface InlineTag extends Literal, Tag {
  data?: InlineTagData | undefined
  type: 'inline-tag'
}

InlineTag (Literal) represents inline metadata.

InlineTag can be used in block tag and description nodes. It cannot contain any children — it is a leaf.

Root

interface Root extends Parent {
  children: Comment[]
  data?: RootData | undefined
  position?: undefined
  type: 'root'
}

Root (Parent) represents a document.

Root can be used as the root of a tree, never as a child. It can contain comment nodes.
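
To show how the node types nest, here is a hand-built tree for a comment whose description reads "Adds two numbers." and which carries one `@param` tag. It is a sketch under stated assumptions: the interfaces are simplified local stand-ins for the docast and mdast types, the exact shape a parser emits may differ, and `collectTypes` is a hypothetical helper. The nodes carry no position because a tree written by hand is generated.

```typescript
// Simplified local stand-ins for the docast and mdast node types.
interface Text { type: 'text'; value: string }
interface Paragraph { type: 'paragraph'; children: Text[] }
interface TypeExpression { type: 'type-expression'; value: string }
interface Description { type: 'description'; children: Paragraph[] }
interface BlockTag {
  type: 'block-tag'
  name: string
  prefix: string
  tag: string
  children: (Text | TypeExpression)[]
}
interface Comment { type: 'comment'; children: (BlockTag | Description)[] }
interface Root { type: 'root'; children: Comment[] }

type AnyNode = BlockTag | Comment | Description | Paragraph | Root | Text | TypeExpression

const tree: Root = {
  type: 'root',
  children: [
    {
      type: 'comment',
      children: [
        {
          // The description comes first, before any block tags.
          type: 'description',
          children: [
            { type: 'paragraph', children: [{ type: 'text', value: 'Adds two numbers.' }] }
          ]
        },
        {
          type: 'block-tag',
          name: 'param',
          prefix: '@',
          tag: '@param',
          children: [
            { type: 'type-expression', value: 'number' }, // illustrative value
            { type: 'text', value: 'a - First addend' }
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: depth-first walk that records each node's type.
function collectTypes(node: AnyNode, out: string[] = []): string[] {
  out.push(node.type)
  if ('children' in node) {
    for (const child of node.children) collectTypes(child, out)
  }
  return out
}

console.log(collectTypes(tree).join(' > '))
```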

TypeExpression

interface TypeExpression extends Literal {
  data?: TypeExpressionData | undefined
  type: 'type-expression'
}

TypeExpression (Literal) represents a type definition or constraint.

TypeExpression can be used in block tag nodes. It cannot contain any children — it is a leaf.

Mixins

Tag

interface Tag {
  name: string
  prefix: string
  tag: string
}

Tag represents metadata associated with a comment.

The prefix field represents the tag prefix. The value is a non-empty string.

The name field represents the tag name without prefix. The value of the name field is a non-empty string.

The tag field represents the parsed tag. The value of the tag field is the prefix followed by the name.
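
The relationship between the three fields can be sketched as follows. This reads "prefix and name" as the prefix immediately followed by the name, and assumes `@` as the default prefix. `toTag` is a hypothetical helper, not part of docast.

```typescript
// Local copy of the Tag mixin from this document.
interface Tag {
  name: string
  prefix: string
  tag: string
}

// Hypothetical helper: split a raw tag such as '@param' into its fields.
// Assumption: the prefix is known in advance and defaults to '@'.
function toTag(raw: string, prefix: string = '@'): Tag {
  if (prefix.length === 0) {
    throw new Error('prefix must be a non-empty string')
  }
  if (!raw.startsWith(prefix) || raw.length === prefix.length) {
    throw new Error(`not a tag: ${raw}`)
  }

  return { name: raw.slice(prefix.length), prefix, tag: raw }
}

const param = toTag('@param')

console.log(param.prefix + param.name === param.tag)
```

Passing the prefix as a parameter keeps the sketch in line with the point made earlier in this document: docast does not enforce tag semantics, so nothing here depends on a fixed list of tag names.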

Content model

type Content = BlockTagContent | DescriptionContent | FlowContent | PhrasingContent

Nodes are grouped by content type, if applicable. Each node in docast, with the exception of Root and Comment, falls into one or more categories of Content.

BlockTagContent

type BlockTagContent = PhrasingContent | TypeExpression

Block tag content represents block tag text and its markup.

DescriptionContent

type DescriptionContent =
  | mdast.Blockquote
  | mdast.Definition
  | mdast.FootnoteDefinition
  | mdast.List
  | mdast.ListItem
  | mdast.Paragraph
  | mdast.Table
  | mdast.ThematicBreak
  | PhrasingContent

Description content represents description text, and its markup.

FlowContent

type FlowContent = BlockTag | Description

Flow content represents the sections of a comment.
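
Because every node carries a literal `type` field, flow content can be separated with an ordinary discriminated-union check. The sketch below uses simplified local stand-ins for the node interfaces, and `partitionFlow` is a hypothetical helper name.

```typescript
// Simplified local stand-ins; children are left untyped to keep the sketch short.
interface Description {
  type: 'description'
  children: unknown[]
}

interface BlockTag {
  type: 'block-tag'
  name: string
  prefix: string
  tag: string
  children: unknown[]
}

type FlowContent = BlockTag | Description

// Hypothetical helper: split a comment's children into descriptions and block tags.
function partitionFlow(children: FlowContent[]): {
  descriptions: Description[]
  tags: BlockTag[]
} {
  const descriptions: Description[] = []
  const tags: BlockTag[] = []

  for (const child of children) {
    // Checking `type` narrows the union to one member.
    if (child.type === 'description') descriptions.push(child)
    else tags.push(child)
  }

  return { descriptions, tags }
}

const flow: FlowContent[] = [
  { type: 'description', children: [] },
  { type: 'block-tag', name: 'param', prefix: '@', tag: '@param', children: [] },
  { type: 'block-tag', name: 'returns', prefix: '@', tag: '@returns', children: [] }
]

const { descriptions, tags } = partitionFlow(flow)

console.log(descriptions.length, tags.map(t => t.tag))
```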

PhrasingContent

type PhrasingContent = InlineTag | mdast.Code | mdast.PhrasingContent

Phrasing content represents comment text, and its markup.

Glossary

See the unist glossary for more terms.

Docblock comment

A specially formatted comment in a source file used to document a segment of code or provide additional information.

List of utilities

See the unist list of utilities for more utilities.

Contribute

See CONTRIBUTING.md.

Ideas for new utilities and tools can be posted in docast/ideas.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.