Incorrect AST for lists #457

Closed
psobolik opened this issue Aug 21, 2024 · 3 comments

Comments

@psobolik

There seems to be a problem with the AST that Comrak's parse_document function generates for lists. Specifically, the marker_offset value of the NodeList object isn't set as it should be for sub-lists.

The expected behavior is that, in a Markdown list, a line whose item marker is indented by some multiple of two spaces starts a sub-list, and that the sub-list's marker_offset indicates its nesting level.

As an example of the unexpected AST, the following Markdown correctly results in three lists, but the marker_offset of each one is zero.

+ First item - no indent
+ Second item - no indent
  * First sub-item - indent two spaces
  + Second sub-item - indent two spaces
    - First sub-sub-item - indent four spaces
      + First sub-sub-sub-item - indent six spaces
+ Third item - no indent

After some experimenting, it appears that certain numbers of leading spaces in the sub-lists do make the library generate the expected values. To start the first sub-level (marker_offset == 1), a line must be indented three spaces; the second sub-level (marker_offset == 2) requires seven spaces; and the third sub-level (marker_offset == 3) needs 12. It doesn't look like any number of leading spaces will result in a marker_offset greater than 3.

Here's an example:

+ First item - no indent
+ Second item - no indent
   * First sub-item - indent three spaces
   + Second sub-item - indent three spaces
       - First sub-sub-item - indent seven spaces
            + First sub-sub-sub-item - indent 12 spaces
+ Third item - no indent
@kivikakk
Owner

This is a bit unfortunate; the field description is a bit of a misnomer. We decided to make the node data's private fields public in #216 (/#215), since it was otherwise impossible to programmatically construct an AST.

But the one-line description of marker_offset fails to capture it: it's not really meant to document exactly how many spaces the marker is offset from the absolute left-hand side of the document; it's used by the parser itself during parse, which explains the values you see. The offset is relative to the first column a marker could be in while still qualifying as a list marker, hence only values from 0 to 3: 4 spaces would instead create an indented code block.

(Thus your first example gives all zeroes since each (sub)list is as far to the left as it could be while representing the same structure.)
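
To make the relative offset concrete, here is a minimal sketch of the idea in plain Rust. It is not Comrak's parser code: the names `BULLET_WIDTH`, `relative_marker_offset` and `offsets_for_nesting` are hypothetical, and it assumes bullet lists only, with exactly one space after each marker and each marker nested directly inside the previous one.

```rust
/// Assumed width of a bullet marker plus the single space after it ("+ ").
/// This is an assumption of the sketch, not a constant taken from Comrak.
const BULLET_WIDTH: usize = 2;

/// Offset of a marker relative to the first column it could occupy,
/// or `None` if the line cannot start a list at this position.
///
/// `indent` is the number of leading spaces on the line; `first_col` is the
/// column where the enclosing item's content starts (0 at the top level).
fn relative_marker_offset(indent: usize, first_col: usize) -> Option<usize> {
    // A marker left of the container's content column is not nested in it.
    let offset = indent.checked_sub(first_col)?;
    // Four or more extra spaces would be read as an indented code block.
    if offset <= 3 {
        Some(offset)
    } else {
        None
    }
}

/// Given the absolute indents of a chain of nested markers (outermost first),
/// returns the relative offset of each one.
fn offsets_for_nesting(indents: &[usize]) -> Vec<Option<usize>> {
    let mut first_col = 0;
    let mut out = Vec::new();
    for &indent in indents {
        out.push(relative_marker_offset(indent, first_col));
        // The item's content starts right after its marker and one space.
        first_col = indent + BULLET_WIDTH;
    }
    out
}
```

Under these assumptions, `offsets_for_nesting(&[0, 2, 4, 6])` yields all zeroes, as in the first example, while `offsets_for_nesting(&[0, 3, 7, 12])` yields 0, 1, 2 and 3, matching the values observed with the second example.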

Open to suggestions for a better comment describing the field, but as the value is depended-upon (primarily by Parser::parse_node_item_prefix, which determines whether a line's content should be ascribed to an already-open list), it's unlikely to change.

The AST does not currently store enough information to recover what you're looking for [1]: it is not a concrete syntax tree, although that's increasingly being called for!

Footnotes

  1. except by summing 2+marker_offset for each nested list; this won't work for ordered lists, and will probably fall over if lists are within any other block-level element.
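
The summation in the footnote can be sketched as follows. This is an illustration only: `absolute_marker_column` is a hypothetical helper, and it inherits the footnote's caveats (bullet lists with a two-column marker, and no other block-level containers around the lists).

```rust
/// Recovers the absolute column of the innermost list marker from the
/// marker_offset values of the enclosing lists, outermost first.
///
/// Assumes every parent marker occupies two columns (bullet plus one space),
/// so this does not hold for ordered lists.
fn absolute_marker_column(marker_offsets: &[usize]) -> usize {
    let mut column = 0;
    for (depth, &offset) in marker_offsets.iter().enumerate() {
        if depth > 0 {
            // Each nested list starts after its parent's marker and space.
            column += 2;
        }
        column += offset;
    }
    column
}
```

For the second example in this thread, the offsets 0, 1, 2 and 3 give column 12 for the innermost marker, and the all-zero offsets of the first example give column 6.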

@psobolik
Author

psobolik commented Aug 22, 2024

Thanks for the clarification. I guess the upshot is that unless you're generating HTML you need to keep track of state (the depth of lists) while walking the tree, because the nodes don't have sufficient information by themselves. I was misled by the fact that the marker_offset seems to indicate depth in some cases--at least in some of the ones I looked at.
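
A minimal sketch of that state tracking, using a deliberately simplified tree rather than Comrak's own node types: the `Node` enum and `collect_depths` function are hypothetical stand-ins, and the same pattern (pass the current depth down and increment it on entering a list) would need adapting to the real AST.

```rust
/// Hypothetical, simplified tree for illustration; not Comrak's node types.
enum Node {
    /// A list holding its items.
    List(Vec<Node>),
    /// An item's text plus any nested blocks (for example, sub-lists).
    Item(String, Vec<Node>),
}

/// Walks the tree and records each item's text with its list depth.
/// The depth is carried as state by the walk, not read from the nodes.
fn collect_depths(node: &Node, depth: usize, out: &mut Vec<(usize, String)>) {
    match node {
        Node::List(items) => {
            for item in items {
                // Entering a list puts its items one level deeper.
                collect_depths(item, depth + 1, out);
            }
        }
        Node::Item(text, children) => {
            out.push((depth, text.clone()));
            for child in children {
                // Nested blocks stay at the item's depth until a list is entered.
                collect_depths(child, depth, out);
            }
        }
    }
}
```

Calling `collect_depths(&tree, 0, &mut out)` on a top-level list reports its items at depth 1 and each further level of nesting one deeper, regardless of how the source was indented.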

@kivikakk
Owner

Yep!
