Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table of contents: Avoid parsing the document using libxml in PHP #29560

Closed
aristath opened this issue Mar 4, 2021 · 5 comments · Fixed by #29739
Closed

Table of contents: Avoid parsing the document using libxml in PHP #29560

aristath opened this issue Mar 4, 2021 · 5 comments · Fixed by #29739
Labels
[Block] Table of contents (experimental) Affects the Table of contents Block [Type] Enhancement A suggestion for improvement. [Type] Performance Related to performance efforts

Comments

@aristath
Copy link
Member

aristath commented Mar 4, 2021

What problem does this address?

Right now the TOC block parses the content of the post on render, dynamically generating the list every time.
It works, but is a resources-intensive process and could be optimized.

What is your proposed solution?

Instead of building the TOC dynamically on render, we could do it via JS in the editor itself. This way it will be saved as a list and rendered a lot faster and efficiently.
The headings structure can be saved in the block attributes and updated on the fly as the post gets edited.
This would mean that the classic block will be ignored by the TOC block, but that is to be expected and in fact something we may want to enforce since users should be using the heading blocks to structure their content, and classic blocks can be converted to block-based content.

@aristath aristath added [Block] Table of contents (experimental) Affects the Table of contents Block [Type] Enhancement A suggestion for improvement. [Type] Performance Related to performance efforts labels Mar 4, 2021
@mtias
Copy link
Member

mtias commented Mar 4, 2021

I agree, this is a block that I strongly feel should be restricted to work with blocks alone, not attempt to work with freeform content. It's at the core of the value proposition of blocks to offer structured data that we can more easily reason about. We also have the right parsing tools for them, both on the client and the server.

Ideally, for me, the headings are pulled from just heading blocks at render time and assembled using the block APIs. Freeform content is skipped. There are too many concerns with trying to parse arbitrary html in search of headings, and it can crumble down on any long enough post.

@mcsf
Copy link
Contributor

mcsf commented Mar 5, 2021

I agree, this is a block that I strongly feel should be restricted to work with blocks alone, not attempt to work with freeform content. It's at the core of the value proposition of blocks to offer structured data that we can more easily reason about. We also have the right parsing tools for them, both on the client and the server.

Ideally, for me, the headings are pulled from just heading blocks at render time and assembled using the block APIs. Freeform content is skipped. There are too many concerns with trying to parse arbitrary html in search of headings, and it can crumble down on any long enough post.

You mirror my thoughts exactly.

I commented in #22874 (comment) about the fact that there are trade-offs to be made with any implementation, so we need to ensure we value the right things — like a block-based approach instead of a DOM-based one. Which brings me to #22874 (comment), in which I recommended starting with just Heading and only thereafter designing a decent API for third-party heading-like blocks to feed the table of contents.

Right now the TOC block parses the content of the post on render, dynamically generating the list every time.
It works, but is a resources-intensive process and could be optimized.

Yes, this is something that worries me too, not just from a computational standpoint, but also because of the reliance on libxml itself. It's heavy-handed, a bit fragile, not guaranteed to be supported* across an install base as large as WP's, and has opened the doors for multiple issues last year in the context of Block Supports. See the description (the diff itself is irrelevant) of #26111 for context. Since then we've moved away from libxml for that feature.

There is a common learning between Block Supports and the Table of Content blocks, though: it's that we tend to come up with more robust solutions when we err on the side of doing less and relying on blocks as the default semantic unit. In the case of Block Supports, that meant inverting control in the API, raising restrictions on what Block Supports should do, and focusing on blocks' attribute schemas. So, for the ToC block, let's take it from the top with the idea of heading-like blocks.

*: For instance, Jetpack checks for the existence of the DOMDocument and libxml_use_internal_errors interfaces and provides fallback functionality in certain areas.

@mtias
Copy link
Member

mtias commented Mar 5, 2021

only thereafter designing a decent API for third-party heading-like blocks to feed the table of contents.

Exactly, also because the right API for third party blocks might actually be to use heading block as children or variations.

@ZebulanStanphill
Copy link
Member

It's worth noting that I originally went with the dynamic rendering approach because altering the saved markup of the block in the editor created extra undo steps, since every change to a Heading would result in a subsequent change to the Table of Contents. However, I think the recently-introduced __unstableMarkNextChangeAsNotPersistent might be able to solve this problem.

@mtias I can think of one use-case that might be difficult to accomplish if the core Heading block was the only supported source of headings: accordions. The proper semantic markup for an accordion is as follows:

<div>
	<h2>
		<button>Accordion title... the button MUST be nested INSIDE the heading, and the button should contain the entire content of the title.</button>
	</h2>
	<div>Accordion content...</div>
</div>

Can this be accomplished using block variations?

If we decide to only support the core Heading block, we should probably make an announcement about it. There are several existing popular plugins that add their own heading blocks (e.g. Kadence Blocks and Getwid), and we should probably let them know that what they're doing isn't supported, if we choose to go that route. Because as things currently are in the editor, trying to use any non-core Heading block already results in a somewhat broken experience when trying to use the Document Outline tool.

@paaljoachim
Copy link
Contributor

It would be very helpful to get some movement on this PR!
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Block] Table of contents (experimental) Affects the Table of contents Block [Type] Enhancement A suggestion for improvement. [Type] Performance Related to performance efforts
Projects
None yet
5 participants