Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML writer generates duplicate classes on elements & should de-duplicate for clearer HTML #8705

Closed
gwern opened this issue Mar 19, 2023 · 0 comments
Labels

Comments

@gwern
Copy link
Contributor

gwern commented Mar 19, 2023

The Pandoc HTML writer will generate duplicate classes on HTML elements; this may combine with Markdown<->HTML roundtrips or API use to lead to some elements like tables having 5+ redundant classes set on an element.
While legal HTML and not immediately producing any bugs (because classes are idempotent), this is somewhat dangerous/confusing (and slightly inefficient/ugly), and it would be good if Pandoc could just uniquefy classes.


My web guy was disconcerted while debugging another issue to see a bunch of HTML table with many instances of the even/odd class, wondering if there was some bug, eg:

<tr class="even even even even even">
<tr class="odd odd odd odd odd">
<tr class="odd odd odd">

(I had noticed these myself but I think I had written them off as some sort of nesting, where 'even even' != 'even', until he explained that HTML classes, as opposed to CSS rules, didn't work that way: those were all just equivalent to class="even"/"odd" and so why was my HTML weird like that? Was something broken? No, it was fine, just seemed to be Pandoc doing something odd.)

I wondered if some Pandoc process was generating multiple instances of a class, but Pandoc is apparently content to pass through any number of classes:

$ pandoc --version
pandoc 2.18
$ echo '[Foo bar](https://example.com){.class1 .class1}' | pandoc -w html
<p><a href="https://example.com" class="class1 class1">Foo bar</a></p>
$ echo '[Foo bar](https://example.com){.class1 .class1 .class1 .class1 .class1 .class1 .class1}' | pandoc -w html
<p><a href="https://example.com"
class="class1 class1 class1 class1 class1 class1 class1">Foo bar</a></p>

So probably what happened with my 733+ <tr>s with duplicate classes is that as my various documents roundtrip through Pandoc in various ways, whether being generated originally or written & read or munged through the API, the table rows get marked up as even/odd for zebra-striping styling, and each time, the even/odd code simply chucks another class onto the <tr> without checking if it's necessary; and the writer then just passes that through, so the number builds up.
This presumably applies to other elements that Pandoc will write classes on and all other ways classes might get added.


This is a minor style/reliability issue, and not a serious bug: the HTML standards do not forbid duplicate classes, so it's legal/valid, and it will not in and of itself cause any bugs, because browsers will just collapse any number of class1 instances into a single class1 (it's idempotent / treated as a set).

However, I think it is worth fixing: it is widely considered a HTML lint issue and best practices is to eliminate duplicates*; it leads to confusion while debugging; it makes the source less readable if you have to remember that all the varying instances are the same thing; it makes editing harder (more to remove/edit); and it wastes a few bytes.

The cost of a fix is low: while one could add checks to everything that messes with classes, it'd be easier to fix it in the writer at the final stage - it's just the class field of an Attr, so it's presumably localized to a single line; there should be no regressions or API changes involved as all downstream tools should be interpreting duplicate classes as a single class per the standards; performance-wise, most classes are null, so no uniquing is necessary at all; most non-null classes are just 1 class (like the tables in question would just be class="even" or class="odd"), so likewise; and most other classes will be so short that the cost of something like a nub would be unmeasurable (especially if Pandoc is no longer generating duplicate classes).


On a side note, I seem to recall already reporting this issue, but I couldn't refind my report, and I think I might've been confusing it with a different HTML lint issue? If I did already report this exact issue, my apologies.

* To quote an authority no less impeachable than GPT-4 itself on the matter of why to remove duplicate HTML classes: :)

  1. Clean and maintainable code: Having duplicate redundant classes makes the HTML code messy and harder to maintain. Ensuring that there are no duplicate classes improves the readability of the code and makes it easier for developers to understand and update.
  2. Performance: Although the performance impact may be minimal, removing duplicate classes can help reduce the file size of the HTML and CSS code, leading to faster loading times for the website.
  3. Debugging: Duplicate classes can make it difficult to debug and troubleshoot styling issues. Removing duplicates helps to isolate issues and resolve them more quickly.
  4. Best practices: Ensuring that there are no duplicate classes on an HTML element is a best practice in web development, and adhering to best practices results in better quality code and a more professional outcome.
  5. Avoid conflicts: Duplicate classes can potentially cause conflicts when using CSS frameworks or third-party libraries, which might have their own styles applied to the same class names. Removing duplicates helps to avoid these conflicts.
  6. Accessibility: Ensuring a clean and well-structured HTML code, including avoiding duplicate classes
@gwern gwern added the bug label Mar 19, 2023
gwern added a commit to gwern/gwern.net that referenced this issue Mar 20, 2023
@jgm jgm closed this as completed in 87ff4d6 Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant