Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaner output is nested incorrectly if safelist and input document have differing cases #2049

Closed
Balf opened this issue Nov 13, 2023 · 4 comments
Assignees
Labels
bug Confirmed bug that we should fix fixed
Milestone

Comments

@Balf
Copy link

Balf commented Nov 13, 2023

@jhy We're running into an issue which might be related to #2040. In version 1.16.1 we used to code snippet below to clean SVG's and it worked fine. In 1.16.2 the same code breaks, because the cleaned SVG is invalid. Self closing elements are not parsed properly. How should we do this in 1.16.2?

String svg = "...see below for the source svg";
Safelist safeList = Safelist.none();

String[] lowercaseElementsArray = Arrays.stream(DEFAULT_SVG_ELEMENTS)
            .map(String::toLowerCase)
            .toList()
            .toArray(new String[0]);

String[] lowercaseAttributesArray = Arrays.stream(DEFAULT_SVG_ATTRIBUTES)
            .map(String::toLowerCase)
            .toList()
            .toArray(new String[0]);

safelist.addTags(lowercaseElementsArray);
safelist.addAttributes(":all", lowercaseAttributesArray);

String cleanedSvg = Jsoup.clean(svg, safeList)

SVG Source

<svg width="200" height="200" xmlns="https://www.w3.org/2000/svg" xmlns:xlink="https://www.w3.org/1999/xlink">
    <filter id="feOffset" x="-40" y="-20" width="100" height="200">
        <feOffset in="SourceGraphic" dx="60" dy="60" />
        <feGaussianBlur in="SourceGraphic" stdDeviation="5" result="blur2" />
        <feMerge>
            <feMergeNode
                    in="blur2"></feMergeNode>
            <feMergeNode></feMergeNode>
            <feMergeNode in="SourceGraphic" />
        </feMerge>
    </filter>
    <filter id="noise2" x="0" y="0" width="100%" height="100%">
        <feTurbulence baseFrequency="0.05" />
    </filter>
    <clipPath id="myClip2"          clipPathUnits="objectBoundingBox">
        <circle cx=".5" cy=".5" r=".35" />
    </clipPath>
    <filter id="convolveMatrix2" x="0" y="0" width="100%" height="100%">
        <feConvolveMatrix
                kernelMatrix="-1 0 0 0 0 0 0 0 1" />
    </filter>
    <rect
            x="40"
            y="40"
            width="100"
            height="100"
            style="stroke: #000000; fill: green; filter: url(#feOffset);" />
    <rect
            x="40"
            y="40"
            width="100"
            height="100"
            style="stroke: #000000; fill: green;" />
</svg> ```

Expected result

<svg width="200" height="200" xmlns="https://www.w3.org/2000/svg" xmlns:xlink="https://www.w3.org/1999/xlink">
 <filter id="feOffset" x="-40" y="-20" width="100" height="200">
  <feOffset in="SourceGraphic" dx="60" dy="60"></feOffset>
  <feGaussianBlur in="SourceGraphic" stdDeviation="5" result="blur2"></feGaussianBlur>
  <feMerge>
   <feMergeNode in="blur2"></feMergeNode>
   <feMergeNode></feMergeNode>
   <feMergeNode in="SourceGraphic"></feMergeNode>
  </feMerge>
 </filter> <filter id="noise2" x="0" y="0" width="100%" height="100%">
  <feTurbulence baseFrequency="0.05"></feTurbulence>
 </filter> <clipPath id="myClip2" clipPathUnits="objectBoundingBox">
  <circle cx=".5" cy=".5" r=".35"></circle>
 </clipPath> <filter id="convolveMatrix2" x="0" y="0" width="100%" height="100%">
  <feConvolveMatrix kernelMatrix="-1 0 0 0 0 0 0 0 1"></feConvolveMatrix>
 </filter> <rect x="40" y="40" width="100" height="100" style="stroke: #000000; fill: green; filter: url(#feOffset);"></rect> <rect x="40" y="40" width="100" height="100" style="stroke: #000000; fill: green;"></rect>
</svg>

Actual result

<svg width="200" height="200" xmlns="https://www.w3.org/2000/svg" xmlns:xlink="https://www.w3.org/1999/xlink">
 <filter id="feOffset" x="-40" y="-20" width="100" height="200">
  <feOffset in="SourceGraphic" dx="60" dy="60">
   <feGaussianBlur in="SourceGraphic" result="blur2">
    <feMerge>
     <feMergeNode in="blur2">
      <feMergeNode>
       <feMergeNode in="SourceGraphic">
       </feMergeNode>
       <filter id="noise2" x="0" y="0" width="100%" height="100%">
        <feTurbulence>
        </feTurbulence>
        <clipPath id="myClip2">
         <circle cx=".5" cy=".5" r=".35"></circle>
         <filter id="convolveMatrix2" x="0" y="0" width="100%" height="100%">
          <feConvolveMatrix>
          </feConvolveMatrix>
          <rect x="40" y="40" width="100" height="100" style="stroke: #000000; fill: green; filter: url(#feOffset);"></rect>
          <rect x="40" y="40" width="100" height="100" style="stroke: #000000; fill: green;"></rect>
         </filter>
        </clipPath>
       </filter>
      </feMergeNode>
     </feMergeNode>
    </feMerge>
   </feGaussianBlur>
  </feOffset>
 </filter>
</svg>
@Balf Balf changed the title Self-closing tags in SVG's are broken when cleaning the SVG Self-closing tags in SVG's are broken when cleaning SVG's Nov 13, 2023
@jhy
Copy link
Owner

jhy commented Nov 16, 2023

Hi @Balf - can you also provide the details for // The default svg tags are added to the safe list? I want to check that I'm looking down the right path here.

@Balf
Copy link
Author

Balf commented Nov 17, 2023

(JH: Snipped inline code; moved to this gist)

@jhy jhy changed the title Self-closing tags in SVG's are broken when cleaning SVG's Self-closing tags in SVGs are broken when cleaning SVGs Nov 22, 2023
@jhy jhy self-assigned this Nov 22, 2023
@jhy jhy added the bug Confirmed bug that we should fix label Nov 22, 2023
@jhy jhy changed the title Self-closing tags in SVGs are broken when cleaning SVGs Cleaner output is nested incorrectly if safelist and input document have differing cases Nov 22, 2023
@jhy jhy closed this as completed in 94af4ec Nov 22, 2023
@jhy jhy added the fixed label Nov 22, 2023
@jhy jhy added this to the 1.17.1 milestone Nov 22, 2023
jhy added a commit that referenced this issue Nov 23, 2023
Allows preserved case attributes in input to make it into the output, even if the safelist attribute name was set as lowercase).

Fixes attribute case in #2049
@jhy
Copy link
Owner

jhy commented Nov 23, 2023

Thanks for reporting this, fixed!

The issue wasn't about self-closing tags so much as a problem with case normalization. The tail hit (here - line 168) was testing against the nodename not the normalname. The head was already using the normalname. This difference had no impact in earlier versions of jsoup because through this cleaner path, the parsed tags would all be lowercased. But now with SVG foreign elements preserving case, the bug manifested.

Fixed by using the normalname, and also ensuring that the input to the safelist is normalized.

Similarly for attribute names - the input names were not normalized and so by adding lower-cased versions of the attributes, they were removed from the output.

I also noticed an issue that the self-closing style of the input for tags was not preserved in the output - have fixed that too.

The actual output is now:

<svg width="200" height="200" xmlns="https://www.w3.org/2000/svg" xmlns:xlink="https://www.w3.org/1999/xlink">
 <filter id="feOffset" x="-40" y="-20" width="100" height="200">
  <feOffset in="SourceGraphic" dx="60" dy="60" />
  <feGaussianBlur in="SourceGraphic" stdDeviation="5" result="blur2" />
  <feMerge>
   <feMergeNode in="blur2" />
   <feMergeNode />
   <feMergeNode in="SourceGraphic" />
  </feMerge>
 </filter> <filter id="noise2" x="0" y="0" width="100%" height="100%">
  <feTurbulence baseFrequency="0.05" />
 </filter> <clipPath id="myClip2" clipPathUnits="objectBoundingBox">
  <circle cx=".5" cy=".5" r=".35" />
 </clipPath> <filter id="convolveMatrix2" x="0" y="0" width="100%" height="100%">
  <feConvolveMatrix kernelMatrix="-1 0 0 0 0 0 0 0 1" />
 </filter> <rect x="40" y="40" width="100" height="100" style="stroke: #000000; fill: green; filter: url(#feOffset);" /> <rect x="40" y="40" width="100" height="100" style="stroke: #000000; fill: green;" />
</svg>

Which renders correctly.

As a feature thought - I wonder if it would be useful to include an enforceCase option in the cleaner, which would enforce the case of tags and elements as they are set in the safelist? That way one could normalize to a specific case. Possibly also useful for XML cleaning (though we need to update the Cleaner to support full documents, not just the body fragment, for that to be more useful).

@Balf
Copy link
Author

Balf commented Nov 27, 2023

@jhy Thank you! It might be nice to have such an option, it allows a bit more flexibility

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bug that we should fix fixed
Projects
None yet
Development

No branches or pull requests

2 participants