Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insecure Transformer used in XmlParsers #1571

Open
qligier opened this issue Mar 7, 2024 · 5 comments · May be fixed by #1717
Open

Insecure Transformer used in XmlParsers #1571

qligier opened this issue Mar 7, 2024 · 5 comments · May be fixed by #1717
Labels

Comments

@qligier
Copy link
Contributor

qligier commented Mar 7, 2024

The Transformer used in XmlParsers (all versions) is insecure to XEE attacks (r5 shown here):

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer nullTransformer = transformerFactory.newTransformer();

See https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#transformerfactory

The current Saxon-HE version (9.8) does not support XMLConstants.ACCESS_EXTERNAL_DTD or XMLConstants.ACCESS_EXTERNAL_SCHEMA (java.lang.IllegalArgumentException: Unknown configuration property http:https://javax.xml.XMLConstants/property/accessExternalDTD).

One way forward is to upgrade Saxon-HE to the latest version (9.8 → 12), which is anyway a good thing to do.

Another way is to use newDefaultInstance() instead of newInstance(), which returns the implementation provided by the JDK (Xalan) instead of Saxon.

Also, the use of a utility class that would configure XML factories would be nice, instead of reconfiguring them at each use (kind of what I did in Husky: https://github.com/project-husky/husky/blob/develop/husky-common/husky-common-gen/src/main/java/org/projecthusky/common/utils/xml/XmlFactories.java).

I can provide a PR if you agree with these solutions.

@grahamegrieve
Copy link
Collaborator

@dotasek I don't think that there's any special functionality here - I don't know why it's even using saxon at this point?

@dotasek
Copy link
Collaborator

dotasek commented Aug 16, 2024

@qligier Could you take a look at #1717 and verify that it is the correct fix?

@dotasek
Copy link
Collaborator

dotasek commented Aug 16, 2024

I am all for removing saxon-he, but as a separate task. I would have to figure out what the impact is on IG publisher, as I think that dependency has caused some issues for users before.

@qligier
Copy link
Contributor Author

qligier commented Aug 16, 2024

@dotasek Looks good to me, thanks!

@dotasek
Copy link
Collaborator

dotasek commented Aug 16, 2024

@qligier @grahamegrieve so there's an issue with using both Saxon HE 12.5, and with using Xalan.

The Problem

Xalan breaks XSLT expectations

I recalled correctly; we briefly defaulted to a non-Saxon implementation when HAPI dropped its own Saxon dependency, and I believe we were using Xalan. The problem was, Xalan was adding namespace prefixes to all XSLT transforms (ns0 below):

<ns0:p xmlns:ns0="http:https://www.w3.org/1999/xhtml">
    This is FHIR Implementation Guide for UNICOM project, created to assist work with pilot product list product data in FHIR.
  </ns0:p>

I could not find a way to turn that behaviour off, so I restored the Saxon dependency.

Saxon 12.x breaks our line reporting

For Saxon HE, we DO in fact rely on implementation details. The following code pulls line and column from the error messages, which are implementation specific:

https://github.com/hapifhir/org.hl7.fhir.core/blob/48741cfd97d0536e6a54ac4e10b066724a805c6b/org.hl7.fhir.r5/src/main/java/org/hl7/fhir/r5/elementmodel/XmlParser.java#L174-178

Without the same error format, we lose line and column reporting.

Unfortunately, these two issues mean we can't jump to 12.5 of Saxon or default to Xalan without fixing either the error messages, or fixing the namespace issue.

The Proposed Compromise

I propose we use 11.6 of Saxon, as I updated in my PR. This change had only one small breaking test because the message format didn't change that much. I propose moving to this version, and then going back and figuring out how to fix our dependence on either implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants