Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tag Processor: Document and test XSS prevention in set_attribute #44447

Merged
merged 10 commits into from
Sep 29, 2022

Conversation

adamziel
Copy link
Contributor

Part of #44410

What?

The WP_HTML_Tag_Processor allows setting new HTML attributes with a given name and value. Passing malicious values such as " onclick="alert(1) should not create new attributes. WP_HTML_Tag_Processor prevents that by applying the correct escaping.

In this patch we're adding a series of tests and comments to document this behavior and prevent any future regressions.

Why?

We don't want to allow setting attributes whose names could break the HTML syntax or open new vulnerabilities in code that parses post content.

Testing Instructions

Validate the given escaping and tests are sufficient to prevent XSS via injecting a malicious attribute value.

cc @azaozz @dmsnell @gziolo @aristath

@adamziel adamziel self-assigned this Sep 26, 2022
@adamziel adamziel added [Type] Enhancement A suggestion for improvement. [Feature] Block API API that allows to express the block paradigm. Developer Experience Ideas about improving block and theme developer experience labels Sep 26, 2022
@adamziel adamziel changed the title Document and test encoding double quotes Tag Processor: Document and test encoding double quotes Sep 26, 2022
@adamziel adamziel changed the title Tag Processor: Document and test encoding double quotes Tag Processor: Document and test XSS prevention in set_attribute Sep 26, 2022
Copy link
Member

@gziolo gziolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, it correctly documents the current behavior and it’s what been discussed as the way moving forward. I would appreciate confirmation from @dmsnell.

* This is sufficient because:
* * The attributes touched by set_attribute are always rewritten using double quotes.
* * The only way to terminate a double quoted attribute value is via a double quote (")
* character. There is no need to encode other characters such as <, >, /, or '.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the existing body of code attempting to parse HTML in WordPress and its plugins, I think @azaozz has a good point that we might want to encode <, >, and & as well. This isn't an HTML thing, but if we allow alt="the <img /> tag we might open the door for defects in other code to trip up here when we had the power to prevent it.

the one caveat I see is with class because we don't support decoding character references at this moment, but even then I think it's fine to encode them on save.

Copy link
Contributor Author

@adamziel adamziel Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let’s escape these as well. I agree it’s fine to encode the class names on save. One caveat is stringifying one processor and initialising another with the result - we might need to add a ”slow mode” that decodes attributes values by default to support that.

Copy link
Contributor Author

@adamziel adamziel Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, let's stick to encoding just ", <, and >. Encoding & breaks the following use-case:

$p->set_attribute( 'data-price', '10 &euro;' );
// <div data-price="10 &amp;euro;"></div>

I just committed that in 55a0a83;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that _wp_specialchars() exists to handle encoding without double-encoding (unless you explicitly tell it to). it looks up any character references before replacing & willy-nilly.

"10 &euro; &amp; &quot;USD&quot;" === _wp_specialchars( '10 &euro; & "USD"' );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_wp_specialchars seems to be deprecated in favor of esc_html, which does the same thing as esc_attr – let's try esc_attr then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed in 6c309a8

@adamziel adamziel force-pushed the update/tag-processor-escape-attribute-values branch from 9598b78 to 6c309a8 Compare September 28, 2022 05:38
Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to drag this on, but I feel like we need to.

with the proposed use of esc_attr() we're preventing the use of this processor for URL fields, which you might think should be escaped with esc_url() but actually needs to be handled more carefully.

this brings me back to why we left this idea of general escaping unaddressed in this system because escaping is dependent on which attribute is being set.

we could start to build in that handling and distinguish which attribute is being set, or we could bail and leave it up to developers using this API to do their own escaping.

potentially we could introduce something in Core like esc_attr( $text, $attribute_name = null ) to handle the distinctions.

Type Value
URL https://wordpress.com/?fields=id,user&pretty=true&q="is anything > 0"
Wanted escaping https://wordpress.com/?fields=id,user&pretty=true&q="is%20anything%20>%200"
esc_attr() https://wordpress.com/?fields=id,user&amp;pretty=true&amp;q=&quot;is anything &gt; 0&quot;
esc_url() https://wordpress.com/?fields=id,user&#038;pretty=true&#038;q=is%20anything%20%200
esc_url($context = 'attribute') https://wordpress.com/?fields=id,user&pretty=true&q=is%20anything%20%200

(the $_context parameter to esc_url() is marked private and distinguishes between sanitizing the URL for database storage and for display)

these are not complicated URLs. for example, if I use an image from my blog on WordPress.com both esc_attr() and esc_url() break the link, ironically eliminating the ssl=1 query arg. these URLs are generated from WordPress itself and are exactly the kinds of URLs I expect code to inject when using this processor.

Type Value
URL https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&ssl=1
esc_attr() https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&amp;ssl=1
esc_url() https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&#038;ssl=1

one option we have is to bail sanitization if setting a src, href, srcset, or other URL fields. I'm leery of that because of how different the protections are for those fields vs. for others. I'd rather we try to do everything or nothing so that the expectations are clear.

@dmsnell dmsnell self-requested a review September 28, 2022 19:01
@dmsnell
Copy link
Member

dmsnell commented Sep 28, 2022

:sigh: I'm getting confused myself again with HTML escaping vs. parsed values.

looks like esc_attr() is correct here because even though it escapes more than we want, the browser still parses the attribute value because HTML5 specifies that all attribute values may contain character references

Screen Shot 2022-09-28 at 12 09 32 PM

@dmsnell
Copy link
Member

dmsnell commented Sep 28, 2022

do you think this should be paired with get_attribute() returning the decoded values?

@dmsnell
Copy link
Member

dmsnell commented Sep 28, 2022

do you think this should be paired with get_attribute() returning the decoded values?

I've added this in 907d37d

@adamziel
Copy link
Contributor Author

adamziel commented Sep 29, 2022

I've added this in 907d37d

Yeah it only makes sense to do it. It shouldn't even add much of a speed penalty since it's not a part of the parser but of the getter. You only bear the cost when you specifically want to read that value. If that ever becomes a problem we can add some configuration to disable the decoding when needed, but I don't think we'll ever get there.

@adamziel adamziel merged commit b58ff34 into trunk Sep 29, 2022
@adamziel adamziel deleted the update/tag-processor-escape-attribute-values branch September 29, 2022 02:42
@adamziel
Copy link
Contributor Author

@dmsnell I went ahead and merged this since the URL encoding is a non-issue and my only change on top of yours was another test file. I think we're good here – let's iterate in another PR if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Developer Experience Ideas about improving block and theme developer experience [Feature] Block API API that allows to express the block paradigm. [Type] Enhancement A suggestion for improvement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants