Skip to content
This repository has been archived by the owner on Sep 24, 2022. It is now read-only.

Add a new renderer class with XSS protections #60

Merged
merged 10 commits into from
Jan 15, 2017

Conversation

Changaco
Copy link
Contributor

This Pull Request implements #59. Let me know if anything doesn't look right.

Copy link
Owner

@FSX FSX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your pull request!

:arg img_src_rewrite: the URL of an image proxy, necessary to rewrite the
``src`` attributes of images

Both srings should include a ``{link}`` placeholder for the URL-encoded
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

srings should be strings


def __init__(self, flags=(), sanitization_mode='skip-html', nesting_level=0,
link_rewrite=None, img_src_rewrite=None):
if not isinstance(flags, tuple):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks compatibility with HtmlRenderer, which accepts flags as a tuple of strings and ORed constants. It's not a big issue, but it won't be a drop-in for HtmlRenderer.

I would probably be better to have the flags be one type, but it'll be annoying for people if their code breaks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add support for passing integers because I saw that it's deprecated.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. My mistake.


def test_html_escape(self):
supplied = 'Example <script>alert(1);</script>'
expected = '<p>%s</p>\n' % escape_html(supplied)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer str.format. Is there a reason you used old-style formatting?

else:
return escape_html("[%s](%s)" % (content, raw_link))

def check_link(self, link, is_image_src=False):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_image_src is not used.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put self._allowed_url_re.match(link) directly in the render functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separate check_link method is to allow customization of link filtering by subclasses.

You're right that is_image_src isn't used, that's a copy-paste error on my part.

link = self.rewrite_link(raw_link)
maybe_title = ' title="%s"' % escape_html(title) if title else ''
link = escape_html(link)
return ('<a href="%s"%s>' + content + '</a>') % (link, maybe_title)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer str.format. Is there a reason you used old-style formatting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use percent formatting by default, mostly because it's shorter and easier to type.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. That's cool.

for url in ('http:https://a', 'https://b'):
actual = render_rewrite("['foo](%s \"bar'\")" % url)
expected = '<p><a href="%s" title="bar&#39;">&#39;foo</a></p>\n'
ok(actual).diff(expected % rewrite_link(url))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put % rewrite_link(url) on the above line.

"""
return bool(self._allowed_url_re.match(link))

def rewrite_link(self, link, is_image_src=False):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rewrite_url? As it's both used for links and images.

ok(actual).diff(expected)

def test_autolink_rewriting(self):
for url in ('http:https://a', "https://b?x&y"):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use single quotes. ;D

"""
Filters links.
"""
if self.check_link(raw_link):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I enable the autolink extension email addresses are not rendered into links anymore.

My testing code:

from misaka import escape_html, Markdown, SaferHtmlRenderer, HtmlRenderer

render_normal = Markdown(HtmlRenderer(), extensions=('autolink',))
render = Markdown(SaferHtmlRenderer())
render_escape = Markdown(SaferHtmlRenderer(sanitization_mode='escape'))
renderer_rewrite = SaferHtmlRenderer(
    link_rewrite='https://example.com/redirect/{link}',
    img_src_rewrite='https://img_proxy/{link}',
)
render_rewrite = Markdown(renderer_rewrite, extensions=('autolink',),)
rewrite_link = renderer_rewrite.rewrite_link


print(render_normal('<https://b?x&y>'))
print(render_normal('Hello https://b?x&y Hola'))
print(render_normal('Hello https://example.com Hola'))
print(render_normal('Banana [email protected]'))
print(render_normal('Banana <[email protected]>'))

print('Safe:\n')

print(render_rewrite('<https://b?x&y>'))
print(render_rewrite('Hello https://b?x&y Hola'))
print(render_rewrite('Hello https://example.com Hola'))
print(render_rewrite('Banana [email protected]'))
print(render_rewrite('Banana <[email protected]>'))

Output:

<p><a href="https://b?x&amp;y">https://b?x&amp;y</a></p>

<p>Hello https://b?x&amp;y Hola</p>

<p>Hello <a href="https://example.com">https://example.com</a> Hola</p>

<p>Banana <a href="mailto:[email protected]">[email protected]</a></p>

<p>Banana <a href="mailto:[email protected]">[email protected]</a></p>

Safe:

<p><a href="https://example.com/redirect/https%3A//b%3Fx%26y">https://b?x&amp;y</a></p>

<p>Hello https://b?x&amp;y Hola</p>

<p>Hello <a href="https://example.com/redirect/https%3A//example.com">https://example.com</a> Hola</p>

<p>Banana &lt;[email protected]&gt;</p>

<p>Banana &lt;[email protected]&gt;</p>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not exactly a bug, only HTTP and HTTPS links are allowed by default, so mailto: is rejected. Do you think mailto: should be allowed by default, or that the documentation should be modified to clarify that the new renderer rejects email addresses?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter is a good choice I think.

@Changaco
Copy link
Contributor Author

@FSX I've tried to address all your comments. Let me know if you'd like me to squash the commits to keep the Git history clean.

@Changaco
Copy link
Contributor Author

The python 2.6 build has errored on Travis.

@FSX Have you considered running a single build for all python versions? This project already has Tox set up so it'd be very easy to change from multiple builds to a single one.

@FSX
Copy link
Owner

FSX commented Jan 15, 2017

Looks good.

I do like the separate builds of Travis. When something errors out I just restart the build.

@FSX
Copy link
Owner

FSX commented Jan 15, 2017

Merging this now. Thanks!

I'll do a release when I get home tonight.

@FSX FSX merged commit bc251d4 into FSX:master Jan 15, 2017
@Changaco Changaco deleted the xss-protection branch January 15, 2017 13:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants