-
-
Notifications
You must be signed in to change notification settings - Fork 66
Add a new renderer class with XSS protections #60
Conversation
We need an escape function to implement XSS protection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your pull request!
:arg img_src_rewrite: the URL of an image proxy, necessary to rewrite the | ||
``src`` attributes of images | ||
|
||
Both srings should include a ``{link}`` placeholder for the URL-encoded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
srings should be strings
|
||
def __init__(self, flags=(), sanitization_mode='skip-html', nesting_level=0, | ||
link_rewrite=None, img_src_rewrite=None): | ||
if not isinstance(flags, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks compatibility with HtmlRenderer
, which accepts flags as a tuple of strings and ORed constants. It's not a big issue, but it won't be a drop-in for HtmlRenderer
.
I would probably be better to have the flags be one type, but it'll be annoying for people if their code breaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add support for passing integers because I saw that it's deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. My mistake.
|
||
def test_html_escape(self): | ||
supplied = 'Example <script>alert(1);</script>' | ||
expected = '<p>%s</p>\n' % escape_html(supplied) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer str.format. Is there a reason you used old-style formatting?
else: | ||
return escape_html("[%s](%s)" % (content, raw_link)) | ||
|
||
def check_link(self, link, is_image_src=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_image_src
is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put self._allowed_url_re.match(link)
directly in the render functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separate check_link
method is to allow customization of link filtering by subclasses.
You're right that is_image_src
isn't used, that's a copy-paste error on my part.
link = self.rewrite_link(raw_link) | ||
maybe_title = ' title="%s"' % escape_html(title) if title else '' | ||
link = escape_html(link) | ||
return ('<a href="%s"%s>' + content + '</a>') % (link, maybe_title) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer str.format. Is there a reason you used old-style formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use percent formatting by default, mostly because it's shorter and easier to type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. That's cool.
for url in ('http:https://a', 'https://b'): | ||
actual = render_rewrite("['foo](%s \"bar'\")" % url) | ||
expected = '<p><a href="%s" title="bar'">'foo</a></p>\n' | ||
ok(actual).diff(expected % rewrite_link(url)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put % rewrite_link(url)
on the above line.
""" | ||
return bool(self._allowed_url_re.match(link)) | ||
|
||
def rewrite_link(self, link, is_image_src=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rewrite_url
? As it's both used for links and images.
ok(actual).diff(expected) | ||
|
||
def test_autolink_rewriting(self): | ||
for url in ('http:https://a', "https://b?x&y"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use single quotes. ;D
""" | ||
Filters links. | ||
""" | ||
if self.check_link(raw_link): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I enable the autolink extension email addresses are not rendered into links anymore.
My testing code:
from misaka import escape_html, Markdown, SaferHtmlRenderer, HtmlRenderer
render_normal = Markdown(HtmlRenderer(), extensions=('autolink',))
render = Markdown(SaferHtmlRenderer())
render_escape = Markdown(SaferHtmlRenderer(sanitization_mode='escape'))
renderer_rewrite = SaferHtmlRenderer(
link_rewrite='https://example.com/redirect/{link}',
img_src_rewrite='https://img_proxy/{link}',
)
render_rewrite = Markdown(renderer_rewrite, extensions=('autolink',),)
rewrite_link = renderer_rewrite.rewrite_link
print(render_normal('<https://b?x&y>'))
print(render_normal('Hello https://b?x&y Hola'))
print(render_normal('Hello https://example.com Hola'))
print(render_normal('Banana [email protected]'))
print(render_normal('Banana <[email protected]>'))
print('Safe:\n')
print(render_rewrite('<https://b?x&y>'))
print(render_rewrite('Hello https://b?x&y Hola'))
print(render_rewrite('Hello https://example.com Hola'))
print(render_rewrite('Banana [email protected]'))
print(render_rewrite('Banana <[email protected]>'))
Output:
<p><a href="https://b?x&y">https://b?x&y</a></p>
<p>Hello https://b?x&y Hola</p>
<p>Hello <a href="https://example.com">https://example.com</a> Hola</p>
<p>Banana <a href="mailto:[email protected]">[email protected]</a></p>
<p>Banana <a href="mailto:[email protected]">[email protected]</a></p>
Safe:
<p><a href="https://example.com/redirect/https%3A//b%3Fx%26y">https://b?x&y</a></p>
<p>Hello https://b?x&y Hola</p>
<p>Hello <a href="https://example.com/redirect/https%3A//example.com">https://example.com</a> Hola</p>
<p>Banana <[email protected]></p>
<p>Banana <[email protected]></p>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not exactly a bug, only HTTP and HTTPS links are allowed by default, so mailto:
is rejected. Do you think mailto:
should be allowed by default, or that the documentation should be modified to clarify that the new renderer rejects email addresses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter is a good choice I think.
@FSX I've tried to address all your comments. Let me know if you'd like me to squash the commits to keep the Git history clean. |
The python 2.6 build has errored on Travis. @FSX Have you considered running a single build for all python versions? This project already has Tox set up so it'd be very easy to change from multiple builds to a single one. |
Looks good. I do like the separate builds of Travis. When something errors out I just restart the build. |
Merging this now. Thanks! I'll do a release when I get home tonight. |
This Pull Request implements #59. Let me know if anything doesn't look right.