Infinite loop with nested button #4

Closed
gsnedders opened this issue Apr 9, 2013 · 1 comment

@gsnedders
Member

https://code.google.com/p/html5lib/issues/detail?id=211

Reported by [email protected], Aug 24, 2012

So I know this is not well-formed HTML, but it occurred in the wild as the output from Markdown.

I have the latest pypi Python library (version = 0.95-dev).

If I try to parse the following HTML, my program goes into an infinite loop and memory usage grows without bound:

u"<p>So theres no shortage of info out there on rounded corners and I've been through much of it and I'm posting to get the communities opinons at this piont.</p>\n<p>My scenario is that we're developing a rounded corner dependant design, mainly used for interactions (<button> and <a>). We are going to use border radius for the good browsers on the block that play nice with it and then use the server to send down javscript to browsers that don't</p>\n<p>What I'm wondering is what to use to up scale the browsers that ignore border radius CSS? I need something that works on button aswell as a, div etc. I've been looking at the following and have found that some don't play nice with <button>. Also the site already uses jQuery.</p>\n<p>https://www.curvycorners.net/ - https://code.google.com/p/jquerycurvycorners/</p>\n<p>https://www.html.it/articoli/niftycube/index.html</p>\n<p>https://www.malsup.com/jquery/corner/</p>"

Aug 24, 2012 waylan

I can't comment on the infinite loop, but as the maintainer of the Markdown library, I was concerned by the original reporter's implication that Markdown may be producing invalid HTML. While only the output is provided, not the input, it appears to me that the invalid output is a result of invalid input. You should be wrapping those random angle-bracket tags in code tags. So "(`<button>` and `<a>`)" (note the backticks surrounding each tag) would be output by Markdown as "(<code>&lt;button&gt;</code> and <code>&lt;a&gt;</code>)", which is valid HTML and will not result in an infinite loop in html5lib.
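To illustrate the escaping described above, here is a minimal sketch using only the Python standard library. The `render_code_spans` helper is hypothetical and is not the Markdown library's implementation; it only mimics the one behaviour at issue, namely that text inside backticks is HTML-escaped and wrapped in `<code>`, so a literal tag name never reaches the HTML parser as a real tag.

```python
import html
import re


def render_code_spans(text: str) -> str:
    """Toy stand-in for Markdown's inline code handling (hypothetical helper)."""

    def to_code(match: re.Match) -> str:
        # Escape &, < and > so the span content is inert text, not markup.
        return "<code>" + html.escape(match.group(1), quote=False) + "</code>"

    # Replace every `...` span; anything outside backticks is left untouched.
    return re.sub(r"`([^`]+)`", to_code, text)


print(render_code_spans("(`<button>` and `<a>`)"))
# (<code>&lt;button&gt;</code> and <code>&lt;a&gt;</code>)
```

Without the backticks the same text passes through unchanged, which is how a bare `<button>` ends up in the output HTML.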

If the Markdown input is coming from an untrusted third party, then you absolutely should be sanitizing it before passing it on to anything else.

That said, one way to sanitize (my recommendation) is to use the Bleach library, which uses html5lib internally. So I guess we're back to that infinite loop.

Aug 24, 2012 [email protected]

The Markdown comes from the wild and is probably invalid.

My idea was to pass the HTML through tidy before running an HTML parser, thus avoiding an infinite loop. There are several tidy wrappers in Python. I used pytidylib.

I didn't play with the options to make tidy more strict, and even after tidy, html5lib still goes into an infinite loop. So my current workaround is to use tidy followed by lxml :\

@gsnedders
Member Author

html5lib.parse(u"<button><p><button>") is enough to trigger this.
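Because the failure mode is a hang with growing memory, it can be safer to try the reproduction in a child process with a time limit. The following is a rough sketch, assuming Python 3.7+ and that html5lib is importable in the child interpreter; `run_with_timeout` is a hypothetical helper, not part of html5lib.

```python
import subprocess
import sys


def run_with_timeout(snippet: str, seconds: float = 5.0) -> str:
    """Run `snippet` in a child interpreter; return "ok", "error" or "timeout"."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            timeout=seconds,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # Minimal input from the comment above; an affected version would hang here.
    repro = 'import html5lib; html5lib.parse(u"<button><p><button>")'
    print(run_with_timeout(repro))
```

A result of "timeout" suggests the parser is still looping, while "error" usually means the child raised an exception (for example, html5lib is not installed).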

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 1, 2013
… loop.

A couple of elements (button, dialog) were missing from the list of
endTagBlock in-body-phase dispatcher. This adds them.

See html5lib/html5lib-tests#4 for test.
gsnedders added a commit to gsnedders/html5lib-tests that referenced this issue Dec 23, 2013
hugovk added a commit to hugovk/html5lib-python that referenced this issue Feb 25, 2020
Add testing and document support for Python 3.7, 3.8 & PyPy3