Incompatibility with possible future regular expression syntax #347

Closed
serhiy-storchaka opened this issue May 12, 2017 · 1 comment

Comments

@serhiy-storchaka

Currently the re module supports only simple set syntax. In the future it may support an extended syntax with nested sets and set operations. Unfortunately, that syntax is not fully compatible with the current one. In particular, an open bracket '[' inside a character set would start a nested set. html5lib contains a regular expression that would break if the new syntax is adopted:

ascii_punctuation_re = re.compile("[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]")

It would be good to guard the code against possible future breakage. Adding a backslash before the [ is enough: replace \u005B with \u005C\u005B, \\\u005B or \\[.
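As a rough check of the suggestion above, the sketch below compiles the pattern from this issue alongside one escaped variant and compares which ASCII characters each one matches. The names `original`, `guarded`, `spellings` and `matched_chars` are made up for this illustration and are not part of html5lib, and the comparison only covers code points 0 to 127 under the current re syntax.

```python
import re

# The three spellings proposed in the issue all produce the same two-character
# string: a backslash followed by "[".
spellings = ["\u005C\u005B", "\\\u005B", "\\["]
assert all(s == "\\[" for s in spellings)

# Pattern as quoted in the issue. The string literal decodes \u005B to a bare
# "[" before the regex compiler ever sees it.
original = re.compile(
    "[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]"
)

# Guarded variant (one of the suggested replacements): the "[" that starts the
# fourth range is now preceded by a backslash inside the character set.
guarded = re.compile(
    "[\u0009-\u000D\u0020-\u002F\u003A-\u0040\\[-\u0060\u007B-\u007E]"
)


def matched_chars(pattern):
    """Return the set of ASCII characters that the pattern matches (hypothetical helper)."""
    return {chr(cp) for cp in range(128) if pattern.fullmatch(chr(cp))}


# Under the current syntax both patterns accept exactly the same characters.
assert matched_chars(original) == matched_chars(guarded)
assert "[" in matched_chars(guarded)
```

Under today's syntax the escaped form should behave identically, so the change only affects how the pattern would be read if nested sets are introduced later.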

See Python issue: https://bugs.python.org/issue30349.

@willkg willkg added this to the 1.0 milestone Nov 29, 2017
@willkg
Contributor

willkg commented Nov 29, 2017

Tossing this in the 1.0 milestone. Hopefully someone can get to this in the next couple of days.

@willkg willkg closed this as completed in 3e86e49 Nov 30, 2017
willkg added a commit that referenced this issue Nov 30, 2017
Regexp change for future compatibility. Fixes #347