Support \u Unicode escape #702

Closed
roblourens opened this issue Dec 2, 2017 · 1 comment

Comments

@roblourens
Contributor

Ripgrep supports \x{1234}, but not \u1234, while many people expect the latter. Have you considered also supporting \u?

If not, we'll rewrite one to the other for compatibility with the JS regex engine.
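For illustration, the rewrite mentioned above could look roughly like the following. This is only a minimal sketch, not the code VS Code actually uses: the function name `rewrite_unicode_escapes` is hypothetical, and it assumes only the four-digit `\uXXXX` form needs translating to `\x{XXXX}`. It counts the backslashes in front of `u` so that an escaped backslash followed by a literal `u1234` is left alone.

```python
import re

# A run of backslashes, then "u", then exactly four hex digits.
_U_ESCAPE = re.compile(r'(\\+)u([0-9a-fA-F]{4})')


def rewrite_unicode_escapes(pattern: str) -> str:
    """Rewrite JS-style \\uXXXX escapes to the \\x{XXXX} form (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        slashes, hex_digits = match.group(1), match.group(2)
        # An even number of backslashes means they pair up into literal
        # backslashes, so the "u" is plain text, not an escape.
        if len(slashes) % 2 == 0:
            return match.group(0)
        # Keep the paired backslashes and replace the final "\u" with "\x{...}".
        return slashes[:-1] + r'\x{' + hex_digits + '}'

    return _U_ESCAPE.sub(repl, pattern)


print(rewrite_unicode_escapes(r'caf\u00e9'))  # prints caf\x{00e9}
```

A pattern such as `\\u1234` (an escaped backslash followed by the text `u1234`) passes through unchanged, which is the main reason a plain string replace would not be enough.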

Ref: microsoft/vscode#39404

@BurntSushi
Owner

This is an interesting ticket. It probably belongs as an issue on the regex engine itself. Historically, I chose not to support the \u syntax because Rust itself has support for such syntax in its string literals. But obviously that isn't available when you aren't writing Rust code. That seems like a good enough argument to support the syntax to me.

Note that technically, according to UAX#18, the \x syntax satisfies the requirement, but they also list a specific grammar for the \u (and \u{...}) style of syntax.

I filed a ticket with the regex crate, but I'll leave this open so it's easy to track with respect to ripgrep.

(I don't have a specific timeline on this. I should be able to get it done as part of the regex-syntax rewrite that I'm working on, but I don't have an ETA. It could be a few weeks (ideal) if things go well, or it could be months.)

@BurntSushi BurntSushi added the enhancement An enhancement to the functionality of the software. label Dec 2, 2017
@BurntSushi BurntSushi self-assigned this Dec 2, 2017
BurntSushi added a commit that referenced this issue Mar 14, 2018
This update brings with it many bug fixes:

  * Better error messages are printed overall. We also include
    explicit call-outs for unsupported features like backreferences
    and look-around.
  * Regexes like `\s*{` no longer emit incomprehensible errors.
  * Unicode escape sequences, such as `\u{..}`, are now supported.

For the most part, this upgrade was done in a straightforward way. We
resist the urge to refactor the `grep` crate, in anticipation of it
being rewritten anyway.

Note that we removed the `--fixed-strings` suggestion whenever a regex
syntax error occurs. In practice, I've found that it results in a lot of
false positives, and I believe that its use is not as paramount now that
regex parse errors are much more readable.

Closes #268, Closes #395, Closes #702, Closes #853