Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for mistyped shy group \(:?...\) #6

Closed
wants to merge 1 commit into from

Conversation

basil-conto
Copy link
Contributor

This exhibits a precision of roughly 50% with relint on current emacs-29:
relint-29.txt

So it probably shouldn't be merged, at least not enabled by default.
Any suggestions?
Thanks!

@basil-conto
Copy link
Contributor Author

basil-conto commented Jun 11, 2023

This exhibits a precision of roughly 50% with relint on current emacs-29: relint-29.txt

I intend to submit the following two patches upstream soon.
WDYT?

Against master:

diff --git a/lisp/gnus/gnus-art.el b/lisp/gnus/gnus-art.el
index 6a7a3f41746..eaa39723db0 100644
--- a/lisp/gnus/gnus-art.el
+++ b/lisp/gnus/gnus-art.el
@@ -8331,11 +8331,10 @@ gnus-parse-news-url
       (when (looking-at "\\([A-Za-z]+\\):")
 	(setq scheme (match-string 1))
 	(goto-char (match-end 0)))
-      (when (looking-at "//\\([^:/]+\\)\\(:?\\)\\([0-9]+\\)?/")
+      (when (looking-at "//\\([^/:]+\\):?\\([0-9]+\\)?/")
 	(setq server (match-string 1))
-	(setq port (if (stringp (match-string 3))
-		       (string-to-number (match-string 3))
-		     (match-string 3)))
+        (setq port (and (match-beginning 2)
+                        (string-to-number (match-string 2))))
 	(goto-char (match-end 0)))
 
       (cond
diff --git a/lisp/progmodes/cc-mode.el b/lisp/progmodes/cc-mode.el
index 11a1d3fe6c2..5cf9b7e17f8 100644
--- a/lisp/progmodes/cc-mode.el
+++ b/lisp/progmodes/cc-mode.el
@@ -2859,7 +2859,7 @@ c-or-c++-mode--regexp
                                      "\\|" id "::"
                                      "\\|" id ws-maybe "=\\)"
               "\\|" "\\(?:inline" ws "\\)?namespace"
-                    "\\(:?" ws "\\(?:" id "::\\)*" id "\\)?" ws-maybe "{"
+                    "\\(?:" ws "\\(?:" id "::\\)*" id "\\)?" ws-maybe "{"
               "\\|" "class"     ws id
                     "\\(?:" ws "final" "\\)?" ws-maybe "[:{;\n]"
               "\\|" "struct"     ws id "\\(?:" ws "final" ws-maybe "[:{\n]"

Against emacs-29:

diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index c6cb9520e58..4775cbd724d 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -701,7 +701,7 @@ c-ts-mode--font-lock-settings
    `(((call_expression
        (call_expression function: (identifier) @fn)
        @c-ts-mode--fontify-DEFUN)
-      (:match "^DEFUN$" @fn))
+      (:match "\\`DEFUN\\'" @fn))
 
      ((function_definition type: (_) @for-each-tail)
       @c-ts-mode--fontify-for-each-tail
@@ -1319,7 +1319,7 @@ c-ts-mode--c-or-c++-regexp
               "\\|" id "::"
               "\\|" id ws-maybe "=\\)"
               "\\|" "\\(?:inline" ws "\\)?namespace"
-              "\\(:?" ws "\\(?:" id "::\\)*" id "\\)?" ws-maybe "{"
+              "\\(?:" ws "\\(?:" id "::\\)*" id "\\)?" ws-maybe "{"
               "\\|" "class"     ws id
               "\\(?:" ws "final" "\\)?" ws-maybe "[:{;\n]"
               "\\|" "struct"     ws id "\\(?:" ws "final" ws-maybe "[:{\n]"
diff --git a/lisp/progmodes/js.el b/lisp/progmodes/js.el
index 52ed19cc682..48fecf69537 100644
--- a/lisp/progmodes/js.el
+++ b/lisp/progmodes/js.el
@@ -106,7 +106,7 @@ js--opt-cpp-start
 
 (defconst js--plain-method-re
   (concat "^\\s-*?\\(" js--dotted-name-re "\\)\\.prototype"
-          "\\.\\(" js--name-re "\\)\\s-*?=\\s-*?\\(\\(:?async[ \t\n]+\\)function\\)\\_>")
+          "\\.\\(" js--name-re "\\)\\s-*?=\\s-*?\\(\\(?:async[ \t\n]+\\)function\\)\\_>")
   "Regexp matching an explicit JavaScript prototype \"method\" declaration.
 Group 1 is a (possibly-dotted) class name, group 2 is a method name,
 and group 3 is the `function' keyword.")
@@ -3493,7 +3493,7 @@ js--treesit-font-lock-settings
    :language 'javascript
    :feature 'constant
    '(((identifier) @font-lock-constant-face
-      (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
+      (:match "\\`[A-Z_][0-9A-Z_]*\\'" @font-lock-constant-face))
 
      [(true) (false) (null)] @font-lock-constant-face)
 
@@ -3612,7 +3612,7 @@ js--treesit-font-lock-settings
    :feature 'number
    '((number) @font-lock-number-face
      ((identifier) @font-lock-number-face
-      (:match "^\\(:?NaN\\|Infinity\\)$" @font-lock-number-face)))
+      (:match "\\`\\(?:NaN\\|Infinity\\)\\'" @font-lock-number-face)))
 
    :language 'javascript
    :feature 'operator
diff --git a/lisp/progmodes/rust-ts-mode.el b/lisp/progmodes/rust-ts-mode.el
index c3cf8d0cf44..999c1d7ae96 100644
--- a/lisp/progmodes/rust-ts-mode.el
+++ b/lisp/progmodes/rust-ts-mode.el
@@ -143,7 +143,7 @@ rust-ts-mode--font-lock-settings
                               eol))
                       @font-lock-builtin-face)))
      ((identifier) @font-lock-type-face
-      (:match "^\\(:?Err\\|Ok\\|None\\|Some\\)$" @font-lock-type-face)))
+      (:match "\\`\\(?:Err\\|Ok\\|None\\|Some\\)\\'" @font-lock-type-face)))
 
    :language 'rust
    :feature 'comment
@@ -212,11 +212,11 @@ rust-ts-mode--font-lock-settings
      (scoped_use_list path: (scoped_identifier
                              name: (identifier) @font-lock-constant-face))
      ((use_as_clause alias: (identifier) @font-lock-type-face)
-      (:match "^[A-Z]" @font-lock-type-face))
+      (:match "\\`[A-Z]" @font-lock-type-face))
      ((use_as_clause path: (identifier) @font-lock-type-face)
-      (:match "^[A-Z]" @font-lock-type-face))
+      (:match "\\`[A-Z]" @font-lock-type-face))
      ((use_list (identifier) @font-lock-type-face)
-      (:match "^[A-Z]" @font-lock-type-face))
+      (:match "\\`[A-Z]" @font-lock-type-face))
      (use_wildcard [(identifier) @rust-ts-mode--fontify-scope
                     (scoped_identifier
                      name: (identifier) @rust-ts-mode--fontify-scope)])
@@ -232,9 +232,12 @@ rust-ts-mode--font-lock-settings
      (type_identifier) @font-lock-type-face
      ((scoped_identifier name: (identifier) @rust-ts-mode--fontify-tail))
      ((scoped_identifier path: (identifier) @font-lock-type-face)
-      (:match
-       "^\\(u8\\|u16\\|u32\\|u64\\|u128\\|usize\\|i8\\|i16\\|i32\\|i64\\|i128\\|isize\\|char\\|str\\)$"
-       @font-lock-type-face))
+      (:match ,(rx bos
+                   (or "u8" "u16" "u32" "u64" "u128" "usize"
+                       "i8" "i16" "i32" "i64" "i128" "isize"
+                       "char" "str")
+                   eos)
+              @font-lock-type-face))
      ((scoped_identifier path: (identifier) @rust-ts-mode--fontify-scope))
      ((scoped_type_identifier path: (identifier) @rust-ts-mode--fontify-scope))
      (type_identifier) @font-lock-type-face)
@@ -249,7 +252,7 @@ rust-ts-mode--font-lock-settings
    :feature 'constant
    `((boolean_literal) @font-lock-constant-face
      ((identifier) @font-lock-constant-face
-      (:match "^[A-Z][A-Z\\d_]*$" @font-lock-constant-face)))
+      (:match "\\`[A-Z][0-9A-Z_]*\\'" @font-lock-constant-face)))
 
    :language 'rust
    :feature 'variable
diff --git a/lisp/progmodes/typescript-ts-mode.el b/lisp/progmodes/typescript-ts-mode.el
index 1c19a031878..68aefd90f92 100644
--- a/lisp/progmodes/typescript-ts-mode.el
+++ b/lisp/progmodes/typescript-ts-mode.el
@@ -153,7 +153,7 @@ typescript-ts-mode--font-lock-settings
    :language language
    :feature 'constant
    `(((identifier) @font-lock-constant-face
-      (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
+      (:match "\\`[A-Z_][0-9A-Z_]*\\'" @font-lock-constant-face))
      [(true) (false) (null)] @font-lock-constant-face)
 
    :language language
@@ -311,7 +311,7 @@ typescript-ts-mode--font-lock-settings
    :feature 'number
    `((number) @font-lock-number-face
      ((identifier) @font-lock-number-face
-      (:match "^\\(:?NaN\\|Infinity\\)$" @font-lock-number-face)))
+      (:match "\\`\\(?:NaN\\|Infinity\\)\\'" @font-lock-number-face)))
 
    :language language
    :feature 'operator

@basil-conto
Copy link
Contributor Author

basil-conto commented Jun 11, 2023

The vc-git-annotate-time warning probably warrants its own bug report since at first glance it looks like the hour is always zero, and I'm not sure what the intention with that was.

@basil-conto
Copy link
Contributor Author

I intend to submit the following two patches upstream soon.

I've now submitted the more urgent emacs-29 changes at https://bugs.gnu.org/64019.

@mattiase
Copy link
Owner

Detecting \(:?...\) is a lovely idea and suggests that you possess the sort of twisted imagination required for becoming an effective lint author.

As you say, it can probably not be included as-is because of unavoidable false positives. I have myself discarded several useful checks for the same reason. However, if we would come up with a good way to integrate these somewhat low-confidence checks as an option, this one would certainly qualify.

It could take the form of an optional parameter to xr-lint or, perhaps less cleanly, as a dynamic variable. More important is thinking about how it should be used in practice -- someone using the 'noisy' option all the time will be annoyed, while one that keeps it off draws no benefit from it.

@basil-conto
Copy link
Contributor Author

Detecting \(:?...\) is a lovely idea and suggests that you possess the sort of twisted imagination required for becoming an effective lint author.

:)

It could take the form of an optional parameter to xr-lint or, perhaps less cleanly, as a dynamic variable.

I imagine xr-lint could easily take a new optional argument, but relint, with its many entrypoints, would probably heed a dynamic variable? (relint-batch may even be most ergonomic with an environment variable.)

More important is thinking about how it should be used in practice -- someone using the 'noisy' option all the time will be annoyed, while one that keeps it off draws no benefit from it.

Right, I guess it's a balancing act deciding what should be on and what off by default. For shy groups I think it's pretty clear that it should be off by default. A person asking for noisy output is probably prepared to wade through false positives, and either way there's always the relint suppression cookie as an escape hatch.

Regarding what kinds of values this option would take, I'm not sure: is a boolean flag sufficient? If this is found to be insufficient in the future, t can be reinterpreted as 'maximum noisiness', with intermediate (perhaps numeric) values added as needed. I can't imagine offhand the need for so much granularity, though.

WDYT? Did I misinterpret any of your concerns about this?

@basil-conto
Copy link
Contributor Author

BTW, I can't guarantee how soon I'll be able to dedicate time to such an enhancement, so if you'd prefer to close this PR in the meantime, please feel free.

@mattiase
Copy link
Owner

Right, I guess it's a balancing act deciding what should be on and what off by default. For shy groups I think it's pretty clear that it should be off by default. A person asking for noisy output is probably prepared to wade through false positives, and either way there's always the relint suppression cookie as an escape hatch.

That's right. The idea is that relint should be practically free of false positives by default so that it can routinely be run as part of CI or interactively in flycheck/flymake (which also requires it to be fast).

For this purpose I'm using a loose interpretation of 'false positive' as a diagnostic that forces a change to perfectly fine code. Warnings about semantically correct but stylistically questionable (or potentially slow) code are acceptable.

For xr it's indeed just a matter of how to design the API. It's easy to go overboard with separate knobs for confidence and impact, performance and accuracy. (I sound like a strategy consultant, sorry.) Maybe a simple low/high level is good enough. Filtering individual diagnostics is probably better relegated to the caller (ie, relint).

I can't guarantee how soon I'll be able to dedicate time to such an enhancement, so if you'd prefer to close this PR in the meantime, please feel free.

You already provided the fine service of proposing the new check. I'll keep this PR open for now as a reminder to myself.
Other noisy checks might include:

  • non-natural character ranges like [A-z]: ranges where endpoints are from different natural sequences such as 'ASCII digits', 'ASCII lower-case letters', etc.
  • syntactical star height exceeding some constant: detect deep repetition nesting which often results in bad backtracking performance.
  • effective repetition of repetition: repetitions where the body contains one repetition and the rest match the empty string. This is a common way to get exponential-cost backtracking. Example: \(?:a?b+c?\)*
  • incorrect escapes in alternatives, like "[,\\s-]"

dickmao pushed a commit to commercial-emacs/commercial-emacs that referenced this pull request Jun 17, 2023
These issues were caught by modified versions of the GNU ELPA
packages xr and relint:
- mattiase/xr#6
- mattiase/relint#14

* lisp/gnus/gnus-art.el (gnus-parse-news-url): Remove redundant
numbered group and calls to match-string.
* lisp/progmodes/c-ts-mode.el (c-ts-mode--c-or-c++-regexp): Fix shy
group mistyped as optional colon (bug#64019#29).
* lisp/vc/vc-git.el (vc-git-annotate-time): Ditto.  Also fix
timezone parsing by using iso8601-parse (bug#64069).
* test/lisp/vc/vc-git-tests.el (vc-git-test-annotate-time): New
test.
jollaitbot pushed a commit to sailfishos-mirror/emacs that referenced this pull request Jun 17, 2023
The shy groups were caught by modified versions of the GNU ELPA
packages xr and relint:
- mattiase/xr#6
- mattiase/relint#14

* lisp/progmodes/ruby-ts-mode.el (ruby-ts--s-p-query): Quote special
character in regexp.
* lisp/progmodes/java-ts-mode.el (java-ts-mode--font-lock-settings):
* lisp/progmodes/js.el (js--plain-method-re):
(js--treesit-font-lock-settings):
* lisp/progmodes/rust-ts-mode.el (rust-ts-mode--font-lock-settings):
* lisp/progmodes/typescript-ts-mode.el
(typescript-ts-mode--font-lock-settings): Replace character
alternative [\\d], which matches '\' or 'd', with the most likely
intention [0-9].  Fix shy groups mistyped as optional colons.
Remove unneeded numbered :match group in rust-ts-mode (bug#64019).
dickmao pushed a commit to commercial-emacs/commercial-emacs that referenced this pull request Jun 17, 2023
This was originally installed on 2023-06-17 in the emacs-29 release
branch and later reverted.  The intention is to backport it after
Emacs 29.1 is released.

The shy groups were caught by modified versions of the GNU ELPA
packages xr and relint:
- mattiase/xr#6
- mattiase/relint#14

* lisp/progmodes/ruby-ts-mode.el (ruby-ts--s-p-query): Quote special
character in regexp.
* lisp/progmodes/java-ts-mode.el (java-ts-mode--font-lock-settings):
* lisp/progmodes/js.el (js--plain-method-re):
(js--treesit-font-lock-settings):
* lisp/progmodes/rust-ts-mode.el (rust-ts-mode--font-lock-settings):
* lisp/progmodes/typescript-ts-mode.el
(typescript-ts-mode--font-lock-settings): Replace character
alternative [\\d], which matches '\' or 'd', with the most likely
intention [0-9].  Fix shy groups mistyped as optional colons.
Remove unneeded numbered :match group in rust-ts-mode (bug#64019).
mattiase added a commit that referenced this pull request Jul 6, 2023
Only enabled for checks = `all`.
Suggested by Basil Contovounesios.
mattiase added a commit that referenced this pull request Jul 6, 2023
Only enabled for checks = `all`.
Suggested by Basil Contovounesios.
@mattiase
Copy link
Owner

mattiase commented Jul 6, 2023

I added this check to the noisy-checks branch along with some other more or less useful ones.
Still not sure how the new argument will work, or how to interface with relint.

veden pushed a commit to veden/emacs that referenced this pull request Jul 30, 2023
This was originally installed on 2023-06-17 in the emacs-29 release
branch and later reverted.  This backport follows the Emacs 29.1
release (bug#64019).

The shy groups were caught by modified versions of the GNU ELPA
packages xr and relint:
- mattiase/xr#6
- mattiase/relint#14

* lisp/progmodes/ruby-ts-mode.el (ruby-ts--s-p-query): Quote special
character in regexp.
* lisp/progmodes/java-ts-mode.el (java-ts-mode--font-lock-settings):
* lisp/progmodes/js.el (js--plain-method-re):
(js--treesit-font-lock-settings):
* lisp/progmodes/rust-ts-mode.el (rust-ts-mode--font-lock-settings):
* lisp/progmodes/typescript-ts-mode.el
(typescript-ts-mode--font-lock-settings): Replace character
alternative [\\d], which matches '\' or 'd', with the most likely
intention [0-9].  Fix shy groups mistyped as optional colons.
Remove unneeded numbered :match group in rust-ts-mode.

(cherry picked from commit cd8d3f3)
mattiase added a commit that referenced this pull request Aug 1, 2023
Only enabled for checks = `all`.
Suggested by Basil Contovounesios.
@mattiase
Copy link
Owner

mattiase commented Aug 1, 2023

The proposed check and a few others are now on master, gated by a new checks argument to xr-lint.
Currently it is on-or-off (nil or all) but we could name individual checks and allow finer-grained selection if desired. Please let me know.

@mattiase mattiase closed this Aug 1, 2023
@basil-conto
Copy link
Contributor Author

Thanks! I'm AFK for another week at least, but I'm looking forward to taking it for a spin when I'm back, and am confident that the granularity will be fine until a different need arises.

@basil-conto basil-conto deleted the blc/shy branch August 12, 2023 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants