RB: add query detecting validators that use badly anchored regular expressions on library/remote input #11824

erik-krogh · 2023-01-05T15:20:26Z

CVE-2022-31163: TP

I tried to see if flagging all ^...$ regular expressions could work, but I found that to be way too noisy.
So I've restricted the query to look for sanitizer-like (raises an exception) uses of the badly anchored regular expressions, where I can find flow from library-input / remote-flow.

MRVA run on 1000 projects shows a bunch of new results.
The results seem OK, but they're not something that I think requires immediate action.

Evaluation looks OK. Some new results, which seems fine.

github-actions · 2023-01-05T15:21:26Z

QHelp previews:

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

Badly anchored regular expression

Regular expressions in Ruby can use anchors to match the beginning and end of a string. However, if the ^ and $ anchors are used, the regular expression can match a single line of a multi-line string.

Recommendation

Use the \A and \z anchors to match the beginning and end of a string, as these will always match the beginning and end of the string, even if the string contains newlines.

Example

The following example code uses a regular expression to check that a string contains only digits.

def bad(input) 
    raise "Bad input" unless input =~ /^[0-9]+$/

    # ....
end

The regular expression /^[0-9]+$/ will match a single line of a multi-line string, which may not be the intended behavior. To match the entire string, the regular expression should be \A[0-9]+\z.

def good(input)
    raise "Bad input" unless input =~ /\A[0-9]+\z/

    # ....
end
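The difference between the two examples above can be checked directly in plain Ruby. The following is a minimal sketch; the helper names and the payload string are hypothetical and are not part of the query help.

```ruby
# Hypothetical helpers mirroring the bad and good examples above.
def line_anchored?(input)
  # ^ and $ anchor to line boundaries, so one matching line is enough.
  !!(input =~ /^[0-9]+$/)
end

def string_anchored?(input)
  # \A and \z anchor to the start and end of the whole string.
  !!(input =~ /\A[0-9]+\z/)
end

# Hypothetical multi-line input: a valid first line followed by extra content.
payload = "123\n<script>alert(1)</script>"

puts line_anchored?(payload)    # the first line "123" satisfies the check
puts string_anchored?(payload)  # the whole string is not digits only
puts string_anchored?("123")
```

With the line anchors the payload passes validation because its first line consists of digits; with the string anchors it is rejected.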

References

RDoc Documentation: Anchors
Common Weakness Enumeration: CWE-20.

ruby/ql/lib/codeql/ruby/security/regexp/MissingFullAnchorCustomizations.qll

hmac · 2023-01-10T21:49:18Z

ruby/ql/lib/codeql/ruby/Concepts.qll

+ * Extend this class to model new APIs. If you want to refine existing API models,
+ * extend `RegexExecution` instead.
+ */
+ abstract class Range extends DataFlow::Node {


This is almost identical to the Python concept except that theirs has string getName() and is missing RegExpTerm getTerm(). Can we make them the same, so we can share these concepts (maybe not now, but in the future)?

I think Python could use RegExpTerm getTerm() (as a followup after I merge the PR that introduces tracking of string values to regex executions: #11833).

Their use of getName() is for py/regex-injection, which we don't have yet (but we could).
I'll look into that.

hmac · 2023-01-10T21:52:04Z

ruby/ql/lib/codeql/ruby/Regexp.qll

+/**
+ * An execution of a regular expression by the standard library.
+ */
+private class StdRegexpExecution extends RegexExecution::Range {


No specific suggestions here, but I'm mindful that our regex modelling is becoming a little convoluted. We have modelling for regex literals, for non-literals that get converted to regexes, and for execution of regex literals (but not non-literals). I feel like we could consolidate some of this and avoid having to repeat the same work for each category. For example, in this class we don't handle cases like "foo".match? "fo+" which converts "fo+" to a regex and then executes it, but that is captured by RegExpInterpretation.

You're right, RegexExecution should probably be expressed in terms of RegExpInterpretation.
I'll look into doing a refactor that can clean things up.
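The string-to-regex conversion mentioned in this thread can be seen in plain Ruby. This is a small illustrative sketch, not code from the PR; the variable names and inputs are hypothetical.

```ruby
# String#match? converts a String pattern to a Regexp before matching,
# so a string argument is executed as a regular expression.
string_pattern = "fo+"
converted = "foo".match?(string_pattern)    # "fo+" is treated as the regex /fo+/
literal   = "foo".include?(string_pattern)  # include? compares the text literally

# A badly anchored pattern supplied as a string behaves like the literal form.
input  = "1\nrm -rf /"                      # hypothetical multi-line input
bypass = input.match?('^[0-9]+$')           # line anchors: first line matches
strict = input.match?('\A[0-9]+\z')         # string anchors: whole string must match

puts converted, literal, bypass, strict
```

This is why modelling only regex literals would miss executions where the pattern arrives as a string.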

hmac · 2023-01-11T00:01:08Z

ruby/ql/lib/codeql/ruby/security/regexp/MissingFullAnchorCustomizations.qll

+ [ifExpr.getCondition(), ifExpr.getCondition().(Ast::UnaryLogicalOperation).getOperand()] =
+ exec.asExpr().getExpr() and
+ ifExpr.getBranch(_).(Ast::MethodCall).getMethodName() = "raise"
+ )


I was going to suggest that we should use the CFG layer here, but in trying that out I've noticed we seem to have a CFG bug in expressions such as

if foo
    raise x
end

where ExprChildMapping.hasCfgChild doesn't give results for the raise call. I've spent a bit of time on this but haven't got to the bottom of it.

So, nothing for me to do here?

No, nothing to do on this PR I don't think. I will create an issue for us to track this and improve it separately.

One way to express this more abstractly, in terms of the CFG is:

A conditional successor of exec.asExpr() is post-dominated by an abnormal exit node. That is, when taking one of the branches that follow exec.asExpr(), we are guaranteed to exit the enclosing method abnormally.

Note, though, that this only works if the exception cannot be caught by a surrounding `rescue` clause.
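The caveat about a surrounding handler can be sketched in Ruby as follows. The method names are hypothetical and only illustrate the control flow; they are not taken from the query or its tests.

```ruby
# The failing branch is guaranteed to leave the method abnormally.
def strict(input)
  raise ArgumentError, "Bad input" unless input =~ /\A[0-9]+\z/
  "processed #{input}"
end

# The same check, but the raise is caught inside the method, so execution
# continues normally and the check no longer acts as a sanitizer.
def swallowed(input)
  begin
    raise ArgumentError, "Bad input" unless input =~ /\A[0-9]+\z/
  rescue ArgumentError
    # error ignored
  end
  "processed #{input}"
end

strict_result =
  begin
    strict("abc")
  rescue ArgumentError => e
    e.message
  end

puts strict_result      # the error message: strict never processed "abc"
puts swallowed("abc")   # "abc" is processed despite failing the check
```

In `swallowed`, the branch that raises is not post-dominated by an abnormal exit of the method, which is the situation the note above describes.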

erik-krogh · 2023-01-12T09:07:15Z

I'll look into creating the RegexExecution concept in another PR.
My first naive attempt at combining RegexExecution and RegExpInterpretation didn't work, as there are a lot of locations that accept both strings and regular expressions, and I didn't handle that in my first attempt.

I'll work on another PR that lays the groundwork, and then come back to this PR.

erik-krogh · 2023-01-30T20:36:48Z

Let's try again, this time with the refactoring done in a separate PR.

A new evaluation still looks OK.

hmac · 2023-02-03T05:01:41Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+ <overview>
+ <p>
+ Regular expressions in Ruby can use anchors to match the beginning and end of a string. 
+ However, if the <code>^</code> and <code>$</code> anchors are not used, 


Suggested change

- However, if the <code>^</code> and <code>$</code> anchors are not used,
+ However, if the <code>^</code> and <code>$</code> anchors are used,

hmac · 2023-02-03T05:12:56Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+
+ <sample language="ruby">
+def bad(input) 
+ raise "Bad input" unless input =~ /[0-9]+/


Is it not more helpful to show an example that uses ^ and $, since that's what we're looking for in this query?

hmac

A couple of small help-related comments, but otherwise this LGTM. Ping me if/when you need an approval.

sabrowning1

👋🏼 from docs @erik-krogh! Thanks for your work on the docs for this query; I've left a few comments below.

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

sabrowning1 · 2023-02-06T19:03:05Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+ <example>
+
+ <p>
+ The following example code uses a regular expression to check that a string contains only digits.


Suggested change

- The following example code uses a regular expression to check that a string contains only digits.
+ The following (bad) example code uses a regular expression to check that a string contains only digits.

Nit for extra clarity

sabrowning1 · 2023-02-06T19:05:24Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+ <sample language="ruby">
+def bad(input) 
+ raise "Bad input" unless input =~ /^[0-9]+$/
+
+ # ....
+end
+ </sample>


If you could put this code example in an src file, that would be great! Separating it out helps us with maintenance/readability down the line 🙂

sabrowning1 · 2023-02-06T19:05:54Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+ <sample language="ruby">
+def good(input)
+ raise "Bad input" unless input =~ /\A[0-9]+\z/
+
+ # ....
+end
+ </sample>


As above, if those could go in an src file, that would be awesome!

sabrowning1 · 2023-02-06T19:07:37Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+ <p>
+ The regular expression <code>/^[0-9]+$/</code> will match a single line of a multi-line string, 
+ which may not be the intended behavior. 
+ To match the entire string, the regular expression should be <code>\A[0-9]+\z</code>.


Suggested change

- To match the entire string, the regular expression should be <code>\A[0-9]+\z</code>.
+ The following (good) example code uses the regular expression <code>\A[0-9]+\z</code> to match the entire input string.

Small suggestion for styling and clarity

sabrowning1 · 2023-02-06T19:19:02Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+
+ <references>
+ <li>
+ RDoc Documentation: <a href="https://ruby-doc.org/3.2.0/Regexp.html#class-Regexp-label-Anchors">Anchors</a>


Suggested change

- RDoc Documentation: <a href="https://ruby-doc.org/3.2.0/Regexp.html#class-Regexp-label-Anchors">Anchors</a>
+ Ruby documentation: <a href="https://ruby-doc.org/3.2.0/Regexp.html#class-Regexp-label-Anchors">Anchors</a>.

Styling nit

ruby/ql/src/change-notes/2023-01-06-badly-anchored-regex.md

sabrowning1 · 2023-02-06T19:39:27Z

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

+ Use the <code>\A</code> and <code>\z</code> anchors to match the beginning and end of a string, 
+ as these will always match the beginning and end of the string, even if the string contains newlines.


Suggested change

- Use the <code>\A</code> and <code>\z</code> anchors to match the beginning and end of a string,
- as these will always match the beginning and end of the string, even if the string contains newlines.
+ Use the <code>\A</code> and <code>\z</code> anchors since these anchors will always match the beginning and end of the string, even if the string contains newlines.

Small suggestion to make this sentence less wordy. Do you think this is still clear enough?

subatoi

Apologies, realised @sabrowning1 had already reviewed so deleted my suggestions :D

mchammer01 · 2023-02-07T06:01:56Z

I'll review this on behalf of Docs later today!

mchammer01 · 2023-02-07T07:50:35Z

Oops, this has already been reviewed by @sabrowning1 (Sam, I'll remove this from our review board).

github-actions · 2023-02-08T09:59:36Z

QHelp previews:

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

Badly anchored regular expression

Regular expressions in Ruby can use anchors to match the beginning and end of a string. However, if the ^ and $ anchors are used, the regular expression can match a single line of a multi-line string. This allows bad actors to bypass your regular expression checks and inject malicious input.

Recommendation

Use the \A and \z anchors since these anchors will always match the beginning and end of the string, even if the string contains newlines.
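As a related detail that the help text does not spell out: in Ruby's regular expressions the lowercase `\z` and the uppercase `\Z` differ, because `\Z` also matches just before a trailing newline. The sketch below uses a hypothetical input to show the difference.

```ruby
# Hypothetical input that ends with a newline.
input = "123\n"

dollar  = input.match?(/^[0-9]+$/)     # $ matches before the newline
upper_z = input.match?(/\A[0-9]+\Z/)   # \Z also matches before a final newline
lower_z = input.match?(/\A[0-9]+\z/)   # \z matches only at the very end

puts dollar, upper_z, lower_z
```

Only the `\z` form rejects the trailing newline, which is consistent with the recommendation above.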

Example

The following (bad) example code uses a regular expression to check that a string contains only digits.

def bad(input) 
    raise "Bad input" unless input =~ /^[0-9]+$/

    # ....
end

The regular expression /^[0-9]+$/ will match a single line of a multi-line string, which may not be the intended behavior. The following (good) example code uses the regular expression \A[0-9]+\z to match the entire input string.

def good(input)
    raise "Bad input" unless input =~ /\A[0-9]+\z/

    # ....
end

References

Ruby documentation: Anchors
Common Weakness Enumeration: CWE-20.

sabrowning1

Thanks again for your work on the user-facing text @erik-krogh! Once those small tweaks have been applied to the change note, this is good to go for Docs 🙂 🚀

erik-krogh · 2023-02-08T13:56:22Z

Thanks again for your work on the user-facing text @erik-krogh! Once those small tweaks have been applied to the change note, this is good to go for Docs 🙂 🚀

Ahh. I used the files view, and missed your change-note comments. Thanks 👍

github-actions bot added documentation Ruby labels Jan 5, 2023

github-advanced-security bot found potential problems Jan 5, 2023

ruby/ql/lib/codeql/ruby/security/regexp/MissingFullAnchorCustomizations.qll (fixed)

erik-krogh force-pushed the secondMissAnchor branch 2 times, most recently from 8c9e258 to d79b1f7 Compare January 5, 2023 18:42

erik-krogh marked this pull request as ready for review January 6, 2023 08:26

erik-krogh requested a review from a team as a code owner January 6, 2023 08:26

erik-krogh force-pushed the secondMissAnchor branch from d79b1f7 to 43ff915 Compare January 6, 2023 08:29

calumgrant requested a review from hmac January 9, 2023 09:26

hmac reviewed Jan 10, 2023

hmac reviewed Jan 11, 2023

erik-krogh marked this pull request as draft January 12, 2023 09:04

erik-krogh force-pushed the secondMissAnchor branch 2 times, most recently from 3b83b83 to f9241b2 Compare January 30, 2023 15:25

erik-krogh added 2 commits January 30, 2023 16:34

add query detecting validators that use badly anchored regular expressions on library/remote input

e010023

add change-note

31743af

erik-krogh force-pushed the secondMissAnchor branch from f9241b2 to 31743af Compare January 30, 2023 15:34

erik-krogh marked this pull request as ready for review January 30, 2023 20:36

hmac reviewed Feb 3, 2023

adjust qhelp based on review

3545bb0

erik-krogh added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Feb 3, 2023

sabrowning1 reviewed Feb 6, 2023

subatoi reviewed Feb 6, 2023

mchammer01 self-requested a review February 7, 2023 06:01

mchammer01 removed their request for review February 7, 2023 07:50

erik-krogh removed the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Feb 7, 2023

erik-krogh requested review from sabrowning1 and hmac February 8, 2023 09:59

improve qhelp based on doc review

eb56476

erik-krogh force-pushed the secondMissAnchor branch from f926015 to eb56476 Compare February 8, 2023 10:01

sabrowning1 previously approved these changes Feb 8, 2023

apply change-note suggestions from doc review

3ebac65

Co-authored-by: Sam Browning <[email protected]>

erik-krogh dismissed sabrowning1’s stale review via 3ebac65 February 8, 2023 13:55

hmac approved these changes Feb 13, 2023

erik-krogh merged commit 26d5fb2 into github:main Feb 13, 2023
