Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RB: add query detecting validators that use badly anchored regular expressions on library/remote input #11824

Merged
merged 5 commits into from
Feb 13, 2023

Conversation

erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented Jan 5, 2023

CVE-2022-31163: TP

I tried to see if flagging all ^...$ regular expressions could work, but I found that to be way too noisy.
So I've restricted the query to look for sanitizer-like (raises an exception) uses of the badly anchored regular expressions, where I can find flow from library-input / remote-flow.

MRVA run on 1000 projects shows a bunch of new results.
The results seem OK, but they're not something that I think requires immediate action.

Evaluation looks OK. Some new results, which seems fine.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2023

QHelp previews:

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

Badly anchored regular expression

Regular expressions in Ruby can use anchors to match the beginning and end of a string. However, if the ^ and $ anchors are used, the regular expression can match a single line of a multi-line string.

Recommendation

Use the \A and \z anchors to match the beginning and end of a string, as these will always match the beginning and end of the string, even if the string contains newlines.

Example

The following example code uses a regular expression to check that a string contains only digits.

def bad(input) 
    raise "Bad input" unless input =~ /^[0-9]+$/

    # ....
end
        

The regular expression /^[0-9]+$/ will match a single line of a multi-line string, which may not be the intended behavior. To match the entire string, the regular expression should be \A[0-9]+\z.

def good(input)
    raise "Bad input" unless input =~ /\A[0-9]+\z/

    # ....
end
        

References

  • RDoc Documentation: Anchors
  • Common Weakness Enumeration: CWE-20.

@erik-krogh erik-krogh force-pushed the secondMissAnchor branch 2 times, most recently from 8c9e258 to d79b1f7 Compare January 5, 2023 18:42
@erik-krogh erik-krogh marked this pull request as ready for review January 6, 2023 08:26
@erik-krogh erik-krogh requested a review from a team as a code owner January 6, 2023 08:26
@calumgrant calumgrant requested a review from hmac January 9, 2023 09:26
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `RegexExecution` instead.
*/
abstract class Range extends DataFlow::Node {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost identical to the Python concept except that theirs has string getName() and is missing RegExpTerm getTerm(). Can we make them the same, so we can share these concepts (maybe not now, but in the future)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Python could use RegExpTerm getTerm() (as a followup after I merge the PR that introduces tracking of string values to regex executions: #11833).

Their use of getName() is for py/regex-injection, which we don't have yet (but we could).
I'll look into that.

/**
* An execution of a regular expression by the standard library.
*/
private class StdRegexpExecution extends RegexExecution::Range {
Copy link
Contributor

@hmac hmac Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No specific suggestions here, but I'm mindful that our regex modelling is becoming a little convoluted. We have modelling for regex literals, for non-literals that get converted to regexes, and for execution of regex literals (but not non-literals). I feel like we could consolidate some of this and avoid having to repeat the same work for each category. For example, in this class we don't handle cases like "foo".match? "fo+" which converts "fo+" to a regex and then executes it, but that is captured by RegExpInterpretation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, RegexpExecution should probably be expressed in terms of RegExpInterpretation.
I'll look into doing a refactor that can clean things up.

[ifExpr.getCondition(), ifExpr.getCondition().(Ast::UnaryLogicalOperation).getOperand()] =
exec.asExpr().getExpr() and
ifExpr.getBranch(_).(Ast::MethodCall).getMethodName() = "raise"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest that we should use the CFG layer here, but in trying that out I've noticed we seem to have a CFG bug in expressions such as

if foo
  raise x
end

where ExprChildMapping.hasCfgChild doesn't give results for the raise call. I've spent a bit of time on this but haven't got to the bottom of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, nothing for me to do here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nothing to do on this PR I don't think. I will create an issue for us to track this and improve it separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to express this more abstractly, in terms of the CFG is:

A conditional successor of exec.asExpr() is post-dominated by an abnormal exit node. That is, when taking one of the branches that follow exec.asExpr(), we are guaranteed to exit the enclosing method abnormally.

Note though that this only works if the exception may not be catched by a surrounding catch clause.

@erik-krogh erik-krogh marked this pull request as draft January 12, 2023 09:04
@erik-krogh
Copy link
Contributor Author

I'll look into creating the RegexExecution concept in another PR.
My first naive attempt at combining RegexExecution and RegExpInterpretation didn't work, as there is a lot of locations that both accept string/regular-expressions, and I didn't handle that in my first attempt.

I'll work on another PR that lays the ground work, and then come back to this PR.

@erik-krogh erik-krogh force-pushed the secondMissAnchor branch 2 times, most recently from 3b83b83 to f9241b2 Compare January 30, 2023 15:25
@erik-krogh erik-krogh marked this pull request as ready for review January 30, 2023 20:36
@erik-krogh
Copy link
Contributor Author

erik-krogh commented Jan 30, 2023

Lets try again, this time with the refactoring done in a separate PR.

A new evaluation still looks OK.

<overview>
<p>
Regular expressions in Ruby can use anchors to match the beginning and end of a string.
However, if the <code>^</code> and <code>$</code> anchors are not used,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
However, if the <code>^</code> and <code>$</code> anchors are not used,
However, if the <code>^</code> and <code>$</code> anchors are used,


<sample language="ruby">
def bad(input)
raise "Bad input" unless input =~ /[0-9]+/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not more helpful to show an example that uses ^ and $, since that's what we're looking for in this query?

Copy link
Contributor

@hmac hmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small help-related comments, but otherwise this LGTM. Ping me if/when you need an approval.

@erik-krogh erik-krogh added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Feb 3, 2023
Copy link
Contributor

@sabrowning1 sabrowning1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋🏼 from docs @erik-krogh! Thanks for your work on the docs for this query; I've left a few comments below.

<example>

<p>
The following example code uses a regular expression to check that a string contains only digits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following example code uses a regular expression to check that a string contains only digits.
The following (bad) example code uses a regular expression to check that a string contains only digits.

Nit for extra clarity

Comment on lines 27 to 33
<sample language="ruby">
def bad(input)
raise "Bad input" unless input =~ /^[0-9]+$/

# ....
end
</sample>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could put this code example in an src file, that would be great! Separating it out helps us with maintenance/readability down the line 🙂

Comment on lines 41 to 47
<sample language="ruby">
def good(input)
raise "Bad input" unless input =~ /\A[0-9]+\z/

# ....
end
</sample>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, if those could go in an src file, that would be awesome!

<p>
The regular expression <code>/^[0-9]+$/</code> will match a single line of a multi-line string,
which may not be the intended behavior.
To match the entire string, the regular expression should be <code>\A[0-9]+\z</code>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To match the entire string, the regular expression should be <code>\A[0-9]+\z</code>.
The following (good) example code uses the regular expression <code>\A[0-9]+\z</code> to match the entire input string.

Small suggestion for styling and clarity


<references>
<li>
RDoc Documentation: <a href="https://ruby-doc.org/3.2.0/Regexp.html#class-Regexp-label-Anchors">Anchors</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RDoc Documentation: <a href="https://ruby-doc.org/3.2.0/Regexp.html#class-Regexp-label-Anchors">Anchors</a>
Ruby documentation: <a href="https://ruby-doc.org/3.2.0/Regexp.html#class-Regexp-label-Anchors">Anchors</a>.

Styling nit

Comment on lines 16 to 17
Use the <code>\A</code> and <code>\z</code> anchors to match the beginning and end of a string,
as these will always match the beginning and end of the string, even if the string contains newlines.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use the <code>\A</code> and <code>\z</code> anchors to match the beginning and end of a string,
as these will always match the beginning and end of the string, even if the string contains newlines.
Use the <code>\A</code> and <code>\z</code> anchors since these anchors will always match the beginning and end of the string, even if the string contains newlines.

Small suggestion to make this sentence less wordy. Do you think this is still clear enough?

Copy link
Contributor

@subatoi subatoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, realised @sabrowning1 had already reviewed so deleted my suggestions :D

@mchammer01 mchammer01 self-requested a review February 7, 2023 06:01
@mchammer01
Copy link
Contributor

I'll review this on behalf of Docs later today!

@mchammer01
Copy link
Contributor

Oops, this has already been reviewed by @sabrowning1 (Sam, I'll remove this from our review board).

@mchammer01 mchammer01 removed their request for review February 7, 2023 07:50
@erik-krogh erik-krogh removed the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Feb 7, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Feb 8, 2023

QHelp previews:

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

Badly anchored regular expression

Regular expressions in Ruby can use anchors to match the beginning and end of a string. However, if the ^ and $ anchors are used, the regular expression can match a single line of a multi-line string. This allows bad actors to bypass your regular expression checks and inject malicious input.

Recommendation

Use the \A and \z anchors since these anchors will always match the beginning and end of the string, even if the string contains newlines.

Example

The following (bad) example code uses a regular expression to check that a string contains only digits.

def bad(input) 
    raise "Bad input" unless input =~ /^[0-9]+$/

    # ....
end

The regular expression /^[0-9]+$/ will match a single line of a multi-line string, which may not be the intended behavior. The following (good) example code uses the regular expression \A[0-9]+\z to match the entire input string.

def good(input)
    raise "Bad input" unless input =~ /\A[0-9]+\z/

    # ....
end

References

  • Ruby documentation: Anchors
  • Common Weakness Enumeration: CWE-20.

sabrowning1
sabrowning1 previously approved these changes Feb 8, 2023
Copy link
Contributor

@sabrowning1 sabrowning1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for your work on the user-facing text @erik-krogh! Once those small tweaks have been applied to the change note, this is good to go for Docs 🙂 🚀

@erik-krogh
Copy link
Contributor Author

Thanks again for your work on the user-facing text @erik-krogh! Once those small tweaks have been applied to the change note, this is good to go for Docs 🙂 🚀

Ahh. I used the files view, and missed your change-note comments. Thanks 👍

@erik-krogh erik-krogh merged commit 26d5fb2 into github:main Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants