Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift: Add regular expression evaluation models for StringProtocol and NSString methods #14383

Merged
merged 22 commits into from
Oct 27, 2023

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Oct 5, 2023

Add models for the StringProtocol and NSString methods that evaluate regular expressions. These are a bit more complicated than the existing regex evaluation models as these methods take an NSString.CompareOptions parameter and only interpret their input as a regular expression if .regularExpression is specified - something like this:

let options: NSString.CompareOptions = [.regularExpression, .caseInsensitive]
_ = NSString(string: "aBcDeF").range(of: "cd.*", options: options)

Thus the important changes I've made are:

  • added RegexEnableFlagFlow, tracking the .regularExpression flag to its use in the method.
  • split RegexEval into RegexEval (the class you use for reasoning about regex evals) and PotentialRegexEval (the class you extend to add models of them). The latter allows data flows such as RegexEnableFlagFlow to interact with a potential regular expression evaluation site before we figure out whether or not it is a regular expression evaluation site.
  • added support for parse mode flags specified via NSString.CompareOptions (such as .caseInsensitive in the example above, and .anchored which is new), so that we get correct parse modes for the newly modelled methods as we;;.

I have also cleaned up the tests a little, added a few more test cases, and cleaned up a few predicates in the regex library.


On the MRVA top 1000 we go from finding 1,104 regex evals to 1,227 - an increase of 123 (11%). A large sample of the new results all look good to me.

I plan to do a brief [internal] presentation about the regex library at some point - reviewers may benefit from that. Alternatively talk to me - this PR came out a bit bigger than I'd like but I'm happy to talk you through each change.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 24, 2023

@MathiasVP do you mind reviewing this? Happy to talk you through it.

I plan on doing at least one follow-up PR after this to clean up these libraries a bit. So any comments you may have about layout etc will be noted for then.

@MathiasVP
Copy link
Contributor

I'll take a look at it tomorrow morning. Thanks for the ping!

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small nits, and one question about the design of the new RegexEval and PotentialRegexEval split. But otherwise, this LGTM!

Comment on lines +315 to +316
DataFlow::Node regexInput;
DataFlow::Node stringInput;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nit: I would have probably kept these as Exprs to avoid all those asExpr() calls in the charpred, and then simply implemented getRegexInput as result.asExpr() = regexInput.

(and similarly for the other members of this class and related classes.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to move all the interfaces in this library towards DataFlow::Node rather than a split between DataFlow::Node and Expr. I think this is more natural and robust. When we get there a lot of these conversions should disappear (though there will no doubt be some deprecated stuff for a time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now gone ahead with the changes to a DataFlow::Node-based interface and deprecated the old one. Better now rather than later!

swift/ql/lib/codeql/swift/regex/Regex.qll Outdated Show resolved Hide resolved
swift/ql/lib/codeql/swift/regex/Regex.qll Outdated Show resolved Hide resolved
swift/ql/lib/codeql/swift/regex/Regex.qll Outdated Show resolved Hide resolved
@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 27, 2023

@MathiasVP would you mind having a look at my changes today? I have another branch waiting behind this now.

@MathiasVP
Copy link
Contributor

@MathiasVP would you mind having a look at my changes today? I have another branch waiting behind this now.

Sure. I'll make sure to look at this today. Thanks for the reminder 🙂.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small thing, but other that this LGTM!

swift/ql/lib/codeql/swift/regex/Regex.qll Fixed Show resolved Hide resolved
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MathiasVP MathiasVP merged commit 6062fbb into github:main Oct 27, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants