-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Swift: Add regular expression evaluation models for StringProtocol and NSString methods #14383
Conversation
…egex test, so that we can be confident where we fail to identify them.
…g and StringProtocol methods.
…ds from arrays at the sink.
@MathiasVP do you mind reviewing this? Happy to talk you through it. I plan on doing at least one follow-up PR after this to clean up these libraries a bit. So any comments you may have about layout etc will be noted for then. |
I'll take a look at it tomorrow morning. Thanks for the ping! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small nits, and one question about the design of the new RegexEval
and PotentialRegexEval
split. But otherwise, this LGTM!
DataFlow::Node regexInput; | ||
DataFlow::Node stringInput; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nit: I would have probably kept these as Expr
s to avoid all those asExpr()
calls in the charpred, and then simply implemented getRegexInput
as result.asExpr() = regexInput
.
(and similarly for the other members of this class and related classes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to move all the interfaces in this library towards DataFlow::Node
rather than a split between DataFlow::Node
and Expr
. I think this is more natural and robust. When we get there a lot of these conversions should disappear (though there will no doubt be some deprecated
stuff for a time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now gone ahead with the changes to a DataFlow::Node
-based interface and deprecated the old one. Better now rather than later!
@MathiasVP would you mind having a look at my changes today? I have another branch waiting behind this now. |
Sure. I'll make sure to look at this today. Thanks for the reminder 🙂. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small thing, but other that this LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Add models for the
StringProtocol
andNSString
methods that evaluate regular expressions. These are a bit more complicated than the existing regex evaluation models as these methods take anNSString.CompareOptions
parameter and only interpret their input as a regular expression if.regularExpression
is specified - something like this:Thus the important changes I've made are:
RegexEnableFlagFlow
, tracking the.regularExpression
flag to its use in the method.RegexEval
intoRegexEval
(the class you use for reasoning about regex evals) andPotentialRegexEval
(the class you extend to add models of them). The latter allows data flows such asRegexEnableFlagFlow
to interact with a potential regular expression evaluation site before we figure out whether or not it is a regular expression evaluation site.NSString.CompareOptions
(such as.caseInsensitive
in the example above, and.anchored
which is new), so that we get correct parse modes for the newly modelled methods as we;;.I have also cleaned up the tests a little, added a few more test cases, and cleaned up a few predicates in the regex library.
On the MRVA top 1000 we go from finding 1,104 regex evals to 1,227 - an increase of 123 (11%). A large sample of the new results all look good to me.
I plan to do a brief [internal] presentation about the regex library at some point - reviewers may benefit from that. Alternatively talk to me - this PR came out a bit bigger than I'd like but I'm happy to talk you through each change.