-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RB: add a RegexExecution concept, and use it for better regexp tracking #11879
Conversation
06edb2a
to
3eaf1b1
Compare
…eir uses in a nice way in `rb/polynomial-redos`
3eaf1b1
to
49abb5a
Compare
… into RegExpConfiguration.qll"
…exploratory initial analysis
49abb5a
to
1477974
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
ruby/ql/lib/codeql/ruby/Regexp.qll
Outdated
name = "case-when" and | ||
exec.asExpr() = caseWhen and | ||
input.asExpr() = caseWhen.getValue() | ||
| | ||
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::WhenClauseCfgNode).getPattern(_) | ||
or | ||
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::InClauseCfgNode).getPattern() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name = "case-when" and | |
exec.asExpr() = caseWhen and | |
input.asExpr() = caseWhen.getValue() | |
| | |
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::WhenClauseCfgNode).getPattern(_) | |
or | |
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::InClauseCfgNode).getPattern() | |
exec.asExpr() = caseExpr and | |
input.asExpr() = caseExpr.getValue() | |
| | |
name = "case-when" and | |
regexp.asExpr() = caseExpr.getBranch(_).(CfgNodes::ExprNodes::WhenClauseCfgNode).getPattern(_) | |
or | |
name = "case-in" and | |
regexp.asExpr() = caseExpr.getBranch(_).(CfgNodes::ExprNodes::InClauseCfgNode).getPattern() |
* @kind path-problem | ||
*/ | ||
|
||
import RegExpConfiguration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this query. It would be interesting to see the DCA diff of this query when switching from dataflow to type tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't keep it as a @kind path-problem
, but I can rewrite it a bit to get the same result.
I tested it on our big internal ruby project, and it runs fine.
) | ||
} | ||
|
||
/** Gests a node that references a regular expression. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** Gests a node that references a regular expression. */ | |
/** Gets a node that references a regular expression. */ |
* | ||
* 2: A precise type tracking analysis that tracks | ||
* strings and regular expressions to the places where they are used. | ||
* This phase keeps track of which strings and regular expressions ends up in which places. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* This phase keeps track of which strings and regular expressions ends up in which places. | |
* This phase keeps track of which strings and regular expressions end up in which places. |
This PR adds a
RegexExecution
concept which is identical to the similarly named concept in Python.I've used that to add tracking from strings/regexps to where they are used, which allows the polynomial-redos query to flag more results because the regular expression can be some distance away from where it's used.
This is similar to how JS already does it, but the implementation is different.
(In Ruby you don't necessarily have to compile a string to a regex before using it as a regex, which is different from JS, so it makes sense to me that the implementations are different).
Commit-by-commit review is recommended, but things got refactored a bunch along the way, so be mindful of that.
There is a tradeoff to be made between performance and results.
At first I tracked the strings/regexps using a
DataFlow::Configuration
, but that gave a performance regression of at least 5% (nightly, rails-projects), so that seems unacceptable.I tried to limit
fieldFlowBranchLimit()
to 1 (down from the default of 2), which improved performance (and had no change in results). But there was still a ~1.5% slowdown on code-scanning (nightly, rails-projects).I tried changing from a
DataFlow::Configuration
to aTypeTracker
,which at first had a worst-case slowdown by a factor of 10.
However, I got performance to be great by adding an initial exploratory analysis, which greatly reduced the search-space for the final TypeTracker. It actually results in a 4-5% speedup on the code-scanning suite (nightly, rails-projects).
There are some lost results due to the reduced capabilities of type-tracking compared to a
DataFlow::Configuration
.But the only lost results in the evaluations were in test-files, and with a ~5% speedup I think it's worth it.
(The exploratory analysis is a simplified version of what JS does in it's global dataflow library, which I assume is similar to how the shared dataflow library does it).
In the end I think the potential for lost results are worth it, given the ~5% speedup on the entire code-scanning suite.
(A link to an internal note about performance on an internal project)
I could make the efficient the source -> sink analysis using TypeTrackers into a shared library using parameterized modules.
But that would only make sense if the TypeTracker library was first converted to a shared library using parameterized modules, and such a library would likely reuse the signature from a parameterized module for the shared dataflow library.
So I won't look into that any time soon.