Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RB: add a RegexExecution concept, and use it for better regexp tracking #11879

Merged
merged 13 commits into from
Jan 30, 2023

Conversation

erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented Jan 12, 2023

This PR adds a RegexExecution concept which is identical to the similarly named concept in Python.

I've used that to add tracking from strings/regexps to where they are used, which allows the polynomial-redos query to flag more results because the regular expression can be some distance away from where it's used.
This is similar to how JS already does it, but the implementation is different.
(In Ruby you don't necessarily have to compile a string to a regex before using it as a regex, which is different from JS, so it makes sense to me that the implementations are different).


Commit-by-commit review is recommended, but things got refactored a bunch along the way, so be mindful of that.


There is a tradeoff to be made between performance and results.
At first I tracked the strings/regexps using a DataFlow::Configuration, but that gave a performance regression of at least 5% (nightly, rails-projects), so that seems unacceptable.

I tried to limit fieldFlowBranchLimit() to 1 (down from the default of 2), which improved performance (and had no change in results). But there was still a ~1.5% slowdown on code-scanning (nightly, rails-projects).

I tried changing from a DataFlow::Configuration to a TypeTracker,
which at first had a worst-case slowdown by a factor of 10.
However, I got performance to be great by adding an initial exploratory analysis, which greatly reduced the search-space for the final TypeTracker. It actually results in a 4-5% speedup on the code-scanning suite (nightly, rails-projects).
There are some lost results due to the reduced capabilities of type-tracking compared to a DataFlow::Configuration.
But the only lost results in the evaluations were in test-files, and with a ~5% speedup I think it's worth it.

(The exploratory analysis is a simplified version of what JS does in it's global dataflow library, which I assume is similar to how the shared dataflow library does it).

In the end I think the potential for lost results are worth it, given the ~5% speedup on the entire code-scanning suite.
(A link to an internal note about performance on an internal project)


I could make the efficient the source -> sink analysis using TypeTrackers into a shared library using parameterized modules.
But that would only make sense if the TypeTracker library was first converted to a shared library using parameterized modules, and such a library would likely reuse the signature from a parameterized module for the shared dataflow library.
So I won't look into that any time soon.

@github-actions github-actions bot added the Ruby label Jan 12, 2023
@erik-krogh erik-krogh force-pushed the rbRegConcept branch 4 times, most recently from 06edb2a to 3eaf1b1 Compare January 17, 2023 09:53
@erik-krogh erik-krogh marked this pull request as ready for review January 18, 2023 09:37
@erik-krogh erik-krogh requested a review from a team as a code owner January 18, 2023 09:37
@erik-krogh erik-krogh added the no-change-note-required This PR does not need a change note label Jan 18, 2023
Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Comment on lines 181 to 187
name = "case-when" and
exec.asExpr() = caseWhen and
input.asExpr() = caseWhen.getValue()
|
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::WhenClauseCfgNode).getPattern(_)
or
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::InClauseCfgNode).getPattern()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name = "case-when" and
exec.asExpr() = caseWhen and
input.asExpr() = caseWhen.getValue()
|
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::WhenClauseCfgNode).getPattern(_)
or
regexp.asExpr() = caseWhen.getBranch(_).(CfgNodes::ExprNodes::InClauseCfgNode).getPattern()
exec.asExpr() = caseExpr and
input.asExpr() = caseExpr.getValue()
|
name = "case-when" and
regexp.asExpr() = caseExpr.getBranch(_).(CfgNodes::ExprNodes::WhenClauseCfgNode).getPattern(_)
or
name = "case-in" and
regexp.asExpr() = caseExpr.getBranch(_).(CfgNodes::ExprNodes::InClauseCfgNode).getPattern()

* @kind path-problem
*/

import RegExpConfiguration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this query. It would be interesting to see the DCA diff of this query when switching from dataflow to type tracking.

Copy link
Contributor Author

@erik-krogh erik-krogh Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't keep it as a @kind path-problem, but I can rewrite it a bit to get the same result.

I tested it on our big internal ruby project, and it runs fine.

)
}

/** Gests a node that references a regular expression. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Gests a node that references a regular expression. */
/** Gets a node that references a regular expression. */

*
* 2: A precise type tracking analysis that tracks
* strings and regular expressions to the places where they are used.
* This phase keeps track of which strings and regular expressions ends up in which places.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This phase keeps track of which strings and regular expressions ends up in which places.
* This phase keeps track of which strings and regular expressions end up in which places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Ruby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants