Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] - Limiting Flows Based on Patterns #18050

Open
KylerKatzUH opened this issue Nov 20, 2024 · 9 comments
Open

[Java] - Limiting Flows Based on Patterns #18050

KylerKatzUH opened this issue Nov 20, 2024 · 9 comments
Labels
question Further information is requested

Comments

@KylerKatzUH
Copy link

Hello, I am trying to restrict flows to only include those that have a source flow that is used as a query parameter.

For example, say authToken is a source,

String urlString = "https://auth.companyportal.com/auth?userId=" + userId + "&token=" + authToken;
URL url = new URL(urlString);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");

However, my current query is picking up false positives where the source isn't used as a query parameter but somehow reaches the sink. Such as a dummy example like this

URL url = new URL(authToken);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");

To address this I added a isValidQueryParamFlow predicate to my query that matches based on ".*\\?.*=.*" however, this causes all of the expected detections to be removed. Even if I remove the regex, or relax the restrictions there still aren't any results. I know the rest of the query is operating as it should since I am getting the expected results without this check. So, I believe it is an issue with how I am performing this filtering.

Here is my full query

 import java
 import semmle.code.java.dataflow.DataFlow
 import semmle.code.java.dataflow.TaintTracking
 import SensitiveInfo.SensitiveInfo
 import Barrier.Barrier
 
 module Flow = TaintTracking::Global<SensitiveInfoToUrlConfig>;
 import Flow::PathGraph
 
 /** A configuration for finding flows from sensitive information sources to URL constructions. */
 module SensitiveInfoToUrlConfig implements DataFlow::ConfigSig {
 
   predicate isSource(DataFlow::Node source) {
     exists(SensitiveVariableExpr sve |  
      source.asExpr() = sve and 
      not sve.toString().toLowerCase().matches("%url%"))
    }

   predicate isSink(DataFlow::Node sink) {
    // Direct use of URL with openConnection followed by setRequestMethod("GET")
    exists(ConstructorCall urlConstructor, MethodCall openConnectionCall, MethodCall setRequestMethod |
      urlConstructor.getConstructedType().hasQualifiedName("java.net", "URL") and
      urlConstructor.getAnArgument() = sink.asExpr() and
      openConnectionCall.getMethod().hasName("openConnection") and
      openConnectionCall.getMethod().getDeclaringType().hasQualifiedName("java.net", "URL") and
      DataFlow::localExprFlow(urlConstructor, openConnectionCall.getQualifier()) and
      setRequestMethod.getMethod().hasName("setRequestMethod") and
      ((StringLiteral)setRequestMethod.getArgument(0)).getValue() = "GET" and
      DataFlow::localExprFlow(openConnectionCall, setRequestMethod.getQualifier())
    )
  }
 
   predicate isBarrier(DataFlow::Node node) {
    Barrier::barrier(node)
   }
 }

 predicate isValidQueryParamFlow(Flow::PathNode source, Flow::PathNode sink) {
  exists(BinaryExpr be |
      be.getOp() = "+" and
      be.getLeftOperand().toString().matches(".*\\?.*=.*") and // Ensure there is a `=` after `?`
      source.getNode().asExpr() = be.getRightOperand() and
      sink.getNode().asExpr() = be
  )
}
 
 from Flow::PathNode source, Flow::PathNode sink
 where Flow::flowPath(source, sink) and
 isValidQueryParamFlow(source, sink)
 select sink.getNode(), source, sink, "Sensitive information used in a URL constructed for a GET request." 
 

Any help is appreciated, thank you,

@KylerKatzUH KylerKatzUH added the question Further information is requested label Nov 20, 2024
@aibaars
Copy link
Contributor

aibaars commented Nov 21, 2024

The predicate isValidQueryParamFlow(Flow::PathNode source, Flow::PathNode sink) requires the source to be an operand of a + expression and the sink to be the result of the + expression. In addition isSource requires the source to be an argument of new URL(..) . That seems overly restrictive. I suppose the query could still find cases like

URL url = new URL("https://auth.companyportal.com/auth?userId=" + userId + "&token=" + authToken);
...

Note: in general it is best to avoid toString() in the logic of queries. That predicate is meant for displaying a short string to a human, the strings are often abbreviations.

@KylerKatzUH
Copy link
Author

Hello @aibaars,

Thank you for your response. I agree it is restrictive, and that there are many more ways that that a URL can be constructed with query params. I am using this as a proof of concept and eventually plan to extend the isValidQueryParamFlow predicate to include more common formats.

For right now, I am confused as to why this query isn't picking up on the simple example that I am testing. Is it correct to do this in the where clause like I have? Or should it be done differently? For example, isBarrier is used to restrict flows that go through something such as an encryption function. With this, I am looking to almost do the opposite and restrict it to only flows that match a certain pattern.

I tried switching my query to use TaintTracking::Configuration so that I can override isAdditionalTaintStep with this, I get the expected result detected, however, the pathgraph fails to be displayed correctly.

Giving me

Showing raw results instead of interpreted ones due to an error. A fatal error occurred: Could not process query metadata.
Error was: Expected result pattern(s) are not present for path-problem query: Expected at least two result patterns. These should include at least an 'edges' result set (see https://codeql.github.com/docs/writing-codeql-queries/creating-path-queries/). [INVALID_RESULT_PATTERNS]

I haven't ever been able to get tainttracking to work without this error which is why I use the DataFlow::ConfigSig instead for all of my dataflow queries. However, that won't let me override anything.

Here is my query using this method for reference

/**
 * @name CWE-598: Use of GET request method with sensitive query strings
 * @description Detects sensitive information being sent in query strings over GET requests, which could be exposed in server logs or browser history.
 * @kind path-problem
 * @problem.severity warning
 * @id java/sensitive-get-query2/598
 * @tags security
 *       external/cwe/cwe-598
 * @cwe CWE-598
 */

 import java
 import semmle.code.java.dataflow.TaintTracking
 import semmle.code.java.dataflow.DataFlow
 
 class SensitiveDataToQueryParam extends TaintTracking::Configuration {
   SensitiveDataToQueryParam() { this = "SensitiveDataToQueryParam" }
 
   override predicate isSource(DataFlow::Node source) {
     exists(VarAccess v |
       source.asExpr() = v and
       v.toString() = "cardNumber"
     )
   }
 
   override predicate isSink(DataFlow::Node sink) {
     exists(ConstructorCall urlConstructor, MethodCall openConnectionCall, MethodCall setRequestMethod |
       urlConstructor.getConstructedType().hasQualifiedName("java.net", "URL") and
       urlConstructor.getArgument(0) = sink.asExpr() and
       openConnectionCall.getMethod().hasName("openConnection") and
       DataFlow::localExprFlow(urlConstructor, openConnectionCall.getQualifier()) and
       setRequestMethod.getMethod().hasName("setRequestMethod") and
       setRequestMethod.getArgument(0) instanceof StringLiteral and
       ((StringLiteral)setRequestMethod.getArgument(0)).getValue() = "GET" and
       DataFlow::localExprFlow(openConnectionCall, setRequestMethod.getQualifier())
     )
   }
  
   override predicate isAdditionalTaintStep(DataFlow::Node src, DataFlow::Node dest) {
     exists(BinaryExpr binOp |
       binOp.getOp() = "+" and
       binOp = dest.asExpr() and
       (
         (binOp.getLeftOperand() = src.asExpr() and isQueryParamString(binOp.getRightOperand())) or
         (binOp.getRightOperand() = src.asExpr() and isQueryParamString(binOp.getLeftOperand()))
       )
     )
   }
 
   predicate isQueryParamString(Expr expr) {
     expr instanceof StringLiteral and
     ((StringLiteral)expr).getValue().matches("%=%")
   }
 }
 
 from SensitiveDataToQueryParam t, DataFlow::PathNode sourceNode, DataFlow::PathNode sinkNode
 where t.hasFlowPath(sourceNode, sinkNode)
 select sinkNode.getNode(),
   sourceNode.getNode(),
   sinkNode.getNode(),
   sourceNode,
   sinkNode,
   "Sensitive information used in a URL constructed for a GET request."
 

@aibaars
Copy link
Contributor

aibaars commented Nov 22, 2024

Giving me

Showing raw results instead of interpreted ones due to an error. A fatal error occurred: Could not process query metadata.
Error was: Expected result pattern(s) are not present for path-problem query: Expected at least two result patterns. These should include at least an 'edges' result set (see https://codeql.github.com/docs/writing-codeql-queries/creating-path-queries/). [INVALID_RESULT_PATTERNS]

That error means the path graph is missing . I think you need to add

import TaintedPathFlow::PathGraph

@KylerKatzUH
Copy link
Author

Hi @aibaars

I tried

import TaintedPathFlow::PathGraph

However, it says that it's not defined, I am using version 2.17 so I am not sure if it was recently added. However, I can't upgrade due to legacy queries.

With that said, I almost have it working with DataFlow::ConfigSig. After some testing, I figured out that the regex wasn't working correctly, so I have now fixed it. The issue I am facing now it getting it to consider intermediate nodes in the flow.

Using isValidQueryParamFlow(source, sink) Means that it will only consider direct concatenations in the sink such as

URL url = new URL("https://api.companynetwork.com/private/data?apikey=" + authToken);

This means that indirect uses like this aren't detected

String url = "https://api.companynetwork.com/private/data?apikey=" + authToken;
URL url = new URL(url);

I tried recursively traversing back up the flow path using the getASuccessor() predicate. However, this ends up causing false positives.

Recursive attempt.


predicate isQueryParamNode(Flow::PathNode node) {
  exists(AddExpr concatOp, Expr param |
    (param = concatOp.getLeftOperand() or param = concatOp.getRightOperand()) and
    (param.toString().indexOf("?") >= 0 or param.toString().indexOf("&") >= 0) and
    param.toString().indexOf("=") > 0 and
    node.getNode().asExpr() = concatOp
  )
}

/**
 * Walks backward from sink to source, validating if any node matches a query parameter concatenation.
 */
predicate isValidQueryParamFlow(Flow::PathNode source, Flow::PathNode sink) {
  // Start at the sink and walk backward
  exists(Flow::PathNode current |
    // Initialize the traversal at the sink
    sink = current or
    // Traverse backward through successors
    current.getASuccessor() = sink and
    (
      // Check if the current node matches a query parameter
      isQueryParamNode(current) or
      // Continue traversing backward until the source is reached
      isValidQueryParamFlow(source, current)
    )
  )
}

Full query without recursion

import java
 import semmle.code.java.dataflow.DataFlow
 import semmle.code.java.dataflow.TaintTracking
 import SensitiveInfo.SensitiveInfo
 import Barrier.Barrier
 
 module Flow = TaintTracking::Global<SensitiveInfoToUrlConfig>;
 import Flow::PathGraph
 
 /** A configuration for finding flows from sensitive information sources to URL constructions. */
 module SensitiveInfoToUrlConfig implements DataFlow::ConfigSig {
 
   predicate isSource(DataFlow::Node source) {
     exists(SensitiveVariableExpr sve |  
      source.asExpr() = sve and 
      not sve.toString().toLowerCase().matches("%url%"))
    }

   predicate isSink(DataFlow::Node sink) {
    // Direct use of URL with openConnection followed by setRequestMethod("GET")
    exists(ConstructorCall urlConstructor, MethodCall openConnectionCall, MethodCall setRequestMethod |
      urlConstructor.getConstructedType().hasQualifiedName("java.net", "URL") and
      urlConstructor.getAnArgument() = sink.asExpr() and
      openConnectionCall.getMethod().hasName("openConnection") and
      openConnectionCall.getMethod().getDeclaringType().hasQualifiedName("java.net", "URL") and
      DataFlow::localExprFlow(urlConstructor, openConnectionCall.getQualifier()) and
      setRequestMethod.getMethod().hasName("setRequestMethod") and
      ((StringLiteral)setRequestMethod.getArgument(0)).getValue() = "GET" and
      DataFlow::localExprFlow(openConnectionCall, setRequestMethod.getQualifier())
    )
  }
 
   predicate isBarrier(DataFlow::Node node) {
    Barrier::barrier(node)
   }
 }
predicate isValidQueryParamFlow(Flow::PathNode source, Flow::PathNode sink) {
  exists(AddExpr concatOp, Expr param |
    // Check if param is part of the concatenation
    (param = concatOp.getLeftOperand() or param = concatOp.getRightOperand()) and
    // Ensure the string contains `?` or `&`, and also `=`
    (param.toString().indexOf("?") >= 0 or param.toString().indexOf("&") >= 0) and
    param.toString().indexOf("=") > 0 and
    // Map source and sink to the concatenation components
    source.getNode().asExpr() = concatOp.getRightOperand() and
    sink.getNode().asExpr() = concatOp
  )
}

 
 from Flow::PathNode source, Flow::PathNode sink
 where Flow::flowPath(source, sink) and
 isValidQueryParamFlow(source, sink)
 select sink.getNode(), source, sink, "Sensitive information used in a URL constructed for a GET request." 

Thank you, for any help

@aibaars
Copy link
Contributor

aibaars commented Nov 23, 2024

I tried

import TaintedPathFlow::PathGraph

However, it says that it's not defined, I am using version 2.17 so I am not sure if it was recently added. However, I can't upgrade due to legacy queries.

No, I think I just remembered it wrong. In one of the versions of the query tt looks like you were missing

 module Flow = TaintTracking::Global<SensitiveInfoToUrlConfig>;
 import Flow::PathGraph

@aibaars
Copy link
Contributor

aibaars commented Nov 24, 2024

I tried recursively traversing back up the flow path using the getASuccessor() predicate. However, this ends up causing false positives.

I guess what you really want is to find a path from a sensitive variable to a query parameter, and then another path from the query parameter to the url connection sink. You can use flow labels for this https://codeql.github.com/docs/codeql-language-guides/using-flow-labels-for-precise-data-flow-analysis/ .

Another simpler solution you might want to try first is to use DataFlow::localFlow. For example you could restrict the isSource predicate to only consider sensitive variables that locally (ie in the same function body) flow into query parameter. Or alternatively you could flag up query parameters as sink that locally flow into a URL connection. Which approach is best depends on whether you think that query parameters are typically in the same function as the sensitive variable, or if it is more likely that the query parameter is in the same function as the URL connection. The approach using flow labels allows the sensitive variable, the query parameter and the URL connection all to be in different functions. So that approach is more general, but it is also more complicated.

@KylerKatzUH
Copy link
Author

Hello @aibaars

I don't believe that Java has flow labels. However, I think that flow state does the same thing. From how I understand, this will restrict the overall flow to just flows that end up in every state. Which sounds like what I am looking for.

Below is my query adjusted to use flow state, however, I am not getting any results. I have confirmed that each individual flow step works correctly, by breaking them into these smaller queries.

State 1: Verify Sources

import java
import SensitiveInfo.SensitiveInfo

from DataFlow::Node source
where exists(SensitiveVariableExpr sve |
  source.asExpr() = sve and
  not sve.toString().toLowerCase().matches("%url%")
)
select source, "Detected sensitive source."

State 2: Verify Intermediate Flows

import java
import semmle.code.java.dataflow.DataFlow

from DataFlow::Node source, DataFlow::Node queryParam
where exists(AddExpr concatOp, Expr param |
  (param = concatOp.getLeftOperand() or param = concatOp.getRightOperand()) and
  (param.toString().indexOf("?") >= 0 or param.toString().indexOf("&") >= 0) and
  param.toString().indexOf("=") > 0 and
  source.asExpr() = param and
  queryParam.asExpr() = concatOp
)
select source, queryParam, "Sensitive source flows into query parameter."

State 3: Verify Sinks

import java

from DataFlow::Node sink
where exists(ConstructorCall urlConstructor, MethodCall openConnectionCall, MethodCall setRequestMethod |
  urlConstructor.getConstructedType().hasQualifiedName("java.net", "URL") and
  urlConstructor.getAnArgument() = sink.asExpr() and
  openConnectionCall.getMethod().hasName("openConnection") and
  openConnectionCall.getMethod().getDeclaringType().hasQualifiedName("java.net", "URL") and
  DataFlow::localExprFlow(urlConstructor, openConnectionCall.getQualifier()) and
  setRequestMethod.getMethod().hasName("setRequestMethod") and
  ((StringLiteral)setRequestMethod.getArgument(0)).getValue() = "GET" and
  DataFlow::localExprFlow(openConnectionCall, setRequestMethod.getQualifier())
)
select sink, "Detected GET request sink."

Main Query

/**
 * @name CWE-598: Sensitive Information in Query String in GET Request
 * @description Detects sensitive information being used as query parameters in URLs for HTTP GET requests.
 * @kind path-problem
 * @problem.severity error
 * @id java/http-query-sensitive-info/598
 * @tags security
 *       external/cwe/cwe-598
 * @cwe CWE-598
 */

 import java
 import semmle.code.java.dataflow.TaintTracking
 import semmle.code.java.dataflow.DataFlow
 import DataFlow::PathGraph
 import SensitiveInfo.SensitiveInfo
 import Barrier.Barrier
 
 // Define flow states
 class SensitiveState extends DataFlow::FlowState { SensitiveState() { this = "SensitiveState" } }
 class QueryParamState extends DataFlow::FlowState { QueryParamState() { this = "QueryParamState" } }
 class GetRequestState extends DataFlow::FlowState { GetRequestState() { this = "GetRequestState" } }
 
 // Dataflow configuration
 class SensitiveToUrlConfig extends TaintTracking::Configuration {
   SensitiveToUrlConfig() { this = "SensitiveToUrlConfig" }
 
   // Identify sensitive sources in SensitiveState
   override predicate isSource(DataFlow::Node source, DataFlow::FlowState state) {
     state instanceof SensitiveState and
     exists(SensitiveVariableExpr sve |
       source.asExpr() = sve and
       not sve.toString().toLowerCase().matches("%url%")
     )
   }
 
   // Identify GET request sinks in GetRequestState
   override predicate isSink(DataFlow::Node sink, DataFlow::FlowState state) {
     state instanceof GetRequestState and
     exists(ConstructorCall urlConstructor, MethodCall openConnectionCall, MethodCall setRequestMethod |
       urlConstructor.getConstructedType().hasQualifiedName("java.net", "URL") and
       urlConstructor.getAnArgument() = sink.asExpr() and
       openConnectionCall.getMethod().hasName("openConnection") and
       openConnectionCall.getMethod().getDeclaringType().hasQualifiedName("java.net", "URL") and
       DataFlow::localExprFlow(urlConstructor, openConnectionCall.getQualifier()) and
       setRequestMethod.getMethod().hasName("setRequestMethod") and
       ((StringLiteral)setRequestMethod.getArgument(0)).getValue() = "GET" and
       DataFlow::localExprFlow(openConnectionCall, setRequestMethod.getQualifier())
     )
   }
 
   // Define transitions between flow states
   override predicate isAdditionalTaintStep(
     DataFlow::Node node1, DataFlow::FlowState state1,
     DataFlow::Node node2, DataFlow::FlowState state2
   ) {
     // Transition from SensitiveState to QueryParamState
     state1 instanceof SensitiveState and
     state2 instanceof QueryParamState and
     exists(AddExpr concatOp, Expr param |
       (param = concatOp.getLeftOperand() or param = concatOp.getRightOperand()) and
       (param.toString().indexOf("?") >= 0 or param.toString().indexOf("&") >= 0) and
       param.toString().indexOf("=") > 0 and
       node1.asExpr() = param and
       node2.asExpr() = concatOp
     ) or
 
     // Transition from QueryParamState to GetRequestState
     state1 instanceof QueryParamState and
     state2 instanceof GetRequestState and
     exists(ConstructorCall urlConstructor |
       urlConstructor.getAnArgument() = node1.asExpr() and
       node2.asExpr() = urlConstructor
     )
   }
 
   // Barriers block the flow
   override predicate isSanitizer(DataFlow::Node node) {
     Barrier::barrier(node)
   }
 }
 
 // Main query for detecting the flow
 from DataFlow::PathNode source, DataFlow::PathNode sink, SensitiveToUrlConfig cfg
 where cfg.hasFlowPath(source, sink)
 select sink, source, sink, "Sensitive information flows through query parameters into a GET request."

Thank you for your help.

@intrigus-lgtm
Copy link
Contributor

intrigus-lgtm commented Nov 25, 2024

Have you checked whether your isAdditionalTaintStep predicate also returns what you expect?
If you didn't already, I'd recommend trying to quick-eval it.

@KylerKatzUH
Copy link
Author

Hello @intrigus-lgtm ,

It won't let me do a quick eval on that predicate, However, I have tried to cover it here

import java
import semmle.code.java.dataflow.DataFlow

from DataFlow::Node source, DataFlow::Node queryParam
where exists(AddExpr concatOp, Expr param |
  (param = concatOp.getLeftOperand() or param = concatOp.getRightOperand()) and
  (param.toString().indexOf("?") >= 0 or param.toString().indexOf("&") >= 0) and
  param.toString().indexOf("=") > 0 and
  source.asExpr() = param and
  queryParam.asExpr() = concatOp
)
select source, queryParam, "Sensitive source flows into query parameter."

I believe that the issue has something to do with this, maybe the flow from one of the states is not working as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants