Modeling data flow in Go libraries

When analyzing a Go program, CodeQL does not examine the source code for external packages. To track the flow of untrusted data through a library, you can create a model of the library.

You can find existing models in the go/ql/lib/semmle/go/frameworks/ folder of the CodeQL repository. To add a new model, create a new file in that folder, named after the library.

Sources

To mark a source of data that is controlled by an untrusted user, create a class that extends RemoteFlowSource::Range. Use inheritance and the characteristic predicate of the class to specify exactly which data-flow node introduces the data. Here is a short example from Mux.qll.

class RequestVars extends DataFlow::RemoteFlowSource::Range, DataFlow::CallNode {
  RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") }
}

This has the effect that all calls to the function Vars from the package mux are treated as sources of untrusted data.

Flow propagation

By default, CodeQL assumes that library functions do not propagate data from their inputs to their outputs. To indicate that a particular function does have data flow, create a class extending TaintTracking::FunctionModel (or DataFlow::FunctionModel if the untrusted user data is passed on without being modified).

Inheritance and the characteristic predicate of the class should specify the function. The class should also have a member predicate with the signature override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) (or override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp) if extending DataFlow::FunctionModel). The body should constrain inp and outp.

FunctionInput is an abstract representation of the inputs to a function. The options are:

the receiver (inp.isReceiver())
one of the parameters (inp.isParameter(i))
one of the results (inp.isResult(i), or inp.isResult() if there is only one result)

It may seem strange to treat the result of a function as a function input, but this is needed in some cases. For instance, the function bufio.NewWriter returns a writer bw that buffers write operations to an underlying writer w. If tainted data is written to bw, it will eventually reach w, so taint should propagate back to the underlying writer. This can be modeled by saying that bufio.NewWriter propagates taint from its result to its first argument.

Similarly, FunctionOutput is an abstract representation of the outputs of a function. The options are:

the receiver (outp.isReceiver())
one of the parameters (outp.isParameter(i))
one of the results (outp.isResult(i), or outp.isResult() if there is only one result)

Here is an example from Gin.qll, which has been slightly simplified.

private class ParamsGet extends TaintTracking::FunctionModel, Method {
  ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") }

  override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) {
    inp.isReceiver() and outp.isResult(0)
  }
}

This has the effect that calls to the Get method with receiver type Params from the gin-gonic/gin package allow taint to flow from the receiver to the first result. In other words, if p has type Params and taint can flow to it, then after the line x := p.Get("foo") taint can also flow to x.
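The following Go sketch illustrates the shape of code that this model describes. To stay self-contained it does not import gin; the Param and Params types below are local stand-ins written for this example, on the assumption that Get returns a value together with a found flag, which would explain why the model names result 0 rather than a single result.

```go
package main

import "fmt"

// Param and Params are hypothetical local stand-ins that mimic the shape of
// the gin types of the same name. They are not the real library code.
type Param struct {
	Key   string
	Value string
}

type Params []Param

// Get returns the value of the first entry whose key matches name,
// and reports whether such an entry was found.
func (ps Params) Get(name string) (string, bool) {
	for _, p := range ps {
		if p.Key == name {
			return p.Value, true
		}
	}
	return "", false
}

func main() {
	// Suppose p was populated from a request, so taint can flow to it.
	p := Params{{Key: "foo", Value: "user-controlled"}}

	// A receiver-to-result(0) model lets taint flow from p to x.
	// The second result, ok, is not named by outp.isResult(0).
	x, ok := p.Get("foo")
	fmt.Println(x, ok)
}
```

Because the model is written in terms of FunctionInput and FunctionOutput, it applies to every call site of the method, not just to this one.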

Sanitizers

It is not necessary to indicate that library functions are sanitizers. Their bodies are not analyzed, so unless a flow-propagation model says otherwise, it is assumed that data does not flow through them.

Sinks

Data-flow sinks are specified by queries rather than by library models. However, you can use library models to indicate when functions belong to special categories. Queries can then use these categories when specifying sinks. Classes representing these special categories are contained in go/ql/lib/semmle/go/Concepts.qll in the CodeQL repository. Concepts.qll includes classes for logger mechanisms, HTTP response writers, HTTP redirects, and marshaling and unmarshaling functions.

Here is a short example from Stdlib.qll, which has been slightly simplified.

private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode {
  PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) }

  override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() }
}

This has the effect that any call to Print, Printf, or Println in the package fmt is recognized as a logger call. Any query that uses logger calls as a sink will then identify when tainted data has been passed as an argument to Print, Printf, or Println.