Module TaintTracking
Python Taint Tracking Library
The taint tracking library is described in three parts.
- Specification of kinds, sources, sinks and flows.
- The high-level query API.
- The implementation.
Specification
There are four parts to the specification of a taint tracking query. These are:
- Kinds
  The Python taint tracking library supports arbitrary kinds of taint. This is useful when you want to track something related to “taint” that is not in itself dangerous. For example, we might want to track the flow of request objects. Request objects are not in themselves tainted, but they do contain tainted data: the length or timestamp of a request may not pose a risk, but the GET or POST string probably does. So, we would want to track request objects distinctly from the request data in the GET or POST field.
  Kinds can also specify additional flow steps, but we recommend using the DataFlowExtension module, which is less likely to cause issues with unwanted recursion.
- Sources
  Sources of taint can be added by importing a predefined sub-type of TaintSource, or by defining new ones.
- Sinks (or vulnerabilities)
  Sinks can be added by importing a predefined sub-type of TaintSink, or by defining new ones.
- Flow extensions
  Additional flow can be added by importing predefined sub-types of DataFlowExtension::DataFlowNode or DataFlowExtension::DataFlowVariable, or by defining new ones.
The high-level query API
The TaintedNode class fully describes the taint flow graph.
The full graph can be expressed as:
from TaintedNode n, TaintedNode s
where s = n.getASuccessor()
select n, s
The source -> sink relation can be expressed either using TaintedNode:
from TaintedNode src, TaintedNode sink
where src.isSource() and sink.isSink() and src.getASuccessor*() = sink
select src, sink
or, using the specification API:
from TaintSource src, TaintSink sink
where src.flowsToSink(sink)
select src, sink
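To make the meaning of src.getASuccessor*() = sink concrete, here is a minimal Python sketch of the same idea outside QL: a reflexive, transitive closure over a successor relation, filtered to sources and sinks. The graph representation and the names reachable and source_sink_pairs are hypothetical and are not part of the library; the sketch also ignores contexts and taint kinds.

```python
from collections import deque


def reachable(successors, start):
    """Return every node reachable from `start` in zero or more steps.

    `successors` is a hypothetical adjacency mapping: node -> iterable of nodes.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def source_sink_pairs(successors, sources, sinks):
    """List (source, sink) pairs where the sink is reachable from the source."""
    return sorted(
        (src, node)
        for src in sources
        for node in reachable(successors, src)
        if node in sinks
    )


# Toy graph: only "request" is a source, so "log" is never reported.
example_graph = {"request": ["query"], "query": ["sql"], "const": ["log"]}
example_pairs = source_sink_pairs(example_graph, {"request"}, {"sql", "log"})
```

Because the closure is reflexive, a node that is both a source and a sink is paired with itself, which matches the zero-or-more semantics of the * operator in the query above.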
The implementation
The data-flow graph used by the taint-tracking library is the one created by the points-to analysis,
and consists of the base data-flow graph defined in semmle/python/essa/Essa.qll
enhanced with precise variable flows, call graph and type information.
This graph is then enhanced with additional flows as specified above.
Since the call graph and points-to information are context sensitive, the taint graph must also be context sensitive.
The taint graph is a directed graph where each node consists of a (CFG node, context, taint) triple, although it could be thought of more naturally as a number of distinct graphs, one for each input taint kind, consisting of data-flow nodes, (CFG node, context) pairs, labelled with their taint.
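The two views of the graph described above can be related with a small Python sketch. It is only an illustration under stated assumptions: TaintNode and as_labelled_pairs are hypothetical names, and CFG nodes, contexts and taints are modelled as plain strings rather than the library's real types.

```python
from collections import defaultdict
from typing import NamedTuple


class TaintNode(NamedTuple):
    """Hypothetical stand-in for a (CFG node, context, taint) triple."""
    cfg_node: str
    context: str
    taint: str


def as_labelled_pairs(nodes):
    """Regroup triples as (CFG node, context) pairs labelled with their taints."""
    labels = defaultdict(set)
    for node in nodes:
        labels[(node.cfg_node, node.context)].add(node.taint)
    return dict(labels)
```

In this model the same CFG node reached in two different contexts yields two distinct pairs, which is what makes the graph context sensitive.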
The TrackedValue used in the implementation is not the taint kind specified by the user, but describes both the kind of taint and how that taint relates to any object referred to by a data-flow graph node or edge.
Currently, only two types of taint are supported: simple taint, where the object is actually tainted; and attribute taint, where a named attribute of the referred object is tainted.
Support for tainted members (both specific members of tuples and the like, and generic members for mutable collections) is likely to be added in the near future, and other forms are possible. The types of taint are hard-wired, with no user-visible extension method at the moment.
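The distinction between simple taint and attribute taint can be sketched in Python as follows. This is a toy model, not the library's implementation: Tracked, store_attribute and load_attribute are hypothetical names, and the rule that reading an attribute of a simply tainted object yields nothing is a simplifying assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tracked:
    """Hypothetical tracked value: a taint kind plus how it relates to the object."""
    kind: str
    attribute: Optional[str] = None  # None means simple taint on the object itself


def store_attribute(value: Tracked, name: str) -> Optional[Tracked]:
    """Model `obj.name = tainted`: obj now carries attribute taint on `name`."""
    if value.attribute is None:
        return Tracked(value.kind, name)
    return None  # nested attribute taint is outside this toy model


def load_attribute(value: Tracked, name: str) -> Optional[Tracked]:
    """Model reading `obj.name`: matching attribute taint becomes simple taint."""
    if value.attribute == name:
        return Tracked(value.kind)
    return None  # assumption: other reads carry no taint in this sketch
```

Storing a simply tainted value into an attribute and then loading the same attribute returns the original simple taint, while loading any other attribute returns None.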
Import path
import semmle.python.dataflow.old.TaintTracking
Imports
Classes
CollectionKind | Taint kinds representing collections of other taint kind. We use |
DictKind | A taint kind representing a mapping of objects to kinds. Typically a dict, but can include other mappings. |
Sanitizer | A type of sanitizer of untrusted data. Examples include sanitizers for HTTP responses, for DB access or for shell commands. Usually a sanitizer can only sanitize data for one particular use. For example, a sanitizer for DB commands would not be safe to use for HTTP responses. |
SequenceKind | A taint kind representing a flat collection of kinds. Typically a sequence, but can include sets. |
TaintKind | A ‘kind’ of taint. This may be almost anything, but it is typically something like a “user-defined string”. Examples include data from an HTTP request object, data from an SMS or other mobile data source, or, for a super secure system, environment variables or the local file system. |
TaintSink | A node that is vulnerable to one or more types of taint. These nodes provide the sinks when computing the taint flow graph. An example would be an argument to a write to an HTTP response object; such an argument would be vulnerable to unsanitized user input (XSS). |
TaintSource | A source of taintedness. Users of the taint tracking library should override this class to provide their own sources. |
TaintedDefinition | Warning: Advanced feature. Users are strongly recommended to use |
TaintedPathSink | |
TaintedPathSource |
Modules
DataFlow | Data flow module providing an interface compatible with the other language implementations. |
DataFlowExtension | Extension for data-flow, to help express data-flow paths that are library or framework specific and cannot be inferred by the general data-flow machinery. |
DictKind | |
SequenceKind |
Aliases
FlowLabel | An Alias of |
TaintedNode | A class representing the (node, context, path, kind) tuple. Used for context-sensitive path-aware taint-tracking. |