CodeQL library for C/C++
codeql/cpp-all 0.12.12-dev

Module InvalidPointerToDereference

This file provides the second phase of the cpp/invalid-pointer-deref query. It identifies flow from an out-of-bounds pointer found by the AllocationToInvalidPointer.qll library to a dereference of that pointer.

Consider the following snippet:

1. char* base = (char*)malloc(size);
2. char* end = base + size;
3. for(char *p = base; p <= end; p++) {
4.   use(*p); // BUG: Should have been bounded by `p < end`.
5. }

This file identifies the flow from base + size to end. We call base + size the “dereference source” and end the “dereference sink”. Although end itself is not dereferenced, we use this term because dataflow is used to find a dereferenced pointer x such that x <= end. In the above example, x is p on line 4.

Merely constructing a pointer that’s out-of-bounds is fine if the pointer is never dereferenced (in reality, the standard only guarantees that it is safe to move the pointer one element past the last element, but we ignore that here). So this step is about identifying which of the out-of-bounds pointers found by pointerAddInstructionHasBounds in AllocationToInvalidPointer.qll are actually being dereferenced. We do this using a regular dataflow configuration (see InvalidPointerToDerefConfig).
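As a minimal sketch of the distinction above, the following C++ function forms the one-past-the-end pointer but uses it only as a strict loop bound, so it is never dereferenced. The name sum_filled and its parameters are hypothetical and chosen only for illustration; they are not part of the CodeQL library.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocates `size` bytes, fills each with `fill`,
// and sums them. Returns -1 if the allocation fails.
long sum_filled(std::size_t size, unsigned char fill) {
    unsigned char* base = static_cast<unsigned char*>(std::malloc(size));
    if (base == nullptr) return -1;
    std::memset(base, fill, size);

    // One past the last element: fine to compute, not fine to dereference.
    unsigned char* end = base + size;

    long total = 0;
    // Strict bound `p < end`, so `*p` never reads the element at `end`.
    for (unsigned char* p = base; p < end; p++) {
        total += *p;
    }

    std::free(base);
    return total;
}
```

For instance, sum_filled(4, 1) reads exactly four bytes and returns 4. Changing the loop condition to p <= end would reproduce the bug from the snippet above.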

The dataflow traversal defines the set of sources as any dataflow node n such that there exists a pointer-arithmetic instruction pai found by AllocationToInvalidPointer.qll with n.asInstruction() = pai.

The set of sinks is defined as any dataflow node n such that addr <= n.asInstruction() + deltaDerefSinkAndDerefAddress for some address operand addr and constant difference deltaDerefSinkAndDerefAddress. Since an address operand is always consumed by an instruction that performs a dereference, this lets us identify a “bad dereference”. We call the instruction that consumes the address operand the “operation”.

For example, consider the flow from base + size to end above. The sink is end on line 3 because p <= end.asInstruction() + deltaDerefSinkAndDerefAddress, where p is the address operand in use(*p) and deltaDerefSinkAndDerefAddress >= 0. The load attached to *p is the “operation”. To ensure that the path makes intuitive sense, we only pick operations that are control-flow reachable from the dereference sink.

We use the deltaDerefSinkAndDerefAddress to compute how many elements the dereference is beyond the end position of the allocation. This is done in the operationIsOffBy predicate (which is the only predicate exposed by this file).

Handling false positives:

Consider the following snippet:

1. char *p = new char[size];
2. char *end = p + size;
3. if (p < end) {
4.   p += 1;
5. }
6. if (p < end) {
7.   int val = *p; // GOOD
8. }

This is safe because p is guarded to be strictly less than end on line 6 before the dereference on line 7. However, if we run the query on the above without further modifications, we would see an alert on line 7. This is because range analysis infers that p <= end after the increment on line 4, and thus the result of p += 1 is seen as a valid dereference source. This node then flows to p on line 6, which is a valid dereference sink since it non-strictly upper bounds an address operand. Range analysis then infers that the address operand of *p (i.e., p) is non-strictly upper bounded by p, and the query reports an alert on line 7.
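To see concretely that the guarded dereference never reads past the allocation, the snippet can be wrapped in a small function. This is only an illustrative sketch: the name guarded_read, the fill pattern, and the -1 return value for "no read happened" are assumptions added here and are not part of the query or its tests.

```cpp
#include <cstddef>

// Hypothetical wrapper around the snippet above. Returns the value read on
// "line 7", or -1 if the guard on "line 6" prevented the read.
int guarded_read(std::size_t size) {
    char* p = new char[size];
    char* const start = p;  // remember the allocation so it can be freed
    for (std::size_t i = 0; i < size; i++) {
        start[i] = static_cast<char>('a' + i);  // fill with 'a', 'b', ...
    }

    char* end = p + size;
    if (p < end) {
        p += 1;  // afterwards only p <= end is known
    }

    int val = -1;
    if (p < end) {  // strict guard: p is inside the allocation here
        val = *p;   // GOOD
    }

    delete[] start;
    return val;
}
```

With size 1, the increment makes p equal to end, the second guard fails, and no read occurs. With size 2, the read is of the second element, which is in bounds.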

In order to handle the above false positive, we define a barrier that identifies guards such as p < end that ensure that a value is less than the pointer-arithmetic instruction that computed the invalid pointer. This is done in the InvalidPointerToDerefBarrier module. Since the node we are tracking is not necessarily equal to the pointer-arithmetic instruction, but rather satisfies node.asInstruction() <= pai + deltaDerefSourceAndPai, we need to account for the delta when checking whether a guard is strong enough to infer that a future dereference is safe. To do this, we check that the guard guarantees that a node n satisfies n < node + k, where node is a node such that node <= pai. Thus, we know that any node m such that m <= n + delta, where delta + k <= 0, will be safe because:

m <= n + delta
  <  node + k + delta
  <= pai + k + delta
  <= pai
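The chain of inequalities above concludes that m < pai. As a sanity check, the sketch below models every quantity as a plain integer and exhaustively searches a small range for a counterexample. This is an assumption-laden model written for illustration (the function names and the max_sum parameter are hypothetical); it is not how the barrier is implemented in QL.

```cpp
// True when all premises hold but the conclusion m < pai fails.
// `max_sum` is the bound required of delta + k (0 in the text above).
bool is_counterexample(int m, int n, int node, int pai, int k, int delta,
                       int max_sum) {
    bool premises = m <= n + delta   // m is bounded via n
                 && n < node + k     // what the guard guarantees
                 && node <= pai      // the tracked node is bounded by pai
                 && delta + k <= max_sum;
    return premises && !(m < pai);
}

// Checks every combination of values in [lo, hi] and reports whether
// the premises always imply m < pai.
bool barrier_is_sound(int lo, int hi, int max_sum) {
    for (int m = lo; m <= hi; m++)
      for (int n = lo; n <= hi; n++)
        for (int node = lo; node <= hi; node++)
          for (int pai = lo; pai <= hi; pai++)
            for (int k = lo; k <= hi; k++)
              for (int delta = lo; delta <= hi; delta++)
                if (is_counterexample(m, n, node, pai, k, delta, max_sum))
                  return false;
    return true;
}
```

Calling barrier_is_sound(-3, 3, 0) finds no counterexample, matching the derivation. Relaxing the requirement to delta + k <= 1 with barrier_is_sound(-3, 3, 1) does find one (for example m = n = node = pai = 0, k = 1, delta = 0), which shows why the condition delta + k <= 0 is needed.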

Import path

import semmle.code.cpp.security.InvalidPointerDereference.InvalidPointerToDereference

Predicates

invalidPointerToDereferenceFieldFlowBranchLimit

Gets the virtual dispatch branching limit when calculating field flow while searching for flow from an out-of-bounds pointer to a dereference of the pointer.

operationIsOffBy

Holds if allocation is the result of an allocation that flows to the left-hand side of pai, and where the right-hand side of pai is an offset such that the result of pai points to an out-of-bounds pointer.