Detecting a potential buffer overflow¶
You can use CodeQL to detect potential buffer overflows by checking for allocations equal to strlen
in C and C++. This topic describes how a C/C++ query for detecting a potential buffer overflow was developed.
Problem—detecting memory allocation that omits space for a null termination character¶
The objective of this query is to detect C/C++ code which allocates an amount of memory equal to the length of a null terminated string, without adding +1 to make room for a null termination character. For example the following code demonstrates this mistake, and results in a buffer overflow:
void processString(const char *input)
{
char *buffer = malloc(strlen(input));
strcpy(buffer, input);
...
}
Basic query¶
Before you can write a query you need to decide what entities to search for and then define how to identify them.
Defining the entities of interest¶
You could approach this problem either by searching for code similar to the call to malloc
in line 3 or the call to strcpy
in line 5 (see example above). For our basic query, we start with a simple assumption: any call to malloc
with only a strlen
to define the memory size is likely to cause an error when the memory is populated.
Calls to strlen
can be identified using the library StrlenCall class, but we need to define a new class to identify calls to malloc
. Both the library class and the new class need to extend the standard class FunctionCall
, with the added restriction of the function name that they apply to:
import cpp
class MallocCall extends FunctionCall
{
MallocCall() { this.getTarget().hasGlobalName("malloc") }
}
Note
You could easily extend this class to include similar functions such as
realloc
, or your own custom allocator. With a little effort they could even include C++new
expressions (to do this,MallocCall
would need to extend a common superclass of bothFunctionCall
andNewExpr
, such asExpr
).
Finding the strlen(string)
pattern¶
Before we start to write our query, there’s one remaining task. We need to modify our new MallocCall
class, so it returns an expression for the size of the allocation. Currently this will be the first argument to the malloc
call, FunctionCall.getArgument(0)
, but converting this into a predicate makes it more flexible for future refinements.
class MallocCall extends FunctionCall
{
MallocCall() { this.getTarget().hasGlobalName("malloc") }
Expr getAllocatedSize() {
result = this.getArgument(0)
}
}
Defining the basic query¶
Now we can write a query using these classes:
import cpp
class MallocCall extends FunctionCall
{
MallocCall() { this.getTarget().hasGlobalName("malloc") }
Expr getAllocatedSize() {
result = this.getArgument(0)
}
}
from MallocCall malloc
where malloc.getAllocatedSize() instanceof StrlenCall
select malloc, "This allocation does not include space to null-terminate the string."
Note that there is no need to check whether anything is added to the strlen
expression, as it would be in the corrected C code malloc(strlen(string) + 1)
. This is because the corrected code would in fact be an AddExpr
containing a StrlenCall
, not an instance of StrlenCall
itself. A side-effect of this approach is that we omit certain unlikely patterns such as malloc(strlen(string) + 0
). In practice we can always come back and extend our query to cover this pattern if it is a concern.
Tip
For some projects, this query may not return any results. Possibly the project you are querying does not have any problems of this kind, but it is also important to make sure the query itself is working properly. One solution is to set up a test project with examples of correct and incorrect code to run the query against (the C code at the very top of this page makes a good starting point). Another approach is to test each part of the query individually to make sure everything is working.
When you have defined the basic query then you can refine the query to include further coding patterns or to exclude false positives:
Improving the query using the ‘SSA’ library¶
The SSA
library represents variables in static single assignment (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. The use of SSA variables simplifies queries considerably as much of the local data flow analysis has been done for us. For more information, see Static single assignment on Wikipedia.
Including examples where the string size is stored before use¶
The query above works for simple cases, but does not identify a common coding pattern where strlen(string)
is stored in a variable before being passed to malloc
, as in the following example:
int len = strlen(input);
buffer = malloc(len);
To identify this case we can use the standard library SSA.qll
(imported as semmle.code.cpp.controlflow.SSA
).
This library helps us identify where values assigned to local variables may subsequently be used.
For example, consider the following code:
void myFunction(bool condition)
{
const char* x = "alpha"; // definition #1 of x
printf("x = %s\n", x); // use #1 of x
if (condition)
{
x = "beta"; // definition #2 of x
} else {
x = "gamma"; // definition #3 of x
}
printf("x = %s\n", x); // use #2 of x
}
If we run the following query on the code, we get three results:
import cpp
import semmle.code.cpp.controlflow.SSA
from Variable var, Expr defExpr, Expr use
where exists(SsaDefinition ssaDef |
defExpr = ssaDef.getAnUltimateDefiningValue(var)
and use = ssaDef.getAUse(var))
select var, defExpr.getLocation().getStartLine() as dline, use.getLocation().getStartLine() as uline
Results:
var |
dline |
uline |
---|---|---|
x |
3 | 5 |
x |
9 | 14 |
x |
11 | 14 |
It is often useful to also display the defining expression defExpr
, if there is one. For example we might adjust the query above as follows:
import cpp
import semmle.code.cpp.controlflow.SSA
from Variable var, Expr defExpr, Expr use
where exists(SsaDefinition ssaDef |
defExpr = ssaDef.getAnUltimateDefiningValue(var)
and use = ssaDef.getAUse(var))
select var, defExpr.getLocation().getStartLine() as dline, use.getLocation().getStartLine() as uline, defExpr
Now we can see the assigned expression in our results:
var |
dline |
uline |
defExpr |
---|---|---|---|
x |
3 | 5 | alpha |
x |
9 | 14 | beta |
x |
11 | 14 | gamma |
Extending the query to include allocations passed via a variable¶
Using our experiments above we can expand our simple implementation of MallocCall.getAllocatedSize()
. With the following refinement, if the argument is an access to a variable, getAllocatedSize()
returns a value assigned to that variable instead of the variable access itself:
Expr getAllocatedSize() {
if this.getArgument(0) instanceof VariableAccess then
exists(LocalScopeVariable v, SsaDefinition ssaDef |
result = ssaDef.getAnUltimateDefiningValue(v)
and this.getArgument(0) = ssaDef.getAUse(v))
else
result = this.getArgument(0)
}
The completed query will now identify cases where the result of strlen
is stored in a local variable before it is used in a call to malloc
. Here is the query in full:
import cpp
import semmle.code.cpp.controlflow.SSA
class MallocCall extends FunctionCall
{
MallocCall() { this.getTarget().hasGlobalName("malloc") }
Expr getAllocatedSize() {
if this.getArgument(0) instanceof VariableAccess then
exists(LocalScopeVariable v, SsaDefinition ssaDef |
result = ssaDef.getAnUltimateDefiningValue(v)
and this.getArgument(0) = ssaDef.getAUse(v))
else
result = this.getArgument(0)
}
}
from MallocCall malloc
where malloc.getAllocatedSize() instanceof StrlenCall
select malloc, "This allocation does not include space to null-terminate the string."