Refining a query to account for edge cases¶
You can improve the results generated by a CodeQL query by adding conditions to remove false positive results caused by common edge cases.
Overview¶
This topic describes how a C++ query was developed. The example introduces recursive predicates and demonstrates the typical workflow used to refine a query. For a full overview of the topics available for learning to write queries for C/C++ code, see “CodeQL for C and C++.”
Finding every private field and checking for initialization¶
Writing a query to check if a constructor initializes all private fields seems like a simple problem, but there are several edge cases to account for.
Basic query¶
We can start by looking at every private field in a class and checking that every constructor in that class initializes them. Once you are familiar with the library for C++ this is not too hard to do.
import cpp
from Constructor c, Field f
where f.getDeclaringType() = c.getDeclaringType() and f.isPrivate()
and not exists(Assignment a | a = f.getAnAssignment() and a.getEnclosingFunction() = c)
select c, "Constructor does not initialize fields $@.", f, f.getName()
f.getDeclaringType() = c.getDeclaringType()
asserts that the field and constructor are both part of the same class.f.isPrivate()
checks if the field is private.not exists(Assignment a | a = f.getAnAssignment() and a.getEnclosingFunction() = c)
checks that there is no assignment to the field in the constructor.
This code looks fairly complete, but when you test it on a project, there are several results that contain examples that we have overlooked.
Refinement 1—excluding fields initialized by lists¶
You may see that the results contain fields that are initialized by constructor initialization lists, instead of by assignment statements. For example, the following class:
class BoxedInt {
public:
BoxedInt(int value) : m_value(value) {}
private:
int m_value;
};
These can be excluded by adding an extra condition to check for this special constructor-only form of assignment.
import cpp
from Constructor c, Field f
where f.getDeclaringType() = c.getDeclaringType() and f.isPrivate()
and not exists(Assignment a | a = f.getAnAssignment() and a.getEnclosingFunction() = c)
// check for constructor initialization lists as well
and not exists(ConstructorFieldInit i | i.getTarget() = f and i.getEnclosingFunction() = c)
select c, "Constructor does not initialize fields $@.", f, f.getName()
Refinement 2—excluding fields initialized by external libraries¶
When you test the revised query, you may discover that fields from classes in external libraries are over-reported. This is often because a header file declares a constructor that is defined in a source file that is not analyzed (external libraries are often excluded from analysis). When the source code is analyzed, the CodeQL database is populated with a Constructor
entry with no body. This constructor
therefore contains no assignments and consequently the query reports that any fields initialized by the constructor are “uninitialized.” There is no particular reason to be suspicious of these cases, and we can exclude them from the results by defining a condition to exclude constructors that have no body:
import cpp
from Constructor c, Field f
where f.getDeclaringType() = c.getDeclaringType() and f.isPrivate()
and not exists(Assignment a | a = f.getAnAssignment() and a.getEnclosingFunction() = c)
// check for constructor initialization lists as well
and not exists(ConstructorFieldInit i | i.getTarget() = f and i.getEnclosingFunction() = c)
// ignore cases where the constructor source code is not available
and exists(c.getBlock())
select c, "Constructor does not initialize fields $@.", f, f.getName()
This is a reasonably precise query—most of the results that it reports are interesting. However, you could make further refinements.
Refinement 3—excluding fields initialized indirectly¶
You may also wish to consider methods called by constructors that assign to the fields, or even to the methods called by those methods. As a concrete example of this, consider the following class.
class BoxedInt {
public:
BoxedInt(int value) {
setValue(value);
}
void setValue(int value) {
m_value = value;
}
private:
int m_value;
};
This case can be excluded by creating a recursive predicate. The recursive predicate is given a function and a field, then checks whether the function assigns to the field. The predicate runs itself on all the functions called by the function that it has been given. By passing the constructor to this predicate, we can check for assignments of a field in all functions called by the constructor, and then do the same for all functions called by those functions all the way down the tree of function calls. For more information, see “Recursion” in the QL language reference.
import cpp
predicate getSubAssignment(Function c, Field f){
exists(Assignment a | a = f.getAnAssignment() and a.getEnclosingFunction() = c)
or exists(Function fun | c.calls(fun) and getSubAssignment(fun, f))
}
from Constructor c, Field f
where f.getDeclaringType() = c.getDeclaringType() and f.isPrivate()
// check for constructor initialization lists as well
and not exists(ConstructorFieldInit i | i.getTarget() = f and i.getEnclosingFunction() = c)
// check for initializations performed indirectly by methods called
// as a result of the constructor being called
and not getSubAssignment(c, f)
// ignore cases where the constructor source code is not available
and exists(c.getBlock())
select c, "Constructor does not initialize fields $@.", f, f.getName()
Refinement 4—simplifying the query¶
Finally we can simplify the query by using the transitive closure operator. In this final version of the query, c.calls*(fun)
resolves to the set of all functions that are c
itself, are called by c
, are called by a function that is called by c
, and so on. This eliminates the need to make a new predicate all together. For more information, see “Transitive closures” in the QL language reference.
import cpp
from Constructor c, Field f
where f.getDeclaringType() = c.getDeclaringType() and f.isPrivate()
// check for constructor initialization lists as well
and not exists(ConstructorFieldInit i | i.getTarget() = f and i.getEnclosingFunction() = c)
// check for initializations performed indirectly by methods called
// as a result of the constructor being called
and not exists(Function fun, Assignment a |
c.calls*(fun) and a = f.getAnAssignment() and a.getEnclosingFunction() = fun)
// ignore cases where the constructor source code is not available
and exists(c.getBlock())
select c, "Constructor does not initialize fields $@.", f, f.getName()