Predicate getCallArg
Gets the argument arg of call at position apos, if any. Requires that we can
resolve call to target with CallType type.
It might seem that knowing the CallType is enough to resolve arguments. We also
need the target in order to avoid cross-talk. In the example below, assuming
that Foo and Bar each define their own meth method, we might otherwise end up passing
both foo and bar to both Foo.meth and Bar.meth, which is wrong. Since both
attribute accesses use the same name, we also need to distinguish on the resolved
target to know which of the two objects to pass as the self argument.
foo = Foo()
bar = Bar()
if cond:
    func = foo.meth
else:
    func = bar.meth
func(42)
Note: If Bar.meth and Foo.meth resolve to the same function, we will end up
sending both self arguments to that function, which is by definition the right thing to do.
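The pairing of resolved target and self argument can also be observed at runtime. The following is only an illustrative Python sketch, not part of the CodeQL library: the class bodies, the SharedBar class, and the pick helper are hypothetical names added for the demonstration.

```python
class Foo:
    def meth(self, x):
        return ("Foo.meth", self, x)


class Bar:
    def meth(self, x):
        return ("Bar.meth", self, x)


class SharedBar(Foo):
    # Hypothetical class: it inherits meth, so SharedBar.meth and Foo.meth
    # resolve to the same function.
    pass


def pick(cond, a, b):
    # Mirrors the if/else in the example above.
    return a.meth if cond else b.meth


foo, bar, shared = Foo(), Bar(), SharedBar()

for cond in (True, False):
    func = pick(cond, foo, bar)
    # A bound method carries its resolved target and its receiver together.
    name, receiver, arg = func(42)
    assert receiver is func.__self__
    # foo is only ever passed to Foo.meth, and bar only to Bar.meth.
    assert (func.__func__ is Foo.meth) == (receiver is foo)
    assert (func.__func__ is Bar.meth) == (receiver is bar)

# When both branches resolve to one function, either receiver is a valid self.
assert pick(True, foo, shared).__func__ is pick(False, foo, shared).__func__
```

Distinguishing on the target in the analysis corresponds to keeping the `__func__`/`__self__` pair together here, rather than combining every receiver with every function named meth.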
Bound methods
For bound methods, such as bm = x.m; bm(), it’s a little unclear whether we should
still use the object in the attribute lookup (x.m) as the self argument in the
call (bm()). We currently do this, but there might also be cases where we don’t
want to do this.
In the example below, we want to clear taint from the list before it reaches the
sink, but because there is no use of l in the clear() call, we currently
have no way to achieve this. (Note that this is a contrived example.)
l = list()
clear = l.clear
l.append(tainted)
clear()
sink(l)
To make matters worse, bound methods have a __self__ attribute that refers to
the object the method is bound to, so we can rewrite the code as:
l = list()
clear = l.clear
clear.__self__.append(tainted)
clear()
sink(l)
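For reference, the runtime behaviour of the snippet above can be checked directly in Python. This is a minimal sketch; the tainted value and the before_clear/after_clear snapshots are hypothetical stand-ins, and the sink call is left out.

```python
tainted = "<tainted>"  # hypothetical stand-in for attacker-controlled data

l = list()
clear = l.clear

# The bound method keeps a reference to the list it was looked up on.
receiver = clear.__self__
assert receiver is l

receiver.append(tainted)
before_clear = list(l)  # snapshot: the list now holds the tainted value

clear()  # no syntactic mention of l here, yet l is emptied
after_clear = list(l)
```

At runtime the list is empty when it would reach the sink, which is the behaviour an analysis without a use of l in the clear() call cannot currently model.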
One idea to solve this is to track the object in a synthetic data-flow node every
time the bound method is used, such that the clear() call would essentially be
translated into l.clear(), and we can still have use-use flow.
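As a loose runtime analogy of that idea (not a description of how the library is or would be implemented), the translation can be sketched in Python. The helper name call_via_receiver is hypothetical.

```python
def call_via_receiver(bound_method, *args):
    # Hypothetical helper: recover the receiver from the bound method and redo
    # the attribute lookup, so the call is effectively receiver.name(*args).
    receiver = bound_method.__self__
    return getattr(receiver, bound_method.__name__)(*args)


l = list()
clear = l.clear
l.append("<tainted>")

# Behaves like l.clear(), which makes the use of l explicit.
call_via_receiver(clear)
result = list(l)
```

The synthetic data-flow node would play the role of receiver in this sketch: it gives the call an explicit reference to the object that use-use flow can pass through.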
Import path
import semmle.python.dataflow.new.internal.DataFlowDispatch
predicate getCallArg(CallNode call, Function target, CallType type, Node arg, ArgumentPosition apos)