A (synthetic) data-flow parameter node to capture all positional arguments that should be passed to the *args parameter.
To handle

    def func(*args):
        for arg in args:
            sink(arg)

    func(source1, source2, ...)

we add a synthetic parameter to func that accepts any positional argument at (or after) the index of the *args parameter, with a store step (at any list index) to the real *args parameter. This handles the code above, but if the code had instead done sink(args[0]), we would (wrongly) add flow for source2 as well.
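The difference between the simple approach and a precise one can be modelled with a small Python sketch. This is a toy model under stated assumptions, not CodeQL's implementation: the helper names (simple_star_args_contents, precise_star_args_contents, read_index) are hypothetical, list contents are modelled as a dict from list index to the set of sources stored there, and the key None stands for a store "at any list index".

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy model, not CodeQL's API.
# Contents of a list: list index -> sources stored there; None means "any list index".
Contents = Dict[Optional[int], Set[str]]


def simple_star_args_contents(call_args: List[str], star_index: int) -> Contents:
    """Simple approach: every positional argument at (or after) star_index
    reaches one synthetic parameter, which stores it at any list index."""
    return {None: set(call_args[star_index:])}


def precise_star_args_contents(call_args: List[str], star_index: int) -> Contents:
    """Precise alternative: each argument is stored at its exact *args index."""
    return {i: {arg} for i, arg in enumerate(call_args[star_index:])}


def read_index(contents: Contents, index: int) -> Set[str]:
    """Sources a read like args[index] may see; an any-index store matches every read."""
    return contents.get(None, set()) | contents.get(index, set())


call = ["source1", "source2"]  # func(source1, source2) with def func(*args)
print(read_index(simple_star_args_contents(call, 0), 0))   # includes the spurious source2
print(read_index(precise_star_args_contents(call, 0), 0))  # {'source1'}
```

Iterating with for arg in args reads every index, so both models agree for the original func; only indexed reads such as args[0] expose the imprecision of the any-index store.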
To solve this more precisely, we could add a synthetic argument with position *args that has store steps with the correct index (as we do for mapping keyword arguments to a **kwargs parameter). However, if a single call could go to two different targets with *args parameters at different positions, as in the example below, it is unclear at what index to store the argument 2. For the foo callable it should be 0, and for the bar callable it should be 1.
So this information would need to be encoded in the arguments of an ArgumentPosition branch, and one of those arguments would be the target callable. However, we cannot build ArgumentPosition branches based on the call graph, so this strategy doesn't work.
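Python's own argument binding shows why no single store index works. The sketch below uses the standard library's inspect.signature(...).bind(...) to find where the value 2 lands in each callee's *args tuple for the call func(1, 2, 3); the helper star_index_of is hypothetical. Note that the binding needs the concrete target, which is exactly what a call-graph-independent ArgumentPosition cannot know.

```python
import inspect


def foo(one, *args): ...
def bar(*args): ...


def star_index_of(func, value, *call_args):
    """Index of `value` inside func's *args tuple for the call func(*call_args), or None."""
    sig = inspect.signature(func)
    bound = sig.bind(*call_args)
    for name, param in sig.parameters.items():
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            # An empty *args is absent from bound.arguments, hence the default.
            star = bound.arguments.get(name, ())
            return star.index(value) if value in star else None
    return None  # func has no *args parameter


print(star_index_of(foo, 2, 1, 2, 3))  # 0
print(star_index_of(bar, 2, 1, 2, 3))  # 1
```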
Another approach to solving it precisely is to add multiple synthetic parameters that have store steps to the real *args parameter. For the example below, foo would need synthetic parameter nodes for indexes 1 and 2 (with store steps to index 0 and 1 of the *args parameter), and bar would need them for indexes 0, 1, and 2. The question becomes how many synthetic parameters to create, which must be max(Call call, int i | exists(call.getArg(i)) | i), since (again) we can't base this on the call graph. And each function with a *args parameter would need this many extra synthetic nodes. My gut feeling is that the simple approach above will be good enough, but if we need it to be more precise, it should be possible to do it like this.
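A rough sketch of the bookkeeping this approach implies, assuming each callable is described only by the position of its *args parameter and the program only by the number of positional arguments at each call site (both hypothetical inputs). The hypothetical max_positional_index plays the role of the QL aggregate above, and synthetic_params maps each synthetic parameter's argument index to the *args index it would store into.

```python
from typing import Dict, List


def max_positional_index(arg_counts: List[int]) -> int:
    """Largest positional argument index at any call site (-1 if there are none).

    arg_counts holds the number of positional arguments at each call site; like
    the QL aggregate, this looks at all calls, independent of the call graph."""
    return max((n - 1 for n in arg_counts), default=-1)


def synthetic_params(star_position: int, max_index: int) -> Dict[int, int]:
    """For a callable whose *args parameter is at star_position, map each
    synthetic parameter (argument index) to the *args index it stores into."""
    return {i: i - star_position for i in range(star_position, max_index + 1)}


max_index = max_positional_index([3])  # the single call func(1, 2, 3)
print(synthetic_params(1, max_index))  # foo(one, *args): {1: 0, 2: 1}
print(synthetic_params(0, max_index))  # bar(*args): {0: 0, 1: 1, 2: 2}
```

Because max_index is computed over the whole program, every callable with a *args parameter gets that many synthetic nodes, even if no call with that many arguments can actually reach it.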
In PR review, @yoff suggested an alternative approach for more precise handling:
- At the call site, all positional arguments are stored into a synthetic starArgs argument, always starting at index 0.
- This is sent to a synthetic star parameter.
- At the receiving end, we know the offset of a potential real star parameter, so we can define read steps accordingly: in foo, we read from the synthetic star parameter at index 1 and store to the real star parameter at index 0.
def foo(one, *args): ...
def bar(*args): ...
func = foo if <cond> else bar
func(1, 2, 3)
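The suggested approach can be sketched as two steps: a store at the call site and a shifted read/store at the callee. The names call_site_star_args and to_real_star are hypothetical, and offset is assumed to be the number of positional parameters before the real *args parameter (1 for foo, 0 for bar).

```python
from typing import Any, Dict, List


def call_site_star_args(positional: List[Any]) -> Dict[int, Any]:
    """Call site: store every positional argument into the synthetic starArgs
    argument, always starting at index 0."""
    return dict(enumerate(positional))


def to_real_star(star_args: Dict[int, Any], offset: int) -> Dict[int, Any]:
    """Callee: read the synthetic star parameter at index i + offset and store
    the value into the real *args parameter at index i."""
    return {i - offset: v for i, v in star_args.items() if i >= offset}


star_args = call_site_star_args([1, 2, 3])  # func(1, 2, 3)
print(to_real_star(star_args, 1))  # foo: {0: 2, 1: 3}
print(to_real_star(star_args, 0))  # bar: {0: 1, 1: 2, 2: 3}
```

In this sketch the call site never needs to know the target; the offset is a property of the callee alone, which appears to be what lets the approach sidestep the call-graph problem.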
Import path
import semmle.python.dataflow.new.internal.DataFlowPrivate
Direct supertypes
Fields
Predicates
getLocation | Gets the location of this node |
getParameter | Gets the |
getScope | Gets the scope of this node. |
toString | Gets a textual representation of this element. |
Inherited predicates
asCfgNode | Gets the control-flow node corresponding to this node, if any. | from Node |
asExpr | Gets the expression corresponding to this node, if any. | from Node |
getALocalSource | Gets a local source node from which data may flow to this node in zero or more local data-flow steps. | from Node |
getEnclosingCallable | Gets the enclosing callable of this node. | from Node |
hasLocationInfo | Holds if this element is at the specified location. The location spans column | from Node |
isParameterOf | Holds if this node is the parameter of callable | from ParameterNodeImpl |