A (synthetic) data-flow parameter node to capture all positional arguments that
should be passed to the `*args` parameter.
To handle

```python
def func(*args):
    for arg in args:
        sink(arg)

func(source1, source2, ...)
```

we add a synthetic parameter to `func` that accepts any positional argument at (or
after) the index for the `*args` parameter. We add a store step (at any list index) to the real
`*args` parameter. This means we can handle the code above, but if the code had done `sink(args[0])`
we would (wrongly) add flow for `source2` as well.
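The simple approach can be sketched as a small Python model. This is not CodeQL's QL implementation: the helpers `simple_star_args_flow`, `read_index` and `read_all`, and the `ANY_INDEX` marker, are hypothetical names chosen for illustration, and list content is modelled as a dictionary from index to the set of source labels stored there. It shows why iterating over `args` is handled correctly, while reading `args[0]` also picks up `source2`.

```python
# Hypothetical model of the simple *args approach (not the actual QL code).
ANY_INDEX = "any"  # marks a store into the *args tuple at an unknown index


def simple_star_args_flow(call_args, star_offset):
    """Content of the real *args parameter under the simple approach.

    call_args: source labels of the positional arguments at the call site.
    star_offset: position of the *args parameter in the callee's signature.
    Every argument at or after star_offset reaches the synthetic parameter,
    which stores it into the real *args parameter at ANY_INDEX.
    """
    content = {ANY_INDEX: set()}
    for pos, arg in enumerate(call_args):
        if pos >= star_offset:
            content[ANY_INDEX].add(arg)
    return content


def read_index(content, index):
    """Model of args[index]: a store at ANY_INDEX matches every concrete index."""
    return content.get(index, set()) | content.get(ANY_INDEX, set())


def read_all(content):
    """Model of `for arg in args`: every stored element is read."""
    result = set()
    for values in content.values():
        result |= values
    return result


args_content = simple_star_args_flow(["source1", "source2"], star_offset=0)
print(sorted(read_all(args_content)))       # ['source1', 'source2'], as intended
print(sorted(read_index(args_content, 0)))  # ['source1', 'source2'], source2 is spurious
```

Because the index is forgotten at the store, any indexed read has to match it, which is exactly the imprecision described above.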
To solve this more precisely, we could add a synthetic argument with position `*args`
that had store steps with the correct index (like we do for mapping keyword arguments to a
`**kwargs` parameter). However, if a single call could go to two different
targets with `*args` parameters at different positions, as in the example below, it's unclear at what
index to store `2`. For the `foo` callable it should be 0, and for the `bar` callable it should be 1.
So this information would need to be encoded in the arguments of an `ArgumentPosition` branch, and
one of the arguments would be which callable is the target. However, we cannot build `ArgumentPosition`
branches based on the call graph, so this strategy doesn't work.
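A small Python sketch makes the ambiguity concrete, using Python's own rules for binding positional arguments to a `*args` parameter. The helper `star_store_index` and the `STAR_OFFSETS` table are hypothetical; the offsets are taken from the `foo`/`bar` example at the end of this section. The same argument (the literal `2`, at position 1 in `func(1, 2, 3)`) lands at a different `*args` index depending on the target, so a store performed on the argument side would need to know which callable is called.

```python
# Hypothetical sketch: where does a positional argument land in *args?


def star_store_index(arg_position, star_offset):
    """Index in the callee's *args tuple that receives the positional argument
    at arg_position, or None if it binds to an earlier, named parameter."""
    if arg_position < star_offset:
        return None
    return arg_position - star_offset


# Position of the *args parameter in each possible target of func(1, 2, 3):
#   def foo(one, *args)  -> offset 1
#   def bar(*args)       -> offset 0
STAR_OFFSETS = {"foo": 1, "bar": 0}

# The literal 2 is the positional argument at position 1.
indices_for_2 = {name: star_store_index(1, off) for name, off in STAR_OFFSETS.items()}
print(indices_for_2)  # {'foo': 0, 'bar': 1}
```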
Another approach to solving it precisely is to add multiple synthetic parameters that have store steps
to the real `*args` parameter. So for the example below, `foo` would need synthetic parameter
nodes for indexes 1 and 2 (which would have store steps for indexes 0 and 1 of the `*args` parameter),
and `bar` would need them for indexes 0, 1, and 2. The question becomes how many synthetic parameters to
create, which must be determined by `max(Call call, int i | exists(call.getArg(i)))`, the largest positional
argument index used by any call, since (again) we can't base this on the call graph. Each function with a
`*args` parameter would need this many extra synthetic nodes. My gut feeling is that the current simple
approach will be good enough, but if we need it to be more precise, it should be possible to do it like this.
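The bookkeeping for the multiple-synthetic-parameters approach can be sketched in Python. The helpers `max_arg_index` and `synthetic_params` are hypothetical: `max_arg_index` plays the role of the QL expression `max(Call call, int i | exists(call.getArg(i)))`, and each call is modelled simply as its list of positional arguments. For a function with a `*args` parameter, the sketch maps each synthetic parameter (identified by argument position) to the index of the real `*args` parameter it stores into.

```python
# Hypothetical sketch of the "one synthetic parameter per argument index" idea.


def max_arg_index(calls):
    """Largest positional-argument index used by any call in the program,
    or -1 if no call has positional arguments. This is computed globally,
    without consulting the call graph."""
    return max((len(args) - 1 for args in calls if args), default=-1)


def synthetic_params(star_offset, max_index):
    """Map each synthetic parameter (by argument position) to the index of the
    real *args parameter that it stores into."""
    return {i: i - star_offset for i in range(star_offset, max_index + 1)}


calls = [[1, 2, 3]]  # the single call func(1, 2, 3) from the example
m = max_arg_index(calls)
print(m)                       # 2
print(synthetic_params(1, m))  # foo(one, *args): {1: 0, 2: 1}
print(synthetic_params(0, m))  # bar(*args):      {0: 0, 1: 1, 2: 2}
```

Since `m` is global, every function with a `*args` parameter gets synthetic nodes up to the same bound, even if no call that can reach it uses that many arguments.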
In PR review, @yoff suggested an alternative approach for more precise handling:

- At the call site, all positional arguments are stored into a synthetic `starArgs` argument, always starting at index 0.
- This is sent to a synthetic star parameter.
- At the receiving end, we know the offset of a potential real star parameter, so we can define read steps accordingly: in `foo`, we read from the synthetic star parameter at index 1 and store to the real star parameter at index 0.

```python
def foo(one, *args): ...
def bar(*args): ...

func = foo if <cond> else bar
func(1, 2, 3)
```
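@yoff's suggestion can be modelled in a few lines of Python, using the `foo`/`bar` example above. The functions `call_site_star_args` and `receive_star_args` are hypothetical names, and element content is modelled as a dictionary from index to value. The key property is that the call site stores the arguments without knowing the target; only the callee applies its own `*args` offset when reading.

```python
# Hypothetical model of the starArgs approach suggested in PR review.


def call_site_star_args(positional_args):
    """Call side: store every positional argument into the synthetic starArgs
    argument, always starting at index 0. No knowledge of the target needed."""
    return {i: arg for i, arg in enumerate(positional_args)}


def receive_star_args(star_args, star_offset):
    """Callee side: read the synthetic star parameter at index star_offset + k
    and store the value into the real *args parameter at index k."""
    real_args = {}
    for index, value in star_args.items():
        if index >= star_offset:
            real_args[index - star_offset] = value
    return real_args


synthetic = call_site_star_args([1, 2, 3])  # same content whichever target is called
print(receive_star_args(synthetic, 1))  # foo: {0: 2, 1: 3}
print(receive_star_args(synthetic, 0))  # bar: {0: 1, 1: 2, 2: 3}
```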
Import path
import semmle.python.dataflow.new.internal.DataFlowPrivate
Direct supertypes
Fields
Predicates
| Predicate | Description |
| --- | --- |
| getLocation | Gets the location of this node. |
| getParameter | Gets the |
| getScope | Gets the scope of this node. |
| toString | Gets a textual representation of this element. |
Inherited predicates
| Predicate | Description | Inherited from |
| --- | --- | --- |
| asCfgNode | Gets the control-flow node corresponding to this node, if any. | from Node |
| asExpr | Gets the expression corresponding to this node, if any. | from Node |
| getALocalSource | Gets a local source node from which data may flow to this node in zero or more local data-flow steps. | from Node |
| getEnclosingCallable | Gets the enclosing callable of this node. | from Node |
| hasLocationInfo | Holds if this element is at the specified location. The location spans column | from Node |
| isParameterOf | Holds if this node is the parameter of callable | from ParameterNodeImpl |