CodeQL library for Python
codeql/python-all 1.0.6 (changelog, source)

Class SynthStarArgsElementParameterNode

A (synthetic) data-flow parameter node to capture all positional arguments that should be passed to the *args parameter.

To handle

def func(*args):
    for arg in args:
        sink(arg)

func(source1, source2, ...)

we add a synthetic parameter to func that accepts any positional argument at (or after) the index for the *args parameter. We add a store step (at any list index) to the real *args parameter. This means we can handle the code above, but if the code had done sink(args[0]) we would (wrongly) add flow for source2 as well.
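The imprecision described above can be illustrated with a small toy model in plain Python. This is only a sketch, not CodeQL's implementation: `ANY_INDEX`, `store_positional_args`, and `read_index` are hypothetical names introduced here to mimic a store step "at any list index".

```python
# Toy model (not CodeQL): positional arguments are stored into *args
# without a precise index, so every indexed read matches every value.
ANY_INDEX = "any"  # hypothetical marker meaning "some list element"


def store_positional_args(call_args, star_index):
    """Return (index, value) pairs stored into the *args parameter.

    Every positional argument at or after star_index is stored at
    ANY_INDEX, mirroring the single synthetic parameter described above.
    """
    return [(ANY_INDEX, value) for value in call_args[star_index:]]


def read_index(stored, index):
    """Values that may be read at `index`; ANY_INDEX matches every read."""
    return {value for (i, value) in stored if i == ANY_INDEX or i == index}


# func(source1, source2) with def func(*args)
stored = store_positional_args(["source1", "source2"], star_index=0)

# Models sink(args[0]): both sources are reported, although only
# source1 can actually be at index 0.
reaching_sink = read_index(stored, 0)
```

In this model `reaching_sink` contains both `source1` and `source2`, which is the spurious flow the paragraph above mentions.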

To solve this more precisely, we could add a synthetic argument with position *args that had store steps with the correct index (like we do for mapping keyword arguments to a **kwargs parameter). However, if a single call could go to two different targets with *args parameters at different positions, as in the example below, it’s unclear what index to store 2 at. For the foo callable it should be 0, for the bar callable it should be 1. So this information would need to be encoded in the arguments of an ArgumentPosition branch, and one of the arguments would be which callable is the target. However, we cannot build ArgumentPosition branches based on the call-graph, so this strategy doesn’t work.

Another approach to solving it precisely is to add multiple synthetic parameters that have store steps to the real *args parameter. So for the example below, foo would need synthetic parameter nodes for indexes 1 and 2 (which would have store steps for indexes 0 and 1 of the *args parameter), and bar would need them for indexes 0, 1, and 2. The question becomes how many synthetic parameters to create, which must be max(Call call, int i | exists(call.getArg(i))), since (again) we can’t base this on the call-graph. And each function with a *args parameter would need this many extra synthetic nodes. My gut feeling is that this simple approach will be good enough, but if we need more precision, it should be possible to do it like this.
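To make the bound concrete, the following sketch computes the analogous quantity for a piece of Python source using the standard `ast` module: the largest index `i` at which any call passes a positional argument. It is an illustration only, not how CodeQL evaluates the aggregate; `max_positional_index` is a hypothetical helper, and skipping starred arguments such as `f(*xs)` is a simplifying assumption.

```python
import ast


def max_positional_index(source: str) -> int:
    """Largest index i such that some call in `source` has a positional
    argument at index i. Returns -1 if no call passes one.

    Simplification: starred arguments (f(*xs)) are not counted.
    """
    best = -1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            plain = [a for a in node.args if not isinstance(a, ast.Starred)]
            best = max(best, len(plain) - 1)
    return best


example = "func(1, 2, 3)\nother(1)\n"
bound = max_positional_index(example)  # index 2, from func(1, 2, 3)
```

Because the bound is taken over every call in the code base rather than over the calls that actually reach a given function, a single call with many positional arguments raises the number of synthetic nodes for every function with a *args parameter.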

In PR review, @yoff suggested an alternative approach for more precise handling:

  • At the call site, all positional arguments are stored into a synthetic starArgs argument, always starting at index 0
  • This is sent to a synthetic star parameter
  • At the receiving end, we know the offset of a potential real star parameter, so we can define read steps accordingly: In foo, we read from the synthetic star parameter at index 1 and store to the real star parameter at index 0.

def foo(one, *args): ...
def bar(*args): ...

func = foo if <cond> else bar
func(1, 2, 3)
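The alternative suggested in PR review can be sketched as a toy model in plain Python, applied to the example above. This is a hedged illustration rather than the library's implementation: `pack_call_site` and `unpack_at_callee` are hypothetical names, and dictionaries stand in for indexed content.

```python
def pack_call_site(positional_args):
    """Call site: store every positional argument, always starting at index 0."""
    return dict(enumerate(positional_args))


def unpack_at_callee(synthetic_star, star_offset):
    """Callee: read index star_offset + k from the synthetic star parameter
    and store it at index k of the real *args parameter."""
    return {
        index - star_offset: value
        for index, value in synthetic_star.items()
        if index >= star_offset
    }


synthetic = pack_call_site([1, 2, 3])      # func(1, 2, 3)
foo_args = unpack_at_callee(synthetic, 1)  # def foo(one, *args)
bar_args = unpack_at_callee(synthetic, 0)  # def bar(*args)
```

Here `foo_args` is `{0: 2, 1: 3}` and `bar_args` is `{0: 1, 1: 2, 2: 3}`. The call site does not need to know its target, because the offset is only applied on the receiving side.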

Import path

import semmle.python.dataflow.new.internal.DataFlowPrivate

Direct supertypes

Indirect supertypes

Fields

Predicates

getLocation

Gets the location of this node.

getParameter

Gets the Parameter this ParameterNode represents.

getScope

Gets the scope of this node.

toString

Gets a textual representation of this element.

Inherited predicates

asCfgNode

Gets the control-flow node corresponding to this node, if any.

from Node

asExpr

Gets the expression corresponding to this node, if any.

from Node

getALocalSource

Gets a local source node from which data may flow to this node in zero or more local data-flow steps.

from Node

getEnclosingCallable

Gets the enclosing callable of this node.

from Node

hasLocationInfo

Holds if this element is at the specified location. The location spans column startcolumn of line startline to column endcolumn of line endline in file filepath. For more information, see Locations.

from Node

isParameterOf

Holds if this node is the parameter of callable c at the position ppos.

from ParameterNodeImpl

Charpred