Module IterableUnpacking
The unpacking assignment takes the general form
sequence = iterable
where sequence is either a tuple or a list and it can contain wildcards.
The iterable can be any iterable, which means that (CodeQL modeling of) content
will need to change type if it should be transferred from the RHS to the LHS.
Note that (CodeQL modeling of) content does not have to change type on data-flow paths inside the LHS, as the different allowed syntaxes here are merely a convenience. Consequently, we model all LHS sequences as tuples, which have the more precise content model, making flow to the elements more precise. If an element is a starred variable, we will have to mutate the content type to be list content.
We may for instance have
(a, b) = ["a", SOURCE] # RHS has content `ListElementContent`
Due to the abstraction for list content, we do not know whether SOURCE ends up in a or in b, so we want to overapproximate and see it in both.
Using wildcards we may have
(a, *b) = ("a", "b", SOURCE) # RHS has content `TupleElementContent(2)`
Since the starred variables are always assigned (Python-)type list, *b will be ["b", SOURCE], and we will again overapproximate and assign it content corresponding to anything found in the RHS.
For a precise transfer
(a, b) = ("a", SOURCE) # RHS has content `TupleElementContent(1)`
we wish to keep the precision, so only b receives the tuple content at index 1.
Finally, sequence is actually a pattern and can have a more complicated structure, such as
(a, [b, *c]) = ("a", ["b", SOURCE]) # RHS has content `TupleElementContent(1); ListElementContent`
where a should not receive content, but b and c should. c will be [SOURCE], so it should have the content transferred, while b should read it.
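The run-time behaviour behind these four examples can be checked directly in Python. The snippet below is only an illustration: SOURCE is a hypothetical placeholder object, and the numbered variable names are introduced here so the four assignments do not overwrite each other.

```python
SOURCE = object()  # hypothetical stand-in for a tracked value

# List on the RHS: at run time SOURCE lands in b1, but the analysis only
# knows "some list element", so it reports flow to both targets.
(a1, b1) = ["a", SOURCE]

# Starred target: b2 is always a list, whatever type the RHS had.
(a2, *b2) = ("a", "b", SOURCE)

# Tuple on the RHS: the index is known, so only b3 should receive SOURCE.
(a3, b3) = ("a", SOURCE)

# Nested pattern: a4 gets "a", b4 gets "b", and c4 is the list [SOURCE].
(a4, [b4, *c4]) = ("a", ["b", SOURCE])

print(b1 is SOURCE, type(b2).__name__, b3 is SOURCE, c4 == [SOURCE])
# True list True True
```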
To transfer content from RHS to the elements of the LHS in the expression sequence = iterable, we use two synthetic nodes:
- TIterableSequence(sequence), which captures the content-modeling the entire sequence will have (essentially just a copy of the content-modeling the RHS has)
- TIterableElement(sequence), which captures the content-modeling that will be assigned to an element. Note that an empty access path means that the value we are tracking flows directly to the element.
The TIterableSequence(sequence) is at this point superfluous but becomes useful when handling recursive structures in the LHS, where sequence is some internal sequence node. We can have a uniform treatment by always having these two synthetic nodes. So we transfer to (or, in the recursive case, read into) TIterableSequence(sequence), from which we take a read step to TIterableElement(sequence) and then a store step to sequence.
This allows the unknown content from the RHS to be read into TIterableElement(sequence) and tuple content to then be stored into sequence. If the content is already tuple content, this indirection creates crosstalk between indices. Therefore, tuple content is never read into TIterableElement(sequence); it is instead transferred directly from TIterableSequence(sequence) to sequence via a flow step. Such a flow step will also transfer other content, but only tuple content is further read from sequence into its elements.
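The crosstalk argument can be made concrete with a toy model of access paths. This is a minimal sketch, not the CodeQL implementation: an access path is a Python tuple of contents, and the names read, store, via_element and to_sequence, as well as the ("list",) and ("tuple", i) encodings, are assumptions introduced only for this illustration.

```python
LIST = ("list",)                 # stands in for ListElementContent


def tup(i):
    return ("tuple", i)          # stands in for TupleElementContent(i)


def read(path, content):
    """Read step: drop the head of the access path if it matches."""
    if path and path[0] == content:
        return path[1:]
    return None


def store(path, content):
    """Store step: push a content onto the access path."""
    return (content,) + path


def via_element(path, arity, read_tuple):
    """Route through TIterableElement: read, then store at every index."""
    readable = [LIST]
    if read_tuple:               # the variant the design deliberately avoids
        readable += [tup(i) for i in range(arity)]
    out = set()
    for content in readable:
        rest = read(path, content)
        if rest is not None:
            out |= {store(rest, tup(i)) for i in range(arity)}
    return out


def to_sequence(path, arity):
    """Direct flow step plus the element route without tuple reads."""
    return {path} | via_element(path, arity, read_tuple=False)


# List content is spread over both indices (the intended overapproximation).
print(via_element((LIST,), 2, read_tuple=False))
# Reading tuple content through the element node would smear index 1 onto 0.
print(via_element((tup(1),), 2, read_tuple=True))
# With the direct flow step, tuple content keeps its index.
print(to_sequence((tup(1),), 2))
```

In this model the flow step also carries list content unchanged into the sequence, but since only tuple content is read onward from the sequence, that extra path is harmless.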
The strategy then proceeds via several read, store, and flow steps:
1. a) [Flow] Content is transferred from iterable to TIterableSequence(sequence) via a flow step. From here, everything happens on the LHS.
   b) [Read] If the unpacking happens inside a for as in
      for sequence in iterable
      then content is read from iterable to TIterableSequence(sequence).
2. [Flow] Content is transferred from TIterableSequence(sequence) to sequence via a flow step. (Here only tuple content is relevant.)
3. [Read] Content is read from TIterableSequence(sequence) into TIterableElement(sequence). As sequence is modeled as a tuple, we will not read tuple content as that would allow crosstalk.
4. [Store] Content is stored from TIterableElement(sequence) to sequence. Content type is TupleElementContent with indices taken from the syntax. For instance, if sequence is (a, *b, c), content is written to index 0, 1, and 2. This is adequate as the route through TIterableElement(sequence) does not transfer precise content.
5. [Read] Content is read from sequence to its elements.
   a) If the element is a plain variable, the target is the corresponding essa node.
   b) If the element is itself a sequence, with control-flow node seq, the target is TIterableSequence(seq).
   c) If the element is a starred variable, with control-flow node v, the target is TIterableElement(v).
6. [Store] Content is stored from TIterableElement(v) to the essa variable for v, with content type ListElementContent.
7. [Flow, Read, Store] Steps 2 through 7 are repeated for all recursive elements which are sequences.
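The steps above can be sketched as a small recursive procedure over a toy pattern representation. This is an illustrative model under stated assumptions, not the QL predicates themselves: patterns are nested Python tuples of variable names, a leading "*" marks a starred variable, access paths are tuples of contents, and all function names are hypothetical. In particular, the rule that a starred element (and anything after it) reads every tuple index at or after the star is an assumption made here to reproduce the overapproximation described for (a, *b) = ("a", "b", SOURCE), and unpack_for assumes the iterable carries list content.

```python
LIST = ("list",)                      # stands in for ListElementContent


def tup(i):
    return ("tuple", i)               # stands in for TupleElementContent(i)


def _is_star(elem):
    return isinstance(elem, str) and elem.startswith("*")


def _sequence(pattern, seq_paths, result):
    """Process the access paths sitting at TIterableSequence(pattern)."""
    at_sequence = set(seq_paths)                                   # Step 2
    at_element = {p[1:] for p in seq_paths if p and p[0] == LIST}  # Step 3
    for i in range(len(pattern)):                                  # Step 4
        at_sequence |= {(tup(i),) + p for p in at_element}
    star = next((i for i, e in enumerate(pattern) if _is_star(e)), None)
    for i, elem in enumerate(pattern):                             # Step 5
        read = set()
        for p in at_sequence:
            if not p or p[0][0] != "tuple":
                continue                    # only tuple content is read
            k = p[0][1]
            # Assumption: at or after the star, accept any index >= star.
            if (k == i) if (star is None or i < star) else (k >= star):
                read.add(p[1:])
        if isinstance(elem, tuple):                                # Step 5b, 7
            _sequence(elem, read, result)
        elif _is_star(elem):                                       # Step 5c, 6
            result.setdefault(elem[1:], set()).update(
                (LIST,) + p for p in read)
        else:                                                      # Step 5a
            result.setdefault(elem, set()).update(read)


def unpack(pattern, rhs_path):
    """Model `pattern = iterable`; returns {variable: set of access paths}."""
    result = {}
    _sequence(pattern, {rhs_path}, result)                         # Step 1a
    return result


def unpack_for(pattern, iterable_path):
    """Model `for pattern in iterable`, reading list content (Step 1b)."""
    result = {}
    if iterable_path and iterable_path[0] == LIST:
        _sequence(pattern, {iterable_path[1:]}, result)
    return result


print(unpack(("a", ("b", "*c")), (tup(1), LIST)))
```

In the returned dictionary an empty set means the variable receives nothing, and an empty access path `()` means the tracked value flows directly to the variable.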
We illustrate the above steps on the assignment
(a, b) = ["a", SOURCE]
Looking at the content propagation to a:
["a", SOURCE]: [ListElementContent]
--Step 1a--> TIterableSequence((a, b)): [ListElementContent]
--Step 3--> TIterableElement((a, b)): []
--Step 4--> (a, b): [TupleElementContent(0)]
--Step 5a--> a: []
This means there is data flow from the RHS to a (an overapproximation). The same logic would be applied to show there is data flow to b. Note that Step 3 and Step 4 would not have been needed if the RHS had been a tuple (since that would have been able to use Step 2 instead).
Another, more complicated example:
(a, [b, *c]) = ["a", [SOURCE]]
where the path to c is
["a", [SOURCE]]: [ListElementContent; ListElementContent]
--Step 1a--> TIterableSequence((a, [b, *c])): [ListElementContent; ListElementContent]
--Step 3--> TIterableElement((a, [b, *c])): [ListElementContent]
--Step 4--> (a, [b, *c]): [TupleElementContent(1); ListElementContent]
--Step 5b--> TIterableSequence([b, *c]): [ListElementContent]
--Step 3--> TIterableElement([b, *c]): []
--Step 4--> [b, *c]: [TupleElementContent(1)]
--Step 5c--> TIterableElement(c): []
--Step 6--> c: [ListElementContent]
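The path to c can be replayed mechanically by treating each read as popping the head of the access path and each store as pushing onto it. The following is a small sketch with hypothetical helper names (read, store, TUPLE); it only mirrors the trace above and is not part of the library.

```python
LIST = "ListElementContent"


def TUPLE(i):
    return f"TupleElementContent({i})"


def read(path, content):
    """Read step: the head of the access path must match and is removed."""
    if not path or path[0] != content:
        raise ValueError(f"cannot read {content} from {path}")
    return path[1:]


def store(path, content):
    """Store step: the content is pushed onto the front of the access path."""
    return [content] + path


path = [LIST, LIST]                 # ["a", [SOURCE]]
trace = [path]                      # Step 1a is a flow step: path unchanged
for step, content in [
    (read, LIST),                   # Step 3:  into TIterableElement((a, [b, *c]))
    (store, TUPLE(1)),              # Step 4:  into (a, [b, *c])
    (read, TUPLE(1)),               # Step 5b: into TIterableSequence([b, *c])
    (read, LIST),                   # Step 3:  into TIterableElement([b, *c])
    (store, TUPLE(1)),              # Step 4:  into [b, *c]
    (read, TUPLE(1)),               # Step 5c: into TIterableElement(c)
    (store, LIST),                  # Step 6:  into c
]:
    path = step(path, content)
    trace.append(path)

for p in trace:
    print(p)
```

A read that does not match the head raises an error in this sketch, which is how an impossible path (for instance reading TupleElementContent(0) in Step 5b) would show up.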
Import path
import semmle.python.dataflow.new.internal.IterableUnpacking
Predicates
iterableUnpackingAssignmentFlowStep | Step 1a Data flows from |
iterableUnpackingConvertingReadStep | Step 3 Data flows from |
iterableUnpackingConvertingStoreStep | Step 4 Data flows from |
iterableUnpackingElementReadStep | Step 5 For a sequence node inside an iterable unpacking, data flows from the sequence to its elements. There are three cases for what |
iterableUnpackingFlowStep | All flow steps associated with unpacking assignment. |
iterableUnpackingForReadStep | Step 1b Data is read from |
iterableUnpackingReadStep | All read steps associated with unpacking assignment. |
iterableUnpackingStarredElementStoreStep | Step 6 Data flows from |
iterableUnpackingStoreStep | All store steps associated with unpacking assignment. |
iterableUnpackingTupleFlowStep | Step 2 Data flows from |
Classes
AssignmentTarget | The LHS of an assignment; it also records the assigned value. |
ForTarget | The target of a |
UnpackingAssignmentDirectTarget | A direct (or top-level) target of an unpacking assignment. |
UnpackingAssignmentSequenceTarget | A (possibly recursive) target of an unpacking assignment which is also a sequence. |
UnpackingAssignmentTarget | A (possibly recursive) target of an unpacking assignment. |