The unpacking assignment takes the general form
sequence = iterable
where sequence is either a tuple or a list and may contain wildcards (starred elements).
The iterable can be any iterable, which means that (CodeQL modeling of) content
will need to change type if it should be transferred from the RHS to the LHS.
Note that (CodeQL modeling of) content does not have to change type on data-flow paths inside the LHS, as the different allowed syntaxes here are merely a convenience. Consequently, we model all LHS sequences as tuples, which have the more precise content model, making flow to the elements more precise. If an element is a starred variable, we will have to mutate the content type to be list content.
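The claim that the LHS syntaxes are merely a convenience can be checked at the Python level. The snippet below is a minimal sketch; SOURCE is a placeholder object introduced here for illustration, not a CodeQL construct.

```python
SOURCE = object()  # hypothetical stand-in for a tracked value
rhs = ["a", SOURCE]

(a1, b1) = rhs  # parenthesized tuple pattern
[a2, b2] = rhs  # list pattern
a3, b3 = rhs    # bare tuple pattern

# All three spellings bind the same values to the same positions.
same = (a1, b1) == (a2, b2) == (a3, b3)
```

Since the three forms behave identically at runtime, treating every LHS sequence as a tuple loses nothing.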
We may for instance have
(a, b) = ["a", SOURCE] # RHS has content `ListElementContent`
Due to the abstraction for list content, we do not know whether SOURCE ends up in a or in b, so we want to overapproximate and see it in both.
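To see why overapproximating is the safe choice, consider two lists that differ only in element order. Under the list abstraction described above, neither records where SOURCE sits, yet at runtime it lands in different variables. This is a small sketch; unpack_pair and the string used for SOURCE are illustrative names only.

```python
SOURCE = "tainted"  # hypothetical placeholder for a tracked value

def unpack_pair(items):
    # Unpack a two-element iterable into two variables.
    (a, b) = items
    return a, b

first = unpack_pair(["a", SOURCE])   # SOURCE ends up in b
second = unpack_pair([SOURCE, "a"])  # SOURCE ends up in a
```

An analysis that only knows "some element of the list is SOURCE" has to report both a and b to cover both calls.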
Using wildcards we may have
(a, *b) = ("a", "b", SOURCE) # RHS has content `TupleElementContent(2)`
Since the starred variables are always assigned (Python-)type list, *b will be ["b", SOURCE], and we will again overapproximate and assign it content corresponding to anything found in the RHS.
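The runtime behavior behind this can be confirmed directly: a starred target is bound to a list regardless of the type of the iterable on the RHS. The values below are illustrative only.

```python
SOURCE = "tainted"  # hypothetical placeholder for a tracked value

(a, *b) = ("a", "b", SOURCE)  # tuple on the RHS
(c, *d) = "xy"                # any iterable works, here a string
(e, *f) = ("only",)           # nothing left over for the starred target

b_type = type(b)
```

Here b is the list ["b", SOURCE] even though the RHS is a tuple, and f is an empty list rather than an empty tuple.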
For a precise transfer
(a, b) = ("a", SOURCE) # RHS has content `TupleElementContent(1)`
we wish to keep the precision, so only b receives the tuple content at index 1.
Finally, sequence is actually a pattern and can have a more complicated structure, such as
(a, [b, *c]) = ("a", ["b", SOURCE]) # RHS has content `TupleElementContent(1); ListElementContent`
Here a should not receive content, but b and c should. c will be [SOURCE], so it should have the content transferred, while b should read it.
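For reference, this is what Python binds for the nested pattern at runtime. It is a minimal sketch with a placeholder string for SOURCE.

```python
SOURCE = "tainted"  # hypothetical placeholder for a tracked value

(a, [b, *c]) = ("a", ["b", SOURCE])

# a comes from tuple index 0, which holds no tracked value.
# b and c both come from the inner list at tuple index 1.
bindings = {"a": a, "b": b, "c": c}
```

At runtime only c holds SOURCE, but because the inner RHS is a list, the model cannot tell b and c apart and lets both receive it.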
To transfer content from RHS to the elements of the LHS in the expression sequence = iterable, we use two synthetic nodes:
TIterableSequence(sequence), which captures the content-modeling the entire sequence will have (essentially just a copy of the content-modeling the RHS has)
TIterableElement(sequence), which captures the content-modeling that will be assigned to an element. Note that an empty access path means that the value we are tracking flows directly to the element.
TIterableSequence(sequence) is at this point superfluous but becomes useful when handling recursive structures in the LHS, where sequence is some internal sequence node. We can have a uniform treatment by always having these two synthetic nodes. So we transfer to (or, in the recursive case, read into) TIterableSequence(sequence), from which we take a read step to TIterableElement(sequence) and then a store step to sequence.
This allows the unknown content from the RHS to be read into TIterableElement(sequence) and tuple content to then be stored into sequence. If the content is already tuple content, this indirection would create crosstalk between indices. Therefore, tuple content is never read into TIterableElement(sequence); it is instead transferred directly from TIterableSequence(sequence) to sequence via a flow step. Such a flow step will also transfer other content, but only tuple content is further read from sequence into its elements.
The strategy then proceeds via several read, store, and flow steps:
1. a) [Flow] Content is transferred from iterable to TIterableSequence(sequence) via a flow step. From here, everything happens on the LHS.
   b) [Read] If the unpacking happens inside a for as in
      for sequence in iterable
   then content is read from iterable to TIterableSequence(sequence).
2. [Flow] Content is transferred from TIterableSequence(sequence) to sequence via a flow step. (Here only tuple content is relevant.)
3. [Read] Content is read from TIterableSequence(sequence) into TIterableElement(sequence). As sequence is modeled as a tuple, we will not read tuple content as that would allow crosstalk.
4. [Store] Content is stored from TIterableElement(sequence) to sequence. Content type is TupleElementContent with indices taken from the syntax. For instance, if sequence is (a, *b, c), content is written to indices 0, 1, and 2. This is adequate as the route through TIterableElement(sequence) does not transfer precise content.
5. [Read] Content is read from sequence to its elements.
   a) If the element is a plain variable, the target is the corresponding essa node.
   b) If the element is itself a sequence, with control-flow node seq, the target is TIterableSequence(seq).
   c) If the element is a starred variable, with control-flow node v, the target is TIterableElement(v).
6. [Store] Content is stored from TIterableElement(v) to the essa variable for v, with content type ListElementContent.
7. [Flow, Read, Store] Steps 2 through 7 are repeated for all recursive elements which are sequences.
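The indices "taken from the syntax" in the store and read steps can be illustrated with Python's own ast module. The sketch below is not how CodeQL computes them; element_indices is a hypothetical helper that lists, for every variable in an assignment target, its index path through the nested pattern and whether it is starred.

```python
import ast

def element_indices(source):
    """Return (index_path, name, starred) for each variable in the LHS pattern."""
    stmt = ast.parse(source).body[0]
    if not isinstance(stmt, ast.Assign):
        raise ValueError("expected a single assignment statement")
    result = []

    def walk(node, path):
        if isinstance(node, (ast.Tuple, ast.List)):
            # Tuple and list patterns are treated alike; the index is positional.
            for i, elt in enumerate(node.elts):
                walk(elt, path + (i,))
        elif isinstance(node, ast.Starred) and isinstance(node.value, ast.Name):
            result.append((path, node.value.id, True))
        elif isinstance(node, ast.Name):
            result.append((path, node.id, False))
        # Other target forms (attributes, subscripts) are ignored in this sketch.

    walk(stmt.targets[0], ())
    return result

flat = element_indices("(a, *b, c) = rhs")
nested = element_indices("(a, [b, *c]) = rhs")
```

For (a, *b, c) this yields indices 0, 1, and 2, with the starred element occupying a single syntactic index just like its neighbors.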
We illustrate the above steps on the assignment
(a, b) = ["a", SOURCE]
Looking at the content propagation to a:
["a", SOURCE]: [ListElementContent]
--Step 1a-->
TIterableSequence((a, b)): [ListElementContent]
--Step 3-->
TIterableElement((a, b)): []
--Step 4-->
(a, b): [TupleElementContent(0)]
--Step 5a-->
a: []
This means there is data-flow from the RHS to a (an overapproximation). The same logic shows that there is data-flow to b. Note that Step 3 and Step 4 would not have been needed if the RHS had been a tuple (since that would have been able to use Step 2 instead).
Another, more complicated example:
(a, [b, *c]) = ["a", [SOURCE]]
where the path to c is
["a", [SOURCE]]: [ListElementContent; ListElementContent]
--Step 1a-->
TIterableSequence((a, [b, *c])): [ListElementContent; ListElementContent]
--Step 3-->
TIterableElement((a, [b, *c])): [ListElementContent]
--Step 4-->
(a, [b, *c]): [TupleElementContent(1); ListElementContent]
--Step 5b-->
TIterableSequence([b, *c]): [ListElementContent]
--Step 3-->
TIterableElement([b, *c]): []
--Step 4-->
[b, *c]: [TupleElementContent(1)]
--Step 5c-->
TIterableElement(c): []
--Step 6-->
c: [ListElementContent]
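The two worked examples can be replayed with a toy model of the numbered steps. This is a sketch under stated assumptions, not CodeQL's implementation: Var, Star, Seq, tup, and propagate are hypothetical names, access paths are plain Python tuples with the outermost content first, and the only non-tuple content modeled is list content. The sketch follows the steps literally, so tuple content whose index has no counterpart in the pattern is simply dropped.

```python
from dataclasses import dataclass

LIST = "ListElementContent"

def tup(i):
    # Toy encoding of TupleElementContent(i).
    return ("TupleElementContent", i)

@dataclass(frozen=True)
class Var:   # plain variable in the pattern
    name: str

@dataclass(frozen=True)
class Star:  # starred variable in the pattern
    name: str

@dataclass(frozen=True)
class Seq:   # tuple or list pattern; both are modeled as tuples
    elements: tuple

def propagate(seq, path, out=None):
    """Map each variable to the access paths it receives.

    `path` is the access path at TIterableSequence(seq).
    """
    out = {} if out is None else out
    at_seq = []
    if path and isinstance(path[0], tuple):
        at_seq.append(path)                  # Step 2: tuple content flows as-is
    elif path:
        rest = path[1:]                      # Step 3: read the non-tuple content
        for i in range(len(seq.elements)):
            at_seq.append((tup(i),) + rest)  # Step 4: store at every index
    for p in at_seq:
        index, rest = p[0][1], p[1:]
        if index >= len(seq.elements):
            continue                         # no element at this index
        elem = seq.elements[index]
        if isinstance(elem, Var):
            out.setdefault(elem.name, set()).add(rest)             # Step 5a
        elif isinstance(elem, Seq):
            propagate(elem, rest, out)                             # Step 5b, 7
        else:
            out.setdefault(elem.name, set()).add((LIST,) + rest)   # Step 5c, 6
    return out

flat = Seq((Var("a"), Var("b")))
nested = Seq((Var("a"), Seq((Var("b"), Star("c")))))

from_list = propagate(flat, (LIST,))              # (a, b) = ["a", SOURCE]
from_tuple = propagate(flat, (tup(1),))           # (a, b) = ("a", SOURCE)
nested_list = propagate(nested, (LIST, LIST))     # (a, [b, *c]) = ["a", [SOURCE]]
nested_tuple = propagate(nested, (tup(1), LIST))  # (a, [b, *c]) = ("a", ["b", SOURCE])
```

An empty tuple as access path means the tracked value reaches the variable directly. In this toy model the list RHS reaches both a and b, the tuple RHS reaches only b, and in the nested cases c ends up with list content.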
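Step 1a: Data flows from iterable to TIterableSequence(sequence).
Step 3: Data flows from TIterableSequence(sequence) into TIterableElement(sequence).
Step 4: Data flows from TIterableElement(sequence) to sequence.
Step 5: For a sequence node inside an iterable unpacking, data flows from the sequence to its elements. There are three cases for what the target is: a plain variable, a nested sequence, or a starred variable.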
All flow steps associated with unpacking assignment.
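Step 1b: Data is read from iterable to TIterableSequence(sequence).
All read steps associated with unpacking assignment.
Step 6: Data flows from TIterableElement(v) to the essa variable for v, with content type ListElementContent.
All store steps associated with unpacking assignment.
Step 2: Data flows from TIterableSequence(sequence) to sequence.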
The LHS of an assignment; it also records the assigned value.
The target of a
A direct (or top-level) target of an unpacking assignment.
A (possibly recursive) target of an unpacking assignment which is also a sequence.
A (possibly recursive) target of an unpacking assignment.