CodeQL library for Python
codeql/python-all 3.0.1-dev (changelog, source)
Search

Module SuperlinearBackTracking

This module implements the analysis described in the paper: Valentin Wustholz, Oswaldo Olivo, Marijn J. H. Heule, and Isil Dillig: Static Detection of DoS Vulnerabilities in Programs that use Regular Expressions (Extended Version). (https://arxiv.org/pdf/1701.04045.pdf)

Theorem 3 from the paper describes the basic idea.

The following explains the idea using variables and predicate names that are used in the implementation: We consider a pair of repetitions, which we will call pivot and pumpEnd.

We create a product automaton of 3-tuples of states (see StateTuple). There exists a transition (a,b,c) -> (d,e,f) in the product automaton iff there exists three transitions in the NFA a->d, b->e, c->f where those three transitions all match a shared character char. (see getAThreewayIntersect)

We start a search in the product automaton at (pivot, pivot, pumpEnd), and search for a series of transitions (a Trace), such that we end at (pivot, pumpEnd, pumpEnd) (see isReachableFromStartTuple).

For example, consider the regular expression /^\d*5\w*$/. The search will start at the tuple (\d*, \d*, \w*) and search for a path to (\d*, \w*, \w*). This path exists, and consists of a single transition in the product automaton, where the three corresponding NFA edges all match the character "5".

The start-state in the NFA has an any-transition to itself, this allows us to flag regular expressions such as /a*$/ - which does not have a start anchor - and can thus start matching anywhere.

The implementation is not perfect. It has the same suffix detection issue as the js/redos query, which can cause false positives. It also doesn’t find all transitions in the product automaton, which can cause false negatives.

Import path

import codeql.regex.nfa.SuperlinearBackTracking

Modules

Make

A parameterized module implementing the analysis described in the above papers.