Module ExponentialBackTracking
This library implements the analysis described in the following two papers:
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attacks. NSS 2013. (http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf) Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Exponential Runtime via Substructural Logics. 2014. (https://www.cs.bham.ac.uk/~hxt/research/redos_full.pdf)
The basic idea is to search for overlapping cycles in the NFA, that is,
states q
such that there are two distinct paths from q
to itself
that consume the same word w
.
For any such state q
, an attack string can be constructed as follows:
concatenate a prefix v
that takes the NFA to q
with n
copies of
the word w
that leads back to q
along two different paths, followed
by a suffix x
that is not accepted in state q
. A backtracking
implementation will need to explore at least 2^n different ways of going
from q
back to itself while trying to match the n
copies of w
before finally giving up.
Now in order to identify overlapping cycles, all we have to do is find
pumpable forks, that is, states q
that can transition to two different
states r1
and r2
on the same input symbol c
, such that there are
paths from both r1
and r2
to q
that consume the same word. The latter
condition is equivalent to saying that (q, q)
is reachable from (r1, r2)
in the product NFA.
This is what the library does. It makes a simple attempt to construct a
prefix v
leading into q
, but only to improve the alert message.
And the library tries to prove the existence of a suffix that ensures
rejection. This check might fail, which can cause false positives.
Finally, sometimes it depends on the translation whether the NFA generated for a regular expression has a pumpable fork or not. We implement one particular translation, which may result in false positives or negatives relative to some particular JavaScript engine.
More precisely, the library constructs an NFA from a regular expression r
as follows:
- Every sub-term
t
gives rise to an NFA stateMatch(t,i)
, representing the state of the automaton before attempting to match thei
th character int
. - There is one accepting state
Accept(r)
. - There is a special
AcceptAnySuffix(r)
state, which accepts any suffix string by using an epsilon transition toAccept(r)
and an any transition to itself. - Transitions between states may be labelled with epsilon, or an abstract input symbol.
- Each abstract input symbol represents a set of concrete input characters: either a single character, a set of characters represented by a character class, or the set of all characters.
- The product automaton is constructed lazily, starting with pair states
(q, q)
whereq
is a fork, and proceeding along an over-approximate step relation. - The over-approximate step relation allows transitions along pairs of abstract input symbols where the symbols have overlap in the characters they accept.
- Once a trace of pairs of abstract input symbols that leads from a fork back to itself has been identified, we attempt to construct a concrete string corresponding to it, which may fail.
- Lastly we ensure that any state reached by repeating
n
copies ofw
has a suffixx
(possible empty) that is most likely not accepted.
Import path
import semmle.code.java.security.regexp.ExponentialBackTracking
Imports
Dep |
A parameterized module implementing the analysis described in the above papers. |