This library implements the analysis described in the following two papers:
James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attacks. NSS 2013. (http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf) Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Exponential Runtime via Substructural Logics. 2014. (https://www.cs.bham.ac.uk/~hxt/research/redos_full.pdf)
The basic idea is to search for overlapping cycles in the NFA, that is,
q such that there are two distinct paths from
q to itself
that consume the same word
For any such state
q, an attack string can be constructed as follows:
concatenate a prefix
v that takes the NFA to
n copies of
w that leads back to
q along two different paths, followed
by a suffix
x that is not accepted in state
q. A backtracking
implementation will need to explore at least 2^n different ways of going
q back to itself while trying to match the
n copies of
before finally giving up.
Now in order to identify overlapping cycles, all we have to do is find
pumpable forks, that is, states
q that can transition to two different
r2 on the same input symbol
c, such that there are
paths from both
q that consume the same word. The latter
condition is equivalent to saying that
(q, q) is reachable from
in the product NFA.
This is what the library does. It makes a simple attempt to construct a
v leading into
q, but only to improve the alert message.
And the library tries to prove the existence of a suffix that ensures
rejection. This check might fail, which can cause false positives.
More precisely, the library constructs an NFA from a regular expression
- Every sub-term
tgives rise to an NFA state
Match(t,i), representing the state of the automaton before attempting to match the
ith character in
- There is one accepting state
- There is a special
AcceptAnySuffix(r)state, which accepts any suffix string by using an epsilon transition to
Accept(r)and an any transition to itself.
- Transitions between states may be labelled with epsilon, or an abstract input symbol.
- Each abstract input symbol represents a set of concrete input characters: either a single character, a set of characters represented by a character class, or the set of all characters.
- The product automaton is constructed lazily, starting with pair states
qis a fork, and proceding along an over-approximate step relation.
- The over-approximate step relation allows transitions along pairs of abstract input symbols where the symbols have overlap in the characters they accept.
- Once a trace of pairs of abstract input symbols that leads from a fork back to itself has been identified, we attempt to construct a concrete string corresponding to it, which may fail.
- Lastly we ensure that any state reached by repeating
whas a suffix
x(possible empty) that is most likely not accepted.
Provides classes for working with regular expressions that can perform backtracking in superlinear/exponential time.
An instantiation of