CodeQL documentation

Double escaping or unescaping

ID: js/double-escaping
Kind: problem
Severity: warning
Precision: high
Tags:
   - correctness
   - security
   - external/cwe/cwe-116
   - external/cwe/cwe-020
Query suites:
   - javascript-code-scanning.qls
   - javascript-security-extended.qls
   - javascript-security-and-quality.qls

Click to see the query in the CodeQL repository

Escaping meta-characters in untrusted input is an important technique for preventing injection attacks such as cross-site scripting. One particular example of this is HTML entity encoding, where HTML special characters are replaced by HTML character entities to prevent them from being interpreted as HTML markup. For example, the less-than character is encoded as < and the double-quote character as ". Other examples include backslash-escaping for including untrusted data in string literals and percent-encoding for URI components.

The reverse process of replacing escape sequences with the characters they represent is known as unescaping.

Note that the escape characters themselves (such as ampersand in the case of HTML encoding) play a special role during escaping and unescaping: they are themselves escaped, but also form part of the escaped representations of other characters. Hence care must be taken to avoid double escaping and unescaping: when escaping, the escape character must be escaped first, when unescaping it has to be unescaped last.

If used in the context of sanitization, double unescaping may render the sanitization ineffective. Even if it is not used in a security-critical context, it may still result in confusing or garbled output.

Recommendation

Use a (well-tested) sanitization library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation. For URI encoding, you can use the standard encodeURIComponent and decodeURIComponent functions.

Otherwise, make sure to always escape the escape character first, and unescape it last.

Example

The following example shows a pair of hand-written HTML encoding and decoding functions:

module.exports.encode = function(s) {
  return s.replace(/&/g, "&")
          .replace(/"/g, """)
          .replace(/'/g, "'");
};

module.exports.decode = function(s) {
  return s.replace(/&/g, "&")
          .replace(/"/g, "\"")
          .replace(/'/g, "'");
};

The encoding function correctly handles ampersand before the other characters. For example, the string me & "you" is encoded as me & "you", and the string " is encoded as ".

The decoding function, however, incorrectly decodes & into & before handling the other characters. So while it correctly decodes the first example above, it decodes the second example (") to " (a single double quote), which is not correct.

Instead, the decoding function should decode the ampersand last:

module.exports.encode = function(s) {
  return s.replace(/&/g, "&")
          .replace(/"/g, """)
          .replace(/'/g, "'");
};

module.exports.decode = function(s) {
  return s.replace(/"/g, "\"")
          .replace(/'/g, "'")
          .replace(/&/g, "&");
};

References