CodeQL documentation

Unsafe expansion of self-closing HTML tag

ID: js/unsafe-html-expansion
Kind: problem
Severity: warning
Precision: very-high
Tags:
   - correctness
   - security
   - external/cwe/cwe-079
   - external/cwe/cwe-116
Query suites:
   - javascript-code-scanning.qls
   - javascript-security-extended.qls
   - javascript-security-and-quality.qls


Sanitizing untrusted input for HTML meta-characters is a common technique for preventing cross-site scripting attacks. But even a sanitized input can be dangerous to use if it is modified further before a browser treats it as HTML. A seemingly innocent transformation that expands a self-closing HTML tag from <div attr="{sanitized}"/> to <div attr="{sanitized}"></div> may in fact cause cross-site scripting vulnerabilities.

Recommendation

Use a well-tested sanitization library if at all possible, and avoid modifying sanitized values further before treating them as HTML.

An even safer alternative is to design the application so that sanitization is not needed, for instance by using HTML templates that are explicit about the values they treat as HTML.
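As a rough illustration of the second recommendation, the following sketch uses a JavaScript tagged template. The helpers html and escapeHtml are hypothetical and written only for this example. They are not part of any library, and in practice a well-tested templating engine or sanitization library is preferable to hand-written escaping. The literal parts of the template are markup written by the developer, while every interpolated value is escaped, so the template states explicitly which parts are HTML.

```javascript
// Hypothetical helper: escapes the characters that are significant in HTML
// text and in quoted attribute values. "&" must be replaced first.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical tag function: the literal strings of the template are trusted
// markup, and every interpolated value is escaped before it is inserted.
function html(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += escapeHtml(values[i]) + strings[i + 1];
  }
  return out;
}

// Untrusted input that tries to break out of the attribute value.
const userInput = '<x" title="/><img src=url404 onerror=alert(1)>';

// The template is written directly in its final, expanded form.
const markup = html`<div title="${userInput}"></div>`;
console.log(markup);
```

Because the template already spells out the expanded <div ...></div> form, no later pass has to rewrite the markup after the value has been escaped.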

Example

The following function transforms a self-closing HTML tag into a pair of opening and closing tags. It does so for all tags except img and area, using a regular expression with two nested capture groups. The first (outer) capture group corresponds to the content of the tag, that is, its name and attributes, and the second (inner) capture group to the name of the tag, which the replacement string uses to build the closing tag.

function expandSelfClosingTags(html) {
	var rxhtmlTag = /<(?!img|area)(([a-z][^\w\/>]*)[^>]*)\/>/gi;
	return html.replace(rxhtmlTag, "<$1></$2>"); // BAD
}
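To see what the two capture groups hold, the following sketch runs the same regular expression with String.prototype.matchAll and then calls the function itself (Node.js 12 or later, or a browser console). The names captureGroups and sample are made up for this illustration. One side effect of the character class [^\w\/>]* is worth noting: it stops at the first word character after the initial letter, so the second group holds the whole tag name only for one-letter names such as b, and it can also absorb a following space or quote.

```javascript
// Copy of the vulnerable function above (indentation aside, unchanged).
function expandSelfClosingTags(html) {
  var rxhtmlTag = /<(?!img|area)(([a-z][^\w\/>]*)[^>]*)\/>/gi;
  return html.replace(rxhtmlTag, "<$1></$2>"); // BAD
}

// The same pattern, used here only to inspect what each group captures.
const rxhtmlTag = /<(?!img|area)(([a-z][^\w\/>]*)[^>]*)\/>/gi;

// Returns [group 1, group 2] for every match. matchAll iterates over a copy
// of the regex, so the lastIndex of rxhtmlTag itself is not changed.
function captureGroups(html) {
  return Array.from(html.matchAll(rxhtmlTag), (m) => [m[1], m[2]]);
}

const sample = '<b/> <i class="x"/> <div/> <img src="a"/>';

// Group 1 is the tag content (name plus attributes); group 2 is the text
// used for the closing tag.
console.log(captureGroups(sample));

// The img tag is skipped by the negative lookahead and left unchanged.
console.log(expandSelfClosingTags(sample));
// prints: <b></b> <i class="x"></i > <div></d> <img src="a"/>
```

With this pattern, <div/> is expanded to <div></d>. More importantly for the attack that follows, the second group can pick up non-word characters such as a quote and a space.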

While it is generally known that regular expressions are ill-suited for parsing HTML, variants of this particular transformation pattern have long been considered safe.

However, the function is not safe. As an example, consider the following string:

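<img alt="
<x" title="/>
<img src=url404 onerror=alert(1)>"/>

A browser parses this as a single img element whose alt and title attributes happen to contain markup, so the inner img tag is only part of an attribute value. The function, however, skips the outer img tag (its negative lookahead excludes img and area) and finds its first match at <x inside the alt attribute value. The second capture group then captures the x together with the quote and space that follow it, and the transformation produces the following string, which results in an alert when a browser treats it as HTML:

<img alt="
<x" title="></x" >
<img src=url404 onerror=alert(1)>"/>

Here the title attribute value ends after ></x, the > that follows closes the outer tag, and the second img tag becomes a real element whose onerror handler runs when the image fails to load.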
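The transformation can be reproduced outside a browser with the following sketch, which only builds and prints strings and never renders them. The payload starts with <img, one of the two tags the negative lookahead (?!img|area) skips, so that the first match begins at <x inside the attribute value; the line breaks of the example are written as \n.

```javascript
// Copy of the vulnerable function from the example above.
function expandSelfClosingTags(html) {
  var rxhtmlTag = /<(?!img|area)(([a-z][^\w\/>]*)[^>]*)\/>/gi;
  return html.replace(rxhtmlTag, "<$1></$2>"); // BAD
}

// Before the transformation, the inner <img ...> sits inside the value of
// the title attribute, so a browser treats it as plain attribute text.
const payload =
  '<img alt="\n<x" title="/>\n<img src=url404 onerror=alert(1)>"/>';

// After the transformation, title="/> has become title="></x" >, which
// closes the outer tag and turns the inner <img ...> into a real element.
const expanded = expandSelfClosingTags(payload);
console.log(expanded);
```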

References