Overly permissive regular expression range
ID: java/overly-large-range
Kind: problem
Security severity: 5.0
Severity: warning
Precision: high
Tags:
- correctness
- security
- external/cwe/cwe-020
Query suites:
- java-code-scanning.qls
- java-security-extended.qls
- java-security-and-quality.qls
It’s easy to write a regular expression range that matches a wider range of characters than you intended. For example, /[a-zA-z]/ matches all lowercase and all uppercase letters, as you would expect, but because the range A-z also spans the six ASCII characters that sit between Z and a, it additionally matches the characters [ \ ] ^ _ `
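The overlap can be checked directly. The following is a minimal sketch, not part of the query itself: the class name LetterRangeDemo and the helper unexpectedMatches are hypothetical names chosen for illustration. It lists every ASCII character that a character class matches even though it is not a letter.

```java
import java.util.regex.Pattern;

class LetterRangeDemo {
    // Returns every ASCII character (0-127) that the given character class
    // matches but that is not a letter.
    static String unexpectedMatches(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches() && !Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Overly large range A-z: prints the non-letter characters it lets through.
        System.out.println(unexpectedMatches("[a-zA-z]"));
        // Intended range A-Z: prints an empty line.
        System.out.println(unexpectedMatches("[a-zA-Z]"));
    }
}
```

Enumerating the matched characters like this is a cheap way to confirm that a range covers only what was intended.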
Another common problem is failing to escape the dash character in a regular expression. An unescaped dash between two characters inside a character class is interpreted as a range operator. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is clearly not intended by the writer.
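A short sketch of the difference an escaped dash makes, assuming the writer meant the dash to be a literal character. The class name DashRangeDemo and the helper countAsciiMatches are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

class DashRangeDemo {
    // Counts the ASCII characters (0-127) matched by a character class.
    static int countAsciiMatches(String charClass) {
        Pattern p = Pattern.compile(charClass);
        int count = 0;
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Unescaped dash: the dash forms a range from ',' to '_'.
        System.out.println(countAsciiMatches("[,-_]"));
        // Escaped dash: only the three literal characters ',', '-' and '_'.
        System.out.println(countAsciiMatches("[,\\-_]"));
        // '<' lies between ',' and '_', so the unescaped class accepts it.
        System.out.println("<".matches("[a-zA-Z0-9%=.,-_]"));
        // With the dash escaped, '<' is rejected.
        System.out.println("<".matches("[a-zA-Z0-9%=.,\\-_]"));
    }
}
```

Placing the dash last in the character class, as in [a-zA-Z0-9%=.,_-], is another common way to make it literal.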
Recommendation
Avoid any confusion about which characters are included in the range by writing unambiguous regular expressions. Always check that character ranges match only the expected characters.
Example
The following example code is intended to check whether a string is a valid six-digit hex color.
import java.util.regex.Pattern;

public class Tester {
    public static boolean is_valid_hex_color(String color) {
        // BAD: the A-f range spans far more than the hex digits A-F
        return Pattern.matches("#[0-9a-fA-f]{6}", color);
    }
}
However, the A-f range is overly large and matches every uppercase letter, as well as the characters [ \ ] ^ _ `. As a result, a “color” like #XXYYZZ is accepted as valid.
The fix is to use an uppercase A-F range instead.
import java.util.regex.Pattern;

public class Tester {
    public static boolean is_valid_hex_color(String color) {
        // GOOD: A-F matches only the uppercase hex digits
        return Pattern.matches("#[0-9a-fA-F]{6}", color);
    }
}
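To see the difference between the two patterns, they can be run side by side on a few inputs. This is an illustrative sketch only: the class name HexColorCheck, the constants and the accepts helper are hypothetical, and the sample inputs are chosen for this demonstration.

```java
import java.util.regex.Pattern;

class HexColorCheck {
    // Pattern from the first example: the A-f range is overly large.
    static final Pattern OVERLY_LARGE = Pattern.compile("#[0-9a-fA-f]{6}");
    // Pattern from the corrected example.
    static final Pattern FIXED = Pattern.compile("#[0-9a-fA-F]{6}");

    // True if the whole string matches the pattern.
    static boolean accepts(Pattern p, String color) {
        return p.matcher(color).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"#1a2B3c", "#XXYYZZ", "#__^^]]"};
        for (String color : samples) {
            System.out.printf("%s overly-large=%b fixed=%b%n",
                    color, accepts(OVERLY_LARGE, color), accepts(FIXED, color));
        }
    }
}
```

Only the genuine hex color is accepted by both patterns; the other two inputs pass the overly large pattern alone.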
References
GitHub Advisory Database: CVE-2021-42740: Improper Neutralization of Special Elements used in a Command in Shell-quote
wh0.github.io: Exploiting CVE-2021-42740
Yosuke Ota: no-obscure-range
Paul Boyd: The regex [,-.]
Common Weakness Enumeration: CWE-20