CodeQL documentation

Incomplete regular expression for hostnames

ID: py/incomplete-hostname-regexp
Kind: problem
Severity: warning
Precision: high
Tags:
   - correctness
   - security
   - external/cwe/cwe-20
Query suites:
   - python-code-scanning.qls
   - python-security-extended.qls
   - python-security-and-quality.qls

Overview

Sanitizing untrusted URLs is a common technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.
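One way to perform this host check without a regular expression is to parse the URL and compare its host against an allowlist. The sketch below assumes absolute URLs that include a scheme. `ALLOWED_HOSTS` and `is_allowed_url` are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your application trusts.
ALLOWED_HOSTS = {"example.com", "www.example.com", "beta.example.com"}


def is_allowed_url(url: str) -> bool:
    """Return True if the URL's host is exactly one of ALLOWED_HOSTS."""
    # .hostname is lower-cased, with any port and "user@" prefix removed.
    # It is None when the URL has no network location (e.g. no scheme).
    host = urlsplit(url).hostname
    return host is not None and host in ALLOWED_HOSTS


print(is_allowed_url("https://www.example.com/account"))     # True
print(is_allowed_url("https://wwwXexample.com/account"))     # False
print(is_allowed_url("https://www.example.com@evil.test/"))  # False
```

Because the comparison is an exact set lookup, there are no meta-characters to escape. Note that a scheme-less target such as `www.example.com/` has no host under this parsing and is rejected.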

If a regular expression implements such a check, it is easy to make the check too permissive by accidentally failing to escape the . meta-character, which matches any character rather than a literal dot. Even if the check is not used in a security-critical context, an incomplete check may still cause undesirable behavior when it succeeds unexpectedly.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.
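When the pattern is built from a list of host names, Python's `re.escape` can do the escaping. The following is a minimal sketch, assuming a hypothetical `ALLOWED_HOSTS` list and a hypothetical `host_is_allowed` helper; it uses `fullmatch` so the host must match one entry from start to end.

```python
import re

# Hypothetical list of trusted hosts.
ALLOWED_HOSTS = ["example.com", "www.example.com", "beta.example.com"]

# re.escape turns each "." into "\." so it only matches a literal dot.
# The non-capturing group keeps the alternation together.
HOST_REGEX = re.compile("(?:" + "|".join(re.escape(h) for h in ALLOWED_HOSTS) + ")")


def host_is_allowed(host: str) -> bool:
    # fullmatch anchors both ends, so "www.example.com.evil.test" fails.
    # Hosts are lower-cased first because DNS names are case-insensitive.
    return HOST_REGEX.fullmatch(host.lower()) is not None


print(HOST_REGEX.pattern)  # (?:example\.com|www\.example\.com|beta\.example\.com)
print(host_is_allowed("www.example.com"))   # True
print(host_is_allowed("wwwXexample.com"))   # False
```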

Example

The following example code checks that a URL redirection will reach the www or beta subdomain of example.com.

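from flask import Flask, request, redirect, abort
import re

app = Flask(__name__)

# Unescaped: each "." matches any character, not only a literal dot.
UNSAFE_REGEX = re.compile("(www|beta).example.com/")
# Escaped: each "\." matches only a literal dot.
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

@app.route("/unsafe")  # illustrative route path
def unsafe():
    target = request.args.get('target', '')
    if UNSAFE_REGEX.match(target):
        return redirect(target)
    abort(400)  # reject targets that fail the check

@app.route("/safe")  # illustrative route path
def safe():
    target = request.args.get('target', '')
    if SAFE_REGEX.match(target):
        return redirect(target)
    abort(400)  # reject targets that fail the check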

The unsafe check is easy to bypass because the unescaped . allows any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

The safe check closes this vulnerability by escaping the . so that URLs of the form wwwXexample.com are rejected.
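The difference between the two checks can be observed directly. This sketch assumes the unsafe pattern is the unescaped counterpart of SAFE_REGEX, `(www|beta).example.com/`; the target strings are hypothetical.

```python
import re

# Unescaped pattern: "." matches any character.
UNSAFE_REGEX = re.compile("(www|beta).example.com/")
# Escaped pattern: "\." matches only a literal dot.
SAFE_REGEX = re.compile(r"(www|beta)\.example\.com/")

# Hypothetical redirect targets to try against both checks.
targets = ["www.example.com/home", "wwwXexample.com/home", "beta-example.com/home"]

for target in targets:
    # Columns: target, accepted by unsafe check, accepted by safe check.
    print(target, bool(UNSAFE_REGEX.match(target)), bool(SAFE_REGEX.match(target)))

# www.example.com/home True True
# wwwXexample.com/home True False
# beta-example.com/home True False
```

Only the legitimate target passes both checks; the lookalike hosts pass the unsafe check alone.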