Uncontrolled data used in path expression¶
ID: py/path-injection Kind: path-problem Severity: error Precision: high Tags: - correctness - security - external/cwe/cwe-022 - external/cwe/cwe-023 - external/cwe/cwe-036 - external/cwe/cwe-073 - external/cwe/cwe-099 Query suites: - python-code-scanning.qls - python-security-extended.qls - python-security-and-quality.qls
Accessing files using paths constructed from user-controlled data can allow an attacker to access unexpected resources. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.
Validate user input before using it to construct a file path, either using an off-the-shelf library function like
werkzeug.utils.secure_filename, or by performing custom validation.
Ideally, follow these rules:
Do not allow more than a single “.” character.
Do not allow directory separators such as “/” or “\” (depending on the file system).
Do not rely on simply replacing problematic sequences such as “../”. For example, after applying this filter to “…/…//”, the resulting string would still be “../”.
Use an allowlist of known good patterns.
In the first example, a file name is read from an HTTP request and then used to access a file. However, a malicious user could enter a file name that is an absolute path, such as
In the second example, it appears that the user is restricted to opening a file within the
"user" home directory. However, a malicious user could enter a file name containing special characters. For example, the string
"../../../etc/passwd" will result in the code reading the file located at
"/server/static/images/../../../etc/passwd", which is the system’s password file. This file would then be sent back to the user, giving them access to all the system’s passwords. Note that a user could also use an absolute path here, since the result of
os.path.join("/server/static/images/", "/etc/passwd") is
In the third example, the path used to access the file system is normalized before being checked against a known prefix. This ensures that regardless of the user input, the resulting path is safe.
import os.path urlpatterns = [ # Route to user_picture url(r'^user-pic1$', user_picture1, name='user-picture1'), url(r'^user-pic2$', user_picture2, name='user-picture2'), url(r'^user-pic3$', user_picture3, name='user-picture3') ] def user_picture1(request): """A view that is vulnerable to malicious file access.""" filename = request.GET.get('p') # BAD: This could read any file on the file system data = open(filename, 'rb').read() return HttpResponse(data) def user_picture2(request): """A view that is vulnerable to malicious file access.""" base_path = '/server/static/images' filename = request.GET.get('p') # BAD: This could still read any file on the file system data = open(os.path.join(base_path, filename), 'rb').read() return HttpResponse(data) def user_picture3(request): """A view that is not vulnerable to malicious file access.""" base_path = '/server/static/images' filename = request.GET.get('p') #GOOD -- Verify with normalised version of path fullpath = os.path.normpath(os.path.join(base_path, filename)) if not fullpath.startswith(base_path): raise SecurityException() data = open(fullpath, 'rb').read() return HttpResponse(data)