Data flow cheat sheet for JavaScript
This article describes parts of the JavaScript libraries commonly used for variant analysis and in data flow queries.
Taint tracking path queries
Use the following template to create a taint tracking path query:
```ql
/**
 * @kind path-problem
 */

import javascript
import DataFlow
import DataFlow::PathGraph

class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }

  override predicate isSource(Node node) { ... }

  override predicate isSink(Node node) { ... }

  override predicate isAdditionalTaintStep(Node pred, Node succ) { ... }
}

from MyConfig cfg, PathNode source, PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "taint from $@.", source.getNode(), "here"
```
This query reports flow paths which:
- Begin at a node matched by isSource.
- Step through variables, function calls, properties, strings, arrays, promises, exceptions, and steps added by isAdditionalTaintStep.
- End at a node matched by isSink.
See also: “Global data flow” and “Creating path queries.”
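As an illustration, the template above could be filled in as follows. This is a minimal sketch, not a production query: the class name EvalTaintConfig and the choice of eval as the sink are assumptions made for the example, and it presumes a CodeQL release in which the TaintTracking::Configuration class used by the template is available.

```ql
/**
 * @kind path-problem
 */

import javascript
import DataFlow
import DataFlow::PathGraph

// Hypothetical configuration: untrusted input flowing into eval().
class EvalTaintConfig extends TaintTracking::Configuration {
  EvalTaintConfig() { this = "EvalTaintConfig" }

  // Any source of untrusted user input.
  override predicate isSource(Node node) { node instanceof RemoteFlowSource }

  // The first argument of a call to the global function eval.
  override predicate isSink(Node node) {
    node = globalVarRef("eval").getACall().getArgument(0)
  }
}

from EvalTaintConfig cfg, PathNode source, PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "taint from $@.", source.getNode(), "here"
```

Note that the sink is the argument of the call, not the call node itself. isAdditionalTaintStep is omitted here because the default taint steps are enough for this example.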
DataFlow module
Use data flow nodes to match program elements independently of syntax. See also: “Analyzing data flow in JavaScript and TypeScript.”
Predicates in the DataFlow:: module:
- moduleImport – finds uses of a module
- moduleMember – finds uses of a module member
- globalVarRef – finds uses of a global variable
Classes and member predicates in the DataFlow:: module:
- Node – something that can have a value, such as an expression, declaration, or SSA variable
  - getALocalSource – find the node that this came from
  - getTopLevel – top-level scope enclosing this node
  - getFile – file containing this node
  - getIntValue – value of this node if it’s an integer constant
  - getStringValue – value of this node if it’s a string constant
  - mayHaveBooleanValue – check if the value is true or false
- SourceNode extends Node – function call, parameter, object creation, or reference to a property or global variable
  - getALocalUse – find nodes whose value came from this node
  - getACall – find calls with this as the callee
  - getAnInstantiation – find new-calls with this as the callee
  - getAnInvocation – find calls or new-calls with this as the callee
  - getAMethodCall – find method calls with this as the receiver
  - getAMemberCall – find calls with a member of this as the callee
  - getAPropertyRead – find property reads with this as the base
  - getAPropertyWrite – find property writes with this as the base
  - getAPropertySource – find nodes flowing into a property of this node
- InvokeNode, NewNode, CallNode, MethodCallNode extends SourceNode – call to a function or constructor
  - getArgument – an argument to the call
  - getCalleeNode – node being invoked as a function
  - getCalleeName – name of the variable or property being called
  - getOptionArgument – a “named argument” passed in through an object literal
  - getCallback – a function passed as a callback
  - getACallee – a function being called here
  - (MethodCallNode).getMethodName – name of the method being invoked
  - (MethodCallNode).getReceiver – receiver of the method call
- FunctionNode extends SourceNode – definition of a function, including closures, methods, and class constructors
  - getName – name of the function, derived from a variable or property name
  - getParameter – a parameter of the function
  - getReceiver – the node representing the value of this
  - getAReturn – get a returned expression
- ParameterNode extends SourceNode – parameter of a function
  - getName – the parameter name, if it has one
- ClassNode extends SourceNode – class declaration or function that acts as a class
  - getName – name of the class, derived from a variable or property name
  - getConstructor – the constructor function
  - getInstanceMethod – get an instance method by name
  - getStaticMethod – get a static method by name
  - getAnInstanceReference – find references to an instance of the class
  - getAClassReference – find references to the class itself
- ObjectLiteralNode extends SourceNode – object literal
  - getAPropertyWrite – a property in the object literal
  - getAPropertySource – value flowing into a property
- ArrayCreationNode extends SourceNode – array literal or call to Array constructor
  - getElement – an element of the array
- PropRef, PropRead, PropWrite – read or write of a property
  - getPropertyName – name of the property, if it is constant
  - getPropertyNameExpr – expression holding the name of the property
  - getBase – object whose property is accessed
  - (PropWrite).getRhs – right-hand side of the property assignment
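The predicates and classes above are typically chained together. The following sketch finds calls to the readFile member of the fs module and reports their first argument. The choice of fs and readFile is an assumption made for illustration only.

```ql
import javascript

// Matches calls such as require("fs").readFile(path, callback),
// regardless of how the imported function is bound to a variable.
from DataFlow::CallNode call
where call = DataFlow::moduleMember("fs", "readFile").getACall()
select call, call.getArgument(0)
```

moduleMember returns a node for the function itself, so getACall is needed to reach the call sites, and getArgument to reach the values passed in.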
StringOps module
- StringOps::Concatenation – string concatenation, using a plus operator, template literal, or array join call
- StringOps::StartsWith – check if a string starts with something
- StringOps::EndsWith – check if a string ends with something
- StringOps::Includes – check if a string contains something
- StringOps::RegExpTest – check if a string matches a RegExp
Utility
- ExtendCall – call that copies properties from one object to another
- JsonParserCall – call that deserializes a JSON string
- JsonStringifyCall – call that serializes a value to a JSON string
- PropertyProjection – call that extracts nested properties by name
System and Network
- ClientRequest – outgoing network request
- DatabaseAccess – query being submitted to a database
- FileNameSource – reference to a filename
- FileSystemAccess – file system operation
- FileSystemReadAccess – reading the contents of a file
- FileSystemWriteAccess – writing to the contents of a file
- PersistentReadAccess – reading from persistent storage, like cookies
- PersistentWriteAccess – writing to persistent storage
- SystemCommandExecution – execution of a system command
Untrusted data
- RemoteFlowSource – source of untrusted user input
  - isUserControlledObject – is the input deserialized to a JSON-like object? (as opposed to just being a string)
- ClientSideRemoteFlowSource extends RemoteFlowSource – input specific to the browser environment
  - getKind – is this derived from the path, fragment, query, url, or name?
- HTTP::RequestInputAccess extends RemoteFlowSource – input from an incoming HTTP request
  - getKind – is this derived from a parameter, header, body, url, or cookie?
- HTTP::RequestHeaderAccess extends RequestInputAccess – access to a specific header
  - getAHeaderName – the name of a header being accessed
Note: some RemoteFlowSource instances, such as input from a web socket, belong to none of the specific subcategories above.
Files
- File, Folder extends Container – file or folder in the database
  - getBaseName – the name of the file or folder
  - getRelativePath – path relative to the database root
AST nodes
See also: “Abstract syntax tree classes for working with JavaScript and TypeScript programs.”
Conversion between DataFlow and AST nodes:
- Node.asExpr() – convert node to an expression, if possible
- Expr.flow() – convert expression to a node (always possible)
- DataFlow::valueNode – convert expression or declaration to a node
- DataFlow::parameterNode – convert a parameter to a node
- DataFlow::thisNode – get the receiver node of a function
String matching
- x.matches("escape%") – holds if x starts with “escape”
- x.regexpMatch("escape.*") – holds if x starts with “escape”
- x.regexpMatch("(?i).*escape.*") – holds if x contains “escape” (case insensitive)
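String matching is usually applied to a name obtained from a data flow node. As a small sketch, the following query lists functions whose name contains “escape” in any letter case. Using FunctionNode here is just one possible choice for the example.

```ql
import javascript

from DataFlow::FunctionNode f, string name
where
  name = f.getName() and
  // regexpMatch must match the whole string, hence the leading and trailing ".*"
  name.regexpMatch("(?i).*escape.*")
select f, name
```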
Access paths
When multiple property accesses are chained together they form what’s called an “access path”.
To identify nodes based on access paths, use the following predicates in the AccessPath module:
- AccessPath::getAReferenceTo – find nodes that refer to the given access path
- AccessPath::getAnAssignmentTo – finds nodes that are assigned to the given access path
- AccessPath::getAnAliasedSourceNode – finds nodes that refer to the same access path
getAReferenceTo and getAnAssignmentTo have a 1-argument version for global access paths, and a 2-argument version for access paths starting at a given node.
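For example, the 1-argument version can be used to look up a global access path by its dotted name. The sketch below assumes that the result type is DataFlow::Node and uses document.cookie purely as an illustrative path.

```ql
import javascript

// Finds references to the global access path document.cookie,
// including references made through local aliases of document.
from DataFlow::Node ref
where ref = AccessPath::getAReferenceTo("document.cookie")
select ref
```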
Type tracking
See also: “Using type tracking for API modeling.”
Use the following template to define forward type tracking predicates:
```ql
import javascript
import DataFlow

SourceNode myType(TypeTracker t) {
  t.start() and
  result = /* SourceNode to track */
  or
  exists(TypeTracker t2 |
    result = myType(t2).track(t2, t)
  )
}

SourceNode myType() {
  result = myType(TypeTracker::end())
}
```
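A filled-in version of the forward template might look like the sketch below, which tracks the fs module object and then finds readFile calls on it. The predicate name fsModule and the tracked module are assumptions chosen for the example.

```ql
import javascript

// Tracks the value of require("fs") as it flows through
// variables, properties, and function boundaries.
DataFlow::SourceNode fsModule(DataFlow::TypeTracker t) {
  t.start() and
  result = DataFlow::moduleImport("fs")
  or
  exists(DataFlow::TypeTracker t2 | result = fsModule(t2).track(t2, t))
}

// Convenience overload: nodes the fs module may have reached.
DataFlow::SourceNode fsModule() { result = fsModule(DataFlow::TypeTracker::end()) }

from DataFlow::CallNode call
where call = fsModule().getAMemberCall("readFile")
select call
```

Compared with calling moduleImport directly, the type-tracked predicate also follows the module object when it is passed to or returned from other functions.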
Use the following template to define backward type tracking predicates:
```ql
import javascript
import DataFlow

SourceNode myType(TypeBackTracker t) {
  t.start() and
  result = (/* argument to track */).getALocalSource()
  or
  exists(TypeBackTracker t2 |
    result = myType(t2).backtrack(t2, t)
  )
}

SourceNode myType() {
  result = myType(TypeBackTracker::end())
}
```
Troubleshooting
- Using a call node as a sink? Try using getArgument to get an argument of the call node instead.
- Trying to use moduleImport or moduleMember as a call node? Try using getACall to get a call to the imported function, instead of the function itself.
- Compilation fails due to incompatible types? Make sure AST nodes and DataFlow nodes are not mixed up. Use asExpr() or flow() to convert.
Further reading
- Exploring data flow with path queries in the GitHub documentation.