CodeQL library for JavaScript
When you’re analyzing a JavaScript program, you can make use of the large collection of classes in the CodeQL library for JavaScript.
Overview
There is an extensive CodeQL library for analyzing JavaScript code. The classes in this library present the data from a CodeQL database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.
The library is implemented as a set of QL modules, that is, files with the extension .qll. The module javascript.qll imports most other standard library modules, so you can include the complete library by beginning your query with:
import javascript
The rest of this tutorial briefly summarizes the most important classes and predicates provided by this library, including references to the detailed API documentation where applicable.
Introducing the library
The CodeQL library for JavaScript presents information about JavaScript source code at different levels:
- Textual — classes that represent source code as unstructured text files
- Lexical — classes that represent source code as a series of tokens and comments
- Syntactic — classes that represent source code as an abstract syntax tree
- Name binding — classes that represent scopes and variables
- Control flow — classes that represent the flow of control during execution
- Data flow — classes that you can use to reason about data flow in JavaScript source code
- Type inference — classes that you can use to approximate types for JavaScript expressions and variables
- Call graph — classes that represent the caller-callee relationship between functions
- Inter-procedural data flow — classes that you can use to define inter-procedural data flow and taint tracking analyses
- Frameworks — classes that represent source code entities that have a special meaning to JavaScript tools and frameworks
Note that representations above the textual level (for example the lexical representation or the flow graphs) are only available for JavaScript code that does not contain fatal syntax errors. For code with such errors, the only information available is at the textual level, as well as information about the errors themselves.
Additionally, there is library support for working with HTML documents, JSON, and YAML data, JSDoc comments, and regular expressions.
Textual level
At its most basic level, a JavaScript code base can simply be viewed as a collection of files organized into folders, where each file is composed of zero or more lines of text.
Note that the textual content of a program is not included in the CodeQL database unless you specifically request it during extraction.
Files and folders
In the CodeQL libraries, files are represented as entities of class File, and folders as entities of class Folder, both of which are subclasses of class Container.
Class Container provides the following member predicates:
- Container.getParentContainer() returns the parent folder of the file or folder.
- Container.getAFile() returns a file within the folder.
- Container.getAFolder() returns a folder nested within the folder.
Note that while getAFile and getAFolder are declared on class Container, they currently only have results for folders.
Both files and folders have paths, which can be accessed by the predicate Container.getAbsolutePath(). For example, if f represents a file with the path /home/user/project/src/index.js, then f.getAbsolutePath() evaluates to the string "/home/user/project/src/index.js", while f.getParentContainer().getAbsolutePath() returns "/home/user/project/src".
These paths are absolute file system paths. If you want to obtain the path of a file relative to the source location in the CodeQL database, use Container.getRelativePath() instead. Note, however, that a database may contain files that are not located underneath the source location; for such files, getRelativePath() will not return anything.
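As a minimal sketch of the difference between the two kinds of path, the following query lists the absolute path of every file that lies outside the source location, that is, every file for which getRelativePath() has no result. It relies only on the Container predicates described above.

```ql
import javascript

// Files outside the source location have no relative path,
// so getRelativePath() has no result for them.
from File f
where not exists(f.getRelativePath())
select f.getAbsolutePath()
```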
The following member predicates of class Container provide more information about the name of a file or folder:
- Container.getBaseName() returns the base name of a file or folder, not including its parent folder, but including its extension. In the above example, f.getBaseName() would return the string "index.js".
- Container.getStem() is similar to Container.getBaseName(), but it does not include the file extension; so f.getStem() returns "index".
- Container.getExtension() returns the file extension, not including the dot; so f.getExtension() returns "js".
For example, the following query computes, for each folder, the number of JavaScript files (that is, files with extension js) contained in the folder:
import javascript
from Folder d
select d.getRelativePath(), count(File f | f = d.getAFile() and f.getExtension() = "js")
When you run the query on most projects, the results include folders that contain files with a js extension and folders that don’t.
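The name predicates can be combined in the same way. As a sketch, the following query lists JavaScript files whose stem ends in .min, a common (but by no means universal) naming convention for minified files, together with the name of the folder that contains them. The .min convention is an assumption of this example, and matches() is the built-in QL string predicate in which % stands for any sequence of characters.

```ql
import javascript

// A file such as "vendor/jquery.min.js" has the stem "jquery.min"
// and the extension "js".
from File f
where
  f.getExtension() = "js" and
  f.getStem().matches("%.min")
select f.getRelativePath(), f.getParentContainer().getBaseName()
```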
Locations
Most entities in a CodeQL database have an associated source location. Locations are identified by five pieces of information: a file, a start line, a start column, an end line, and an end column. Line and column counts are 1-based (so the first character of a file is at line 1, column 1), and the end position is inclusive.
All entities associated with a source location belong to the class Locatable. The location itself is modeled by the class Location and can be accessed through the member predicate Locatable.getLocation(). The Location class provides the following member predicates:
- Location.getFile(), Location.getStartLine(), Location.getStartColumn(), Location.getEndLine(), Location.getEndColumn() return detailed information about the location.
- Location.getNumLines() returns the number of (whole or partial) lines covered by the location.
- Location.startsBefore(Location) and Location.endsAfter(Location) determine whether one location starts before or ends after another location.
- Location.contains(Location) indicates whether one location completely contains another location; l1.contains(l2) holds if, and only if, l1.startsBefore(l2) and l1.endsAfter(l2).
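As a sketch of how locations can be used, the following query reports functions that span more than 100 lines. The threshold is arbitrary and chosen only for illustration; Function is described under Functions below and, like other AST nodes, has a location.

```ql
import javascript

// getNumLines() counts whole or partial lines, so the lines containing
// the start and the end of the function are both included.
from Function fun, int n
where
  n = fun.getLocation().getNumLines() and
  n > 100
select fun, "This function spans " + n + " lines."
```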
Lines
Lines of text in files are represented by the class Line. This class offers the following member predicates:
- Line.getText() returns the text of the line, excluding any terminating newline characters.
- Line.getTerminator() returns the terminator character(s) of the line. The last line in a file may not have any terminator characters, in which case this predicate does not return anything; otherwise it returns either the two-character string "\r\n" (carriage-return followed by newline), or one of the one-character strings "\n" (newline), "\r" (carriage-return), "\u2028" (Unicode character LINE SEPARATOR), "\u2029" (Unicode character PARAGRAPH SEPARATOR).
Note that, as mentioned above, the textual representation of the program is not included in the CodeQL database by default.
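As a sketch of a purely textual query, the following reports files that mix "\n" and "\r\n" line terminators. Because it works on Line entities, it only produces results for databases that were extracted with the textual content of the program.

```ql
import javascript

// Without extracted text there are no Line entities,
// and this query has no results.
from File f
where
  exists(Line l | l.getLocation().getFile() = f and l.getTerminator() = "\n") and
  exists(Line l | l.getLocation().getFile() = f and l.getTerminator() = "\r\n")
select f, "This file mixes Unix (LF) and Windows (CRLF) line terminators."
```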
Lexical level
A slightly more structured view of a JavaScript program is provided by the classes Token and Comment, which represent tokens and comments, respectively.
Tokens
The most important member predicates of class Token are as follows:
- Token.getValue() returns the source text of the token.
- Token.getIndex() returns the index of the token within its enclosing script.
- Token.getNextToken() and Token.getPreviousToken() navigate between tokens.
The Token class has nine subclasses, each representing a particular kind of token:
- EOFToken: a marker token representing the end of a script
- NullLiteralToken, BooleanLiteralToken, NumericLiteralToken, StringLiteralToken and RegularExpressionToken: different kinds of literals
- IdentifierToken and KeywordToken: identifiers and keywords (including reserved words) respectively
- PunctuatorToken: operators and other punctuation symbols
As an example of a query operating entirely on the lexical level, consider the following query, which finds consecutive comma tokens arising from an omitted element in an array expression:
import javascript
class CommaToken extends PunctuatorToken {
  // A punctuator token whose source text is a single comma.
  CommaToken() {
    this.getValue() = ","
  }
}
from CommaToken comma
where comma.getNextToken() instanceof CommaToken
select comma, "Omitted array elements are bad style."
If the query returns no results, this pattern isn’t used in the projects that you analyzed.
You can use the predicates Locatable.getFirstToken() and Locatable.getLastToken() to access the first and last token (if any) belonging to an element with a source location.
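As a sketch combining the syntactic and lexical levels, the following query reports expression statements whose last token is not a semicolon, that is, statements that rely on automatic semicolon insertion. ExprStmt is described under Statements and expressions below. Treat the query as a starting point only: code in HTML event handler attributes, for example, commonly omits the final semicolon on purpose.

```ql
import javascript

// If the statement ends with an explicit ";", that token is its last token;
// otherwise the statement was terminated by automatic semicolon insertion.
from ExprStmt s
where s.getLastToken().getValue() != ";"
select s, "This statement relies on automatic semicolon insertion."
```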
Comments
The class Comment and its subclasses represent the different kinds of comments that can occur in JavaScript programs:
- Comment: any comment
  - LineComment: a single-line comment terminated by an end-of-line character
    - SlashSlashComment: a plain JavaScript single-line comment starting with //
    - HtmlLineComment: a (non-standard) HTML comment
      - HtmlCommentStart: an HTML comment starting with <!--
      - HtmlCommentEnd: an HTML comment ending with -->
  - BlockComment: a block comment potentially spanning multiple lines
    - SlashStarComment: a plain JavaScript block comment surrounded with /*...*/
    - DocComment: a documentation block comment surrounded with /**...*/
The most important member predicates are as follows:
- Comment.getText() returns the source text of the comment, not including delimiters.
- Comment.getLine(i) returns the ith line of text within the comment (0-based).
- Comment.getNumLines() returns the number of lines in the comment.
- Comment.getNextToken() returns the token immediately following a comment. Note that such a token always exists: if a comment appears at the end of a file, its following token is an EOFToken.
As an example of a query using only lexical information, consider the following query for finding HTML comments, which are not a standard ECMAScript feature and should be avoided:
import javascript
from HtmlLineComment c
select c, "Do not use HTML comments."
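Along the same lines, here is a sketch that uses Comment.getText() to find comments containing a TODO marker. The marker text is an assumption of this example, and matches() is QL's built-in wildcard string match.

```ql
import javascript

// getText() excludes the comment delimiters, so for "// TODO: fix"
// the text is " TODO: fix".
from Comment c
where c.getText().matches("%TODO%")
select c, "This comment contains a TODO marker."
```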
Syntactic level
The majority of classes in the JavaScript library are concerned with representing a JavaScript program as a collection of abstract syntax trees (ASTs).
The class ASTNode contains all entities representing nodes in the abstract syntax trees and defines generic tree traversal predicates:
- ASTNode.getChild(i): returns the ith child of this AST node.
- ASTNode.getAChild(): returns any child of this AST node.
- ASTNode.getParent(): returns the parent node of this AST node, if any.
Note
These predicates should only be used to perform generic AST traversal. To access children of specific AST node types, the specialized predicates introduced below should be used instead. In particular, queries should not rely on the numeric indices of child nodes relative to their parent nodes: these are considered an implementation detail that may change between versions of the library.
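As a sketch of generic traversal, the following query uses getParent+(), the transitive closure of getParent(), to find calls nested anywhere inside a loop body. LoopStmt and CallExpr are described under Statements and expressions below. Note that the query also matches calls inside functions that are merely defined within the loop body; a real query would likely need to account for that.

```ql
import javascript

// Walk up from the call through any number of parent nodes; the call is
// reported if one of its ancestors is the body of a loop. Calls in the
// loop's test expression are not matched, since that is not part of the body.
from LoopStmt loop, CallExpr call
where call.getParent+() = loop.getBody()
select call, "This call occurs inside the body of $@.", loop, "a loop"
```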
Top-levels
From a syntactic point of view, each JavaScript program is composed of one or more top-level code blocks (or top-levels for short), which are blocks of JavaScript code that do not belong to a larger code block. Top-levels are represented by the class TopLevel and its subclasses:
- TopLevel
  - Script: a stand-alone file or HTML <script> element
    - ExternalScript: a stand-alone JavaScript file
    - InlineScript: code embedded inline in an HTML <script> tag
  - CodeInAttribute: a code block originating from an HTML attribute value
    - EventHandlerCode: code from an event handler attribute such as onload
    - JavaScriptURL: code from a URL with the javascript: scheme
  - Externs: a JavaScript file containing externs definitions
Every TopLevel is contained in a File, but a single File may contain more than one TopLevel. To go from a TopLevel tl to its File, use tl.getFile(); conversely, for a File f, the predicate f.getATopLevel() returns a top-level contained in f. For every AST node, the predicate ASTNode.getTopLevel() can be used to find the top-level it belongs to.
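For instance, an HTML file with several <script> elements contains several top-levels. The following sketch uses f.getATopLevel() to report files with more than one top-level.

```ql
import javascript

// A plain JavaScript file usually contains exactly one top-level, while an
// HTML file has one per <script> element or code-bearing attribute.
from File f, int n
where
  n = count(f.getATopLevel()) and
  n > 1
select f, "This file contains " + n + " top-level code blocks."
```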
The TopLevel class additionally provides the following member predicates:
- TopLevel.getNumberOfLines() returns the total number of lines (including code, comments and whitespace) in the top-level.
- TopLevel.getNumberOfLinesOfCode() returns the number of lines of code, that is, lines that contain at least one token.
- TopLevel.getNumberOfLinesOfComments() returns the number of lines containing or belonging to a comment.
- TopLevel.isMinified() determines whether the top-level contains minified code, using a heuristic based on the average number of statements per line.
Note
By default, GitHub code scanning filters out alerts in minified top-levels, since they are often hard to interpret. When you write your own queries in Visual Studio Code, this filtering is not done automatically, so you may want to explicitly add a condition of the form and not e.getTopLevel().isMinified() or similar to your query to exclude results in minified code.
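As a sketch that combines these metrics with the minified-code filter, the following query reports non-minified top-levels with more than 500 lines of code and no comment lines at all. The threshold is an arbitrary value chosen for illustration.

```ql
import javascript

from TopLevel tl
where
  // Skip minified code, as suggested in the note above.
  not tl.isMinified() and
  tl.getNumberOfLinesOfCode() > 500 and
  tl.getNumberOfLinesOfComments() = 0
select tl, "This top-level has " + tl.getNumberOfLinesOfCode() + " lines of code and no comments."
```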
Statements and expressions
The most important subclasses of ASTNode besides TopLevel are Stmt and Expr, which, together with their subclasses, represent statements and expressions, respectively. This section briefly discusses some of the more important classes and predicates. For a full reference of all the subclasses of Stmt and Expr and their API, see Stmt.qll and Expr.qll.
- Stmt: use Stmt.getContainer() to access the innermost function or top-level in which the statement is contained.
  - ControlStmt: a statement that controls the execution of other statements, that is, a conditional, loop, try or with statement; use ControlStmt.getAControlledStmt() to access the statements that it controls.
    - IfStmt: an if statement; use IfStmt.getCondition(), IfStmt.getThen() and IfStmt.getElse() to access its condition expression, “then” branch and “else” branch, respectively.
    - LoopStmt: a loop; use LoopStmt.getBody() and LoopStmt.getTest() to access its body and its test expression, respectively.
      - WhileStmt, DoWhileStmt: a “while” or “do-while” loop, respectively.
      - ForStmt: a “for” statement; use ForStmt.getInit() and ForStmt.getUpdate() to access the init and update expressions, respectively.
      - EnhancedForLoop: a “for-in” or “for-of” loop; use EnhancedForLoop.getIterator() to access the loop iterator (which may be an expression or variable declaration), and EnhancedForLoop.getIterationDomain() to access the expression being iterated over.
    - WithStmt: a “with” statement; use WithStmt.getExpr() and WithStmt.getBody() to access the controlling expression and the body of the with statement, respectively.
    - SwitchStmt: a switch statement; use SwitchStmt.getExpr() to access the expression on which the statement switches; use SwitchStmt.getCase(int) and SwitchStmt.getACase() to access individual switch cases; each case is modeled by an entity of class Case, whose member predicates Case.getExpr() and Case.getBodyStmt(int) provide access to the expression checked by the switch case (which is undefined for default), and its body.
    - TryStmt: a “try” statement; use TryStmt.getBody(), TryStmt.getCatchClause() and TryStmt.getFinally() to access its body, “catch” clause and “finally” block, respectively.
  - BlockStmt: a block of statements; use BlockStmt.getStmt(int) to access the individual statements in the block.
  - ExprStmt: an expression statement; use ExprStmt.getExpr() to access the expression itself.
  - JumpStmt: a statement that disrupts structured control flow, that is, one of break, continue, return and throw; use predicate JumpStmt.getTarget() to determine the target of the jump, which is either a statement or (for return and uncaught throw statements) the enclosing function.
    - BreakStmt: a “break” statement; use BreakStmt.getLabel() to access its (optional) target label.
    - ContinueStmt: a “continue” statement; use ContinueStmt.getLabel() to access its (optional) target label.
    - ReturnStmt: a “return” statement; use ReturnStmt.getExpr() to access its (optional) result expression.
    - ThrowStmt: a “throw” statement; use ThrowStmt.getExpr() to access its thrown expression.
  - FunctionDeclStmt: a function declaration statement; see below for available member predicates.
  - ClassDeclStmt: a class declaration statement; see below for available member predicates.
  - DeclStmt: a declaration statement containing one or more declarators, which can be accessed by predicate DeclStmt.getDecl(int).
    - VarDeclStmt, ConstDeclStmt, LetStmt: a var, const or let declaration statement.
- Expr: use Expr.getEnclosingStmt() to obtain the innermost statement to which this expression belongs; Expr.isPure() determines whether the expression is side-effect-free.
  - Identifier: an identifier; use Identifier.getName() to obtain its name.
  - Literal: a literal value; use Literal.getValue() to obtain a string representation of its value, and Literal.getRawValue() to obtain its raw source text (including surrounding quotes for string literals).
    - NullLiteral, BooleanLiteral, NumberLiteral, StringLiteral, RegExpLiteral: different kinds of literals.
  - ThisExpr: a “this” expression.
  - SuperExpr: a “super” expression.
  - ArrayExpr: an array expression; use ArrayExpr.getElement(i) to obtain the ith element expression, and ArrayExpr.elementIsOmitted(i) to check whether the ith element is omitted.
  - ObjectExpr: an object expression; use ObjectExpr.getProperty(i) to obtain the ith property in the object expression; properties are modeled by class Property, which is described in more detail below.
  - FunctionExpr: a function expression; see below for available member predicates.
  - ArrowFunctionExpr: an ECMAScript 2015-style arrow function expression; see below for available member predicates.
  - ClassExpr: a class expression; see below for available member predicates.
  - ParExpr: a parenthesized expression; use ParExpr.getExpression() to obtain the operand expression; for any expression, Expr.stripParens() can be used to recursively strip off any parentheses.
  - SeqExpr: a sequence of two or more expressions connected by the comma operator; use SeqExpr.getOperand(i) to obtain the ith sub-expression.
  - ConditionalExpr: a ternary conditional expression; member predicates ConditionalExpr.getCondition(), ConditionalExpr.getConsequent() and ConditionalExpr.getAlternate() provide access to the condition expression, the “then” expression and the “else” expression, respectively.
  - InvokeExpr: a function call or a “new” expression; use InvokeExpr.getCallee() to obtain the expression specifying the function to be called, and InvokeExpr.getArgument(i) to obtain the ith argument expression.
    - CallExpr: a function call.
    - NewExpr: a “new” expression.
    - MethodCallExpr: a function call whose callee expression is a property access; use MethodCallExpr.getReceiver() to access the receiver expression of the method call, and MethodCallExpr.getMethodName() to get the method name (if it can be determined statically).
  - PropAccess: a property access, that is, either a “dot” expression of the form e.f or an index expression of the form e[p]; use PropAccess.getBase() to obtain the base expression on which the property is accessed (e in the example), and PropAccess.getPropertyName() to determine the name of the accessed property; if the name cannot be statically determined, getPropertyName() does not return any value.
  - UnaryExpr: a unary expression; use UnaryExpr.getOperand() to obtain the operand expression.
    - NegExpr (“-”), PlusExpr (“+”), LogNotExpr (“!”), BitNotExpr (“~”), TypeofExpr, VoidExpr, DeleteExpr, SpreadElement (“...”): various types of unary expressions.
  - BinaryExpr: a binary expression; use BinaryExpr.getLeftOperand() and BinaryExpr.getRightOperand() to access the operand expressions.
    - Comparison: any comparison expression.
      - EqualityTest: any equality or inequality test.
        - EqExpr (“==”), NEqExpr (“!=”): non-strict equality and inequality tests.
        - StrictEqExpr (“===”), StrictNEqExpr (“!==”): strict equality and inequality tests.
      - LTExpr (“<”), LEExpr (“<=”), GTExpr (“>”), GEExpr (“>=”): numeric comparisons.
    - LShiftExpr (“<<”), RShiftExpr (“>>”), URShiftExpr (“>>>”): shift operators.
    - AddExpr (“+”), SubExpr (“-”), MulExpr (“*”), DivExpr (“/”), ModExpr (“%”), ExpExpr (“**”): arithmetic operators.
    - BitOrExpr (“|”), XOrExpr (“^”), BitAndExpr (“&”): bitwise operators.
    - InExpr: an in test.
    - InstanceofExpr: an instanceof test.
    - LogAndExpr (“&&”), LogOrExpr (“||”): short-circuiting logical operators.
  - Assignment: assignment expressions, either simple or compound; use Assignment.getLhs() and Assignment.getRhs() to access the left- and right-hand side, respectively.
    - AssignExpr: a simple assignment expression.
    - CompoundAssignExpr: a compound assignment expression.
      - AssignAddExpr, AssignSubExpr, AssignMulExpr, AssignDivExpr, AssignModExpr, AssignLShiftExpr, AssignRShiftExpr, AssignURShiftExpr, AssignOrExpr, AssignXOrExpr, AssignAndExpr, AssignExpExpr: different kinds of compound assignment expressions.
  - UpdateExpr: an increment or decrement expression; use UpdateExpr.getOperand() to obtain the operand expression.
    - PreIncExpr, PostIncExpr: an increment expression.
    - PreDecExpr, PostDecExpr: a decrement expression.
  - YieldExpr: a “yield” expression; use YieldExpr.getOperand() to access the (optional) operand expression; use YieldExpr.isDelegating() to check whether this is a delegating yield*.
  - TemplateLiteral: an ECMAScript 2015 template literal; TemplateLiteral.getElement(i) returns the ith element of the template, which may either be an interpolated expression or a constant template element.
  - TaggedTemplateExpr: an ECMAScript 2015 tagged template literal; use TaggedTemplateExpr.getTag() to access the tagging expression, and TaggedTemplateExpr.getTemplate() to access the template literal being tagged.
  - TemplateElement: a constant template element; as for literals, use TemplateElement.getValue() to obtain the value of the element, and TemplateElement.getRawValue() for its raw value.
  - AwaitExpr: an “await” expression; use AwaitExpr.getOperand() to access the operand expression.
Stmt and Expr share a common superclass, ExprOrStmt, which is useful for queries that should operate either on statements or on expressions, but not on any other AST nodes.
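As a small illustration of combining statement and expression classes, the following sketch reports assignments used directly as the condition of an if statement, a common typo for a comparison. The message wording is illustrative only.

```ql
import javascript

// In `if (x = y)` the parentheses belong to the if syntax, so the condition
// is the AssignExpr itself. A condition wrapped in an extra pair of
// parentheses, `if ((x = y))`, is a ParExpr and is deliberately not
// reported; compare against getCondition().stripParens() to include it.
from IfStmt ifstmt, AssignExpr assign
where assign = ifstmt.getCondition()
select assign, "This assignment is used as the condition of an if statement."
```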
As an example of how to use expression AST nodes, here is a query that finds expressions of the form e + f >> g; such expressions should be rewritten as (e + f) >> g to clarify operator precedence:
import javascript
from ShiftExpr shift, AddExpr add
where add = shift.getAnOperand()
select add, "This expression should be bracketed to clarify precedence rules."
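The comparison classes can be used in a similar way. The following sketch flags equality tests that compare a typeof expression against a string literal that typeof never produces. The list of valid results (including "bigint", added in ECMAScript 2020) is written out by hand in the query rather than taken from the library.

```ql
import javascript

from EqualityTest eq, TypeofExpr typeofExpr, StringLiteral lit
where
  // One operand is the typeof expression, the other the string literal.
  eq.getAnOperand().stripParens() = typeofExpr and
  eq.getAnOperand().stripParens() = lit and
  not lit.getValue() =
    ["undefined", "object", "boolean", "number", "string", "function", "symbol", "bigint"]
select lit, "The typeof operator never returns " + lit.getRawValue() + "."
```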
Functions
JavaScript provides several ways of defining functions: in ECMAScript 5, there are function declaration statements and function expressions, and ECMAScript 2015 adds arrow function expressions. These different syntactic forms are represented by the classes FunctionDeclStmt (a subclass of Stmt), FunctionExpr (a subclass of Expr) and ArrowFunctionExpr (also a subclass of Expr), respectively. All three are subclasses of Function, which provides common member predicates for accessing function parameters or the function body:
- Function.getId() returns the Identifier naming the function, which may not be defined for function expressions.
- Function.getParameter(i) and Function.getAParameter() access the ith parameter or any parameter, respectively; parameters are modeled by the class Parameter, which is a subclass of BindingPattern (see below).
- Function.getBody() returns the body of the function, which is usually a Stmt, but may be an Expr for arrow function expressions and legacy expression closures.
As an example, here is a query that finds all expression closures:
import javascript
from FunctionExpr fe
where fe.getBody() instanceof Expr
select fe, "Use arrow expressions instead of expression closures."
As another example, this query finds functions that have two parameters that bind the same variable:
import javascript
from Function fun, Parameter p, Parameter q, int i, int j
where p = fun.getParameter(i) and
q = fun.getParameter(j) and
i < j and
p.getAVariable() = q.getAVariable()
select fun, "This function has two parameters that bind the same variable."
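As a further sketch using Function.getAParameter(), the following query reports functions with more than seven parameters; the threshold is arbitrary and chosen only for illustration.

```ql
import javascript

// Every Parameter of the function is counted, including a rest parameter.
from Function fun, int n
where
  n = count(fun.getAParameter()) and
  n > 7
select fun, "This function has " + n + " parameters."
```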
Classes
Classes can be defined either by class declaration statements, represented by the CodeQL class ClassDeclStmt (which is a subclass of Stmt), or by class expressions, represented by the CodeQL class ClassExpr (which is a subclass of Expr). Both of these classes are also subclasses of ClassDefinition, which provides common member predicates for accessing the name of a class, its superclass, and its body:
- ClassDefinition.getIdentifier() returns the Identifier naming the class, which may not be defined for class expressions.
- ClassDefinition.getSuperClass() returns the Expr specifying the superclass, which may not be defined.
- ClassDefinition.getMember(n) returns the definition of the member named n of this class.
- ClassDefinition.getMethod(n) restricts ClassDefinition.getMember(n) to methods (as opposed to fields).
- ClassDefinition.getField(n) restricts ClassDefinition.getMember(n) to fields (as opposed to methods).
- ClassDefinition.getConstructor() gets the constructor of this class, possibly a synthetic default constructor.
Note that class fields are not a standard language feature yet, so details of their representation may change.
Method definitions are represented by the class MethodDefinition, which (like its counterpart FieldDefinition for fields) is a subclass of MemberDefinition. That class provides the following important member predicates:
- MemberDefinition.isStatic(): holds if this is a static member.
- MemberDefinition.isComputed(): holds if the name of this member is computed at runtime.
- MemberDefinition.getName(): gets the name of this member if it can be determined statically.
- MemberDefinition.getInit(): gets the initializer of this member; for methods, the initializer is a function expression, while for fields it may be an arbitrary expression or may be undefined.
There are three classes for modeling special methods: ConstructorDefinition models constructors, while GetterMethodDefinition and SetterMethodDefinition model getter and setter methods, respectively.
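As a sketch of how these classes fit together, the following query reports getters in a class body for which the same class declares no setter with the same name. It uses the class names introduced above, does not distinguish static from instance accessors, and ignores accessors declared in superclasses.

```ql
import javascript

from ClassDefinition c, GetterMethodDefinition getter
where
  // The getter is a member of class c, looked up by its own name.
  getter = c.getMember(getter.getName()) and
  // No setter with the same name is declared in the same class body.
  not exists(SetterMethodDefinition setter | setter = c.getMember(getter.getName()))
select getter, "Getter " + getter.getName() + " has no matching setter in this class."
```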
Declarations and binding patterns
Variables are declared by declaration statements (class DeclStmt), which come in three flavors: var statements (represented by class VarDeclStmt), const statements (represented by class ConstDeclStmt), and let statements (represented by class LetStmt). Every declaration statement has one or more declarators, represented by class VariableDeclarator.
Each declarator consists of a binding pattern, returned by predicate VariableDeclarator.getBindingPattern(), and an optional initializing expression, returned by VariableDeclarator.getInit().
Often, the binding pattern is a simple identifier, as in var x = 42. In ECMAScript 2015 and later, however, it can also be a more complex destructuring pattern, as in var [x, y] = arr.
The various kinds of binding patterns are represented by class BindingPattern and its subclasses:
- VarRef: a simple identifier in an l-value position, for example the x in var x or in x = 42
- Parameter: a function or catch clause parameter
- ArrayPattern: an array pattern, for example, the left-hand side of [x, y] = arr
- ObjectPattern: an object pattern, for example, the left-hand side of {x, y: z} = o
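As a sketch, the following query lists every variable bound by an array or object destructuring pattern in a declaration, using BindingPattern.getAVariable() to enumerate the variables a pattern binds.

```ql
import javascript

from VariableDeclarator d, Variable v
where
  // Only declarators whose binding pattern is a destructuring pattern.
  (
    d.getBindingPattern() instanceof ArrayPattern or
    d.getBindingPattern() instanceof ObjectPattern
  ) and
  v = d.getBindingPattern().getAVariable()
select d, "This declarator destructures into variable " + v.getName() + "."
```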
Here is an example of a query to find declaration statements that declare the same variable more than once, excluding results in minified code:
import javascript
from DeclStmt ds, VariableDeclarator d1, VariableDeclarator d2, Variable v, int i, int j
where d1 = ds.getDecl(i) and
d2 = ds.getDecl(j) and
i < j and
v = d1.getBindingPattern().getAVariable() and
v = d2.getBindingPattern().getAVariable() and
not ds.getTopLevel().isMinified()
select ds, "Variable " + v.getName() + " is declared both $@ and $@.", d1, "here", d2, "here"
This is not a common problem, so you may not find any results in your own projects.
Notice the use of not ... isMinified() here and in the next few queries. This excludes any results found in minified code. If you delete and not ds.getTopLevel().isMinified() and re-run the query, two results in minified code in the meteor/meteor project are reported.
Properties
Properties in object literals are represented by class Property, which is also a subclass of ASTNode, but neither of Expr nor of Stmt.
Class Property has two subclasses ValueProperty and PropertyAccessor, which represent, respectively, normal value properties and getter/setter properties. Class PropertyAccessor, in turn, has two subclasses PropertyGetter and PropertySetter representing getters and setters, respectively.
The predicates Property.getName() and Property.getInit() provide access to the defined property’s name and its initial value. For PropertyAccessor and its subclasses, getInit() is overloaded to return the getter/setter function.
As an example of a query involving properties, consider the following query that flags object expressions containing two identically named properties, excluding results in minified code:
import javascript
from ObjectExpr oe, Property p1, Property p2, int i, int j
where p1 = oe.getProperty(i) and
p2 = oe.getProperty(j) and
i < j and
p1.getName() = p2.getName() and
not oe.getTopLevel().isMinified()
select oe, "Property " + p1.getName() + " is defined both $@ and $@.", p1, "here", p2, "here"
Modules
The JavaScript library has support for working with ECMAScript 2015 modules, as well as legacy CommonJS modules (still commonly employed by Node.js code bases) and AMD-style modules. The classes ES2015Module, NodeModule, and AMDModule represent these three types of modules, and all three extend the common superclass Module.
The most important member predicates defined by Module are:
- Module.getName(): gets the name of the module, which is just the stem (that is, the basename without extension) of the enclosing file.
- Module.getAnImportedModule(): gets another module that is imported (through import or require) by this module.
- Module.getAnExportedSymbol(): gets the name of a symbol that this module exports.
Moreover, there is a class Import that models both ECMAScript 2015-style import declarations and CommonJS/AMD-style require calls; its member predicate Import.getImportedModule() provides access to the module the import refers to, if it can be determined statically.
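As a sketch, the following query enumerates the edges of the module import graph using Module.getAnImportedModule(); imports whose target cannot be determined statically produce no result.

```ql
import javascript

// Each result row is one edge from an importing module to an imported one.
from Module m, Module imported
where imported = m.getAnImportedModule()
select m, "This module imports $@.", imported, imported.getName()
```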
Name binding
Name binding is modeled in the JavaScript libraries using four concepts: scopes, variables, variable declarations, and variable accesses, represented by the classes Scope, Variable, VarDecl and VarAccess, respectively.
Scopes
In ECMAScript 5, there are three kinds of scopes: the global scope (one per program), function scopes (one per function), and catch clause scopes (one per catch clause). These three kinds of scopes are represented by the classes GlobalScope, FunctionScope and CatchScope. ECMAScript 2015 adds block scopes for let-bound variables, which are also represented by class Scope, class expression scopes (ClassExprScope), and module scopes (ModuleScope).
Class Scope provides the following API:
- Scope.getScopeElement() returns the AST node inducing this scope; undefined for GlobalScope.
- Scope.getOuterScope() returns the lexically enclosing scope of this scope.
- Scope.getAnInnerScope() returns a scope lexically nested inside this scope.
- Scope.getVariable(name), Scope.getAVariable() return a variable declared (implicitly or explicitly) in this scope.
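As a sketch of the Scope API, the following query reports function scopes that declare more than 20 variables. The threshold is arbitrary, and because getAVariable() also returns implicitly declared variables, the count may include variables that do not appear in any declaration.

```ql
import javascript

// getScopeElement() returns the function that induces the scope,
// which gives the result a source location.
from FunctionScope s, int n
where
  n = count(s.getAVariable()) and
  n > 20
select s.getScopeElement(), "This function scope declares " + n + " variables."
```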
Variables
The Variable class models all variables in a JavaScript program, including global variables, local variables, and parameters (both of functions and catch clauses), whether explicitly declared or not.
It is important not to confuse variables and their declarations: local variables may have more than one declaration, while global variables and the implicitly declared local arguments variable need not have a declaration at all.
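To make the distinction between a variable and its declarations concrete, the following sketch reports every declaration of a local variable that has more than one. It relies on Variable.getADeclaration() and Variable.getName(), which are described in the next section; the message text is an assumption of this example.

```ql
import javascript

// Report each declaration of a local variable that is declared more than once,
// for example by two `var` declarations of the same name in one function.
from LocalVariable v, VarDecl decl
where
  decl = v.getADeclaration() and
  strictcount(v.getADeclaration()) > 1
select decl, "Local variable " + v.getName() + " is declared more than once."
```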
Variable declarations and accesses¶
Variables may be declared by variable declarators, by function declaration statements and expressions, by class declaration statements or expressions, or by parameters of functions and catch clauses. While these declarations differ in their syntactic form, in each case there is an identifier naming the declared variable. We consider that identifier to be the declaration proper, and assign it the class VarDecl. Identifiers that reference a variable, on the other hand, are given the class VarAccess.
The most important predicates involving variables, their declarations, and their accesses are as follows:
- Variable.getName(), VarDecl.getName() and VarAccess.getName() return the name of the variable.
- Variable.getScope() returns the scope to which the variable belongs.
- Variable.isGlobal(), Variable.isLocal() and Variable.isParameter() determine whether the variable is a global variable, a local variable, or a parameter variable, respectively.
- Variable.getAnAccess() maps a Variable to all VarAccesses that refer to it.
- Variable.getADeclaration() maps a Variable to all VarDecls that declare it (of which there may be none, one, or more than one).
- Variable.isCaptured() determines whether the variable is ever accessed in a scope that is lexically nested within the scope where it is declared.
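As a small illustration of Variable.isCaptured() and Variable.getAnAccess(), this sketch lists all accesses to captured local variables, which are typically accesses from within closures. It is written for this section and is not part of the standard query suite.

```ql
import javascript

// Report accesses to captured local variables, that is, variables that are
// accessed from a scope nested inside the scope in which they are declared.
from LocalVariable v, VarAccess acc
where
  v.isCaptured() and
  acc = v.getAnAccess()
select acc, "Access to captured variable " + v.getName() + "."
```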
As an example, consider the following query which finds distinct function declarations that declare the same variable, that is, two conflicting function declarations within the same scope (again excluding minified code):
import javascript
from FunctionDeclStmt f, FunctionDeclStmt g
where f != g and f.getVariable() = g.getVariable() and
not f.getTopLevel().isMinified() and
not g.getTopLevel().isMinified()
select f, g
Some projects declare conflicting functions of the same name and rely on platform-specific behavior to disambiguate the two declarations.
Control flow¶
A different program representation in terms of intraprocedural control flow graphs (CFGs) is provided by the classes in library CFG.qll.
Class ControlFlowNode represents a single node in the control flow graph, which is either an expression, a statement, or a synthetic control flow node. Note that Expr and Stmt do not inherit from ControlFlowNode at the CodeQL level, although their entity types are compatible, so you can explicitly cast from one to the other if you need to map between the AST-based and the CFG-based program representations.
There are two kinds of synthetic control flow nodes: entry nodes (class ControlFlowEntryNode), which represent the beginning of a top-level or function, and exit nodes (class ControlFlowExitNode), which represent their end. They do not correspond to any AST nodes, but simply serve as the unique entry point and exit point of a control flow graph. Entry and exit nodes can be accessed through the predicates StmtContainer.getEntry() and StmtContainer.getExit().
Most, but not all, top-levels and functions have another distinguished CFG node, the start node. This is the CFG node at which execution begins. Unlike the entry node, which is a synthetic construct, the start node corresponds to an actual program element: for top-levels, it is the first CFG node of the first statement; for functions, it is the CFG node corresponding to their first parameter or, if there are no parameters, the first CFG node of the body. Empty top-levels do not have a start node.
For most purposes, using start nodes is preferable to using entry nodes.
The structure of the control flow graph is reflected in the member predicates of ControlFlowNode:
- ControlFlowNode.getASuccessor() returns a ControlFlowNode that is a successor of this ControlFlowNode in the control flow graph.
- ControlFlowNode.getAPredecessor() is the inverse of getASuccessor().
- ControlFlowNode.isBranch() determines whether this node has more than one successor.
- ControlFlowNode.isJoin() determines whether this node has more than one predecessor.
- ControlFlowNode.isStart() determines whether this node is a start node.
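These predicates are enough for simple structural metrics. The following sketch counts, for each function, the CFG nodes that have more than one successor. It additionally assumes the member predicate ControlFlowNode.getContainer(), which is not described above, to relate each node to the function it belongs to.

```ql
import javascript

// For each function, count the control flow nodes with more than one successor.
// This is only a rough indicator of how much branching a function contains.
from Function f
select f, count(ControlFlowNode nd | nd.getContainer() = f and nd.isBranch())
```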
Many control-flow-based analyses are phrased in terms of basic blocks rather than single control flow nodes, where a basic block is a maximal sequence of control flow nodes without branches or joins. The class BasicBlock from BasicBlocks.qll represents all such basic blocks. Similar to ControlFlowNode, it provides member predicates getASuccessor() and getAPredecessor() to navigate the control flow graph at the level of basic blocks, and member predicates getANode(), getNode(int), getFirstNode() and getLastNode() to access individual control flow nodes within a basic block. The predicate Function.getEntryBB() returns the entry basic block in a function, that is, the basic block containing the function’s entry node. Similarly, Function.getStartBB() provides access to the start basic block, which contains the function’s start node. As for CFG nodes, getStartBB() should normally be preferred over getEntryBB().
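As a sketch that uses only the basic block predicates described above, the following query counts the basic blocks reachable from each function's start basic block. Functions without a start node get a count of zero.

```ql
import javascript

// Count the basic blocks reachable from each function's start basic block.
// `getASuccessor*()` is the reflexive transitive closure, so the start block
// itself is included in the count.
from Function f
select f, count(BasicBlock bb | bb = f.getStartBB().getASuccessor*())
```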
As an example of an analysis using basic blocks, BasicBlock.isLiveAtEntry(v, u) determines whether variable v is live at the entry of the given basic block, and if so binds u to a use of v that refers to its value at the entry. We can use it to find global variables that are used in a function where they are not live (that is, every read of the variable is preceded by a write), suggesting that the variable was meant to be declared as a local variable instead:
import javascript
from Function f, GlobalVariable gv
where gv.getAnAccess().getEnclosingFunction() = f and
not f.getStartBB().isLiveAtEntry(gv, _)
select f, "This function uses " + gv + " like a local variable."
Many projects have some variables which look as if they were intended to be local.
Data flow¶
Definitions and uses¶
Library DefUse.qll provides classes and predicates to determine def-use relationships between definitions and uses of variables.
Classes VarDef and VarUse contain all expressions that define and use a variable, respectively. For the former, you can use predicate VarDef.getAVariable() to find out which variables are defined by a given variable definition (recall that destructuring assignments in ECMAScript 2015 define several variables at the same time). Similarly, predicate VarUse.getVariable() returns the (single) variable being accessed by a variable use.
The def-use information itself is provided by predicate VarUse.getADef(), which connects a use of a variable to a definition of the same variable, where the definition may reach the use.
As an example, the following query finds definitions of local variables that are not used anywhere; that is, the variable is either not referenced at all after the definition, or its value is overwritten:
import javascript
from VarDef def, LocalVariable v
where v = def.getAVariable() and
not exists (VarUse use | def = use.getADef())
select def, "Dead store of local variable."
SSA¶
A more fine-grained representation of a program’s data flow based on Static Single Assignment Form (SSA) is provided by the library semmle.javascript.SSA.
In SSA form, each use of a local variable has exactly one (SSA) definition that reaches it. SSA definitions are represented by class SsaDefinition. They are not AST nodes, since not every SSA definition corresponds to an explicit element in the source code.
Altogether, there are five kinds of SSA definitions:
- Explicit definitions (SsaExplicitDefinition): these simply wrap a VarDef, that is, a definition like x = 1 appearing explicitly in the source code.
- Implicit initializations (SsaImplicitInit): these represent the implicit initialization of local variables with undefined at the beginning of their scope.
- Phi nodes (SsaPhiNode): these are pseudo-definitions that merge two or more SSA definitions where necessary; see the Wikipedia article on Static Single Assignment Form for an explanation.
- Variable captures (SsaVariableCapture): these are pseudo-definitions appearing at places in the code where the value of a captured variable may change without there being an explicit assignment, for example due to a function call.
- Refinement nodes (SsaRefinementNode): these are pseudo-definitions appearing at places in the code where something becomes known about a variable; for example, a conditional if (x === null) induces a refinement node at the beginning of its “then” branch recording the fact that x is known to be null there. (In the literature, these are sometimes known as “pi nodes.”)
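To see where the analysis inserts pseudo-definitions, the following sketch lists all phi nodes. Besides the class SsaPhiNode, it assumes the predicate SsaDefinition.getSourceVariable(), which is not described above, for obtaining the local variable an SSA definition belongs to.

```ql
import javascript

// Report phi nodes, that is, program points where several SSA definitions of
// the same local variable merge (for example, after an `if` statement that
// assigns the variable in only one branch).
from SsaPhiNode phi
select phi, "Definitions of " + phi.getSourceVariable().getName() + " merge here."
```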
Data flow nodes¶
Moving beyond just variable definitions and uses, library semmle.javascript.dataflow.DataFlow provides a representation of the program as a data flow graph. Its nodes are values of class DataFlow::Node, which has two subclasses ValueNode and SsaDefinitionNode. Nodes of the former kind wrap an expression or a statement that is considered to produce a value (specifically, a function or class declaration statement, or a TypeScript namespace or enum declaration). Nodes of the latter kind wrap SSA definitions.
You can use the predicate DataFlow::valueNode to convert an expression, function or class into its corresponding ValueNode, and similarly DataFlow::ssaDefinitionNode to map an SSA definition to its corresponding SsaDefinitionNode.
There is also an auxiliary predicate DataFlow::parameterNode that maps a parameter to its corresponding data flow node. (This is really just a convenience wrapper around DataFlow::ssaDefinitionNode, since parameters are also considered to be SSA definitions.)
Going in the other direction, there is a predicate ValueNode.getAstNode() for mapping from ValueNodes to ASTNodes, and SsaDefinitionNode.getSsaVariable() for mapping from SsaDefinitionNodes to SsaVariables. There is also a utility predicate Node.asExpr() that gets the underlying expression for a ValueNode, and is undefined for all nodes that do not correspond to an expression. (Note in particular that this predicate is not defined for ValueNodes wrapping function or class declaration statements!)
You can use the predicate DataFlow::Node.getAPredecessor() to find other data flow nodes from which values may flow into this node, and getASuccessor() for the other direction.
For example, here is a query that finds all invocations of a method called send on a value that comes from a parameter named res, indicating that it is perhaps sending an HTTP response:
import javascript
from SimpleParameter res, DataFlow::Node resNode, MethodCallExpr send
where res.getName() = "res" and
resNode = DataFlow::parameterNode(res) and
resNode.getASuccessor+() = DataFlow::valueNode(send.getReceiver()) and
send.getMethodName() = "send"
select send
Note that the data flow modeling in this library is intraprocedural, that is, flow across function calls and returns is not modeled. Likewise, flow through object properties and global variables is not modeled.
Type inference¶
The library semmle.javascript.dataflow.TypeInference implements a simple type inference for JavaScript based on intraprocedural, heap-insensitive flow analysis. The inference algorithm approximates the possible concrete runtime values of variables and expressions as sets of abstract values (represented by the class AbstractValue), each of which stands for a set of concrete values.
For example, there is an abstract value representing all non-zero numbers, and another representing all non-empty strings except for those that can be converted to a number. Both of these abstract values are fairly coarse approximations that represent very large sets of concrete values.
Other abstract values are more precise, to the point where they represent single concrete values: for example, there is an abstract value representing the concrete null value, and another representing the number zero.
There is a special group of abstract values called indefinite abstract values that represent all concrete values. The analysis uses these to handle expressions for which it cannot infer a more precise value, such as function parameters (as mentioned above, the analysis is intraprocedural and hence does not model argument passing) or property reads (the analysis does not model property values either).
Each indefinite abstract value is associated with a string value describing the cause of imprecision. In the above examples, the indefinite value for the parameter would have cause "call", while the indefinite value for the property would have cause "heap".
To check whether an abstract value is indefinite, you can use the isIndefinite member predicate. Its single argument describes the cause of imprecision.
Each abstract value has one or more associated types (CodeQL class InferredType), corresponding roughly to the type tags computed by the typeof operator. The types are null, undefined, boolean, number, string, function, class, date and object.
To access the results of the type inference, use class DataFlow::AnalyzedNode: any DataFlow::Node can be cast to this class, and additionally there is a convenience predicate Expr::analyze that maps expressions directly to their corresponding AnalyzedNodes.
Once you have an AnalyzedNode, you can use predicate AnalyzedNode.getAValue() to access the abstract values inferred for it, and getAType() to get the inferred types.
For example, here is a query that looks for null checks on expressions that cannot, in fact, be null:
import javascript
from StrictEqualityTest eq, DataFlow::AnalyzedNode nd, NullLiteral null
where eq.hasOperands(nd.asExpr(), null) and
not nd.getAValue().isIndefinite(_) and
not nd.getAValue() instanceof AbstractNull
select eq, "Spurious null check."
To paraphrase, the query looks for equality tests eq where one operand is a null literal and the other some expression that we convert to an AnalyzedNode. If the type inference results for that node are precise (that is, none of the inferred values is indefinite) and (the abstract representation of) null is not among them, we flag eq.
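The inferred types can be queried in a similar way. The sketch below reports expressions for which the inference is precise but still yields more than one type; rendering each InferredType with toString() and the wording of the message are assumptions of this example.

```ql
import javascript

// Report expressions whose abstract values are all precise (none is indefinite)
// but that may nevertheless evaluate to values of more than one type.
from Expr e, DataFlow::AnalyzedNode nd
where
  nd = e.analyze() and
  not nd.getAValue().isIndefinite(_) and
  strictcount(nd.getAType()) > 1
select e,
  "This expression may have any of the types " +
    concat(InferredType t | t = nd.getAType() | t.toString(), ", ") + "."
```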
You can add custom type inference rules by defining new subclasses of DataFlow::AnalyzedNode and overriding getAValue. You can also introduce new abstract values by extending the abstract class CustomAbstractValueTag, which is a subclass of string: each string belonging to that class induces a corresponding abstract value of type CustomAbstractValue. You can use the predicate CustomAbstractValue.getTag() to map from the abstract value to its tag. By implementing the abstract predicates of class CustomAbstractValueTag you can define the semantics of your custom abstract values, such as what primitive value they coerce to and what type they have.
Call graph¶
The JavaScript library implements a simple call graph construction algorithm to statically approximate the possible call targets of function calls and new expressions. Due to the dynamically typed nature of JavaScript and its support for higher-order functions and reflective language features, building static call graphs is quite difficult. Simple call graph algorithms tend to be incomplete, that is, they often fail to resolve all possible call targets. More sophisticated algorithms can suffer from the opposite problem of imprecision, that is, they may infer many spurious call targets.
The call graph is represented by the member predicate getACallee() of class DataFlow::InvokeNode, which computes possible callees of the given invocation, that is, functions that may at runtime be invoked by this expression.
Furthermore, there are three member predicates that indicate the quality of the callee information for this invocation:
- DataFlow::InvokeNode.isImprecise(): holds for invocations where the call graph builder might infer spurious call targets.
- DataFlow::InvokeNode.isIncomplete(): holds for invocations where the call graph builder might fail to infer possible call targets.
- DataFlow::InvokeNode.isUncertain(): holds if either isImprecise() or isIncomplete() holds.
As an example of a call-graph-based query, here is a query to find invocations for which the call graph builder could not find any callees, despite the analysis being complete for this invocation:
import javascript
from DataFlow::InvokeNode invk
where not invk.isIncomplete() and
not exists(invk.getACallee())
select invk, "Unable to find a callee for this invocation."
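Conversely, the quality predicates can be used to look at invocations with several callees. This sketch reports invocations that are not marked as imprecise but still have more than one possible callee; whether such results are interesting depends on the code base.

```ql
import javascript

// Report invocations that are not flagged as imprecise but for which the
// call graph nevertheless finds more than one possible callee.
from DataFlow::InvokeNode invk
where
  not invk.isImprecise() and
  strictcount(invk.getACallee()) > 1
select invk, "This invocation has " + strictcount(invk.getACallee()) + " possible callees."
```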
Inter-procedural data flow¶
The data flow graph-based analyses described so far are all intraprocedural: they do not take flow from function arguments to parameters or from a return to the function’s caller into account. The data flow library also provides a framework for constructing custom inter-procedural analyses.
We distinguish here between data flow proper, and taint tracking: the latter not only considers value-preserving flow (such as from variable definitions to uses), but also cases where one value influences (“taints”) another without determining it entirely. For example, in the assignment s2 = s1.substring(i), the value of s1 influences the value of s2, because s2 is assigned a substring of s1. In general, s2 will not be assigned s1 itself, so there is no data flow from s1 to s2, but s1 still taints s2.
It is a common pattern that we wish to specify data flow or taint analysis in terms of its sources (where flow starts), sinks (where flow should be tracked to), and barriers or sanitizers (where flow is interrupted). Sanitizers are very common in security analyses: for example, an analysis that tracks the flow of untrusted user input into, say, a SQL query has to keep track of code that validates the input, thereby making it safe to use. Such a validation step is an example of a sanitizer.
The classes DataFlow::Configuration and TaintTracking::Configuration allow specifying a data flow or taint analysis, respectively, by overriding the following predicates:
- isSource(DataFlow::Node nd) selects all nodes nd from which flow tracking starts.
- isSink(DataFlow::Node nd) selects all nodes nd to which the flow is tracked.
- isBarrier(DataFlow::Node nd) selects all nodes nd that act as a barrier for data flow; isSanitizer is the corresponding predicate for taint tracking configurations.
- isBarrierEdge(DataFlow::Node src, DataFlow::Node trg) is a variant of isBarrier(nd) that allows specifying barrier edges in addition to barrier nodes; again, isSanitizerEdge is the corresponding predicate for taint tracking.
- isAdditionalFlowStep(DataFlow::Node src, DataFlow::Node trg) allows specifying custom additional flow steps for this analysis; isAdditionalTaintStep is the corresponding predicate for taint tracking configurations.
Since for technical reasons both Configuration classes are subtypes of string, you have to choose a unique name for each flow configuration and equate this with it in the characteristic predicate (as in the example below).
The predicate Configuration.hasFlow performs the actual flow tracking, starting at a source and looking for flow to a sink that does not pass through a barrier node or edge.
For example, suppose that we are developing an analysis to find hard-coded passwords. We might write a simple query that looks for string constants flowing into variables named "password".
import javascript
class PasswordTracker extends DataFlow::Configuration {
PasswordTracker() {
// unique identifier for this configuration
this = "PasswordTracker"
}
override predicate isSource(DataFlow::Node nd) {
nd.asExpr() instanceof StringLiteral
}
override predicate isSink(DataFlow::Node nd) {
passwordVarAssign(_, nd)
}
predicate passwordVarAssign(Variable v, DataFlow::Node nd) {
v.getAnAssignedExpr() = nd.asExpr() and
v.getName().toLowerCase() = "password"
}
}
Now we can rephrase our query to use Configuration.hasFlow:
from PasswordTracker pt, DataFlow::Node source, DataFlow::Node sink, Variable v
where pt.hasFlow(source, sink) and pt.passwordVarAssign(v, sink)
select sink, "Password variable " + v + " is assigned a constant string."
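Sanitizers are declared in the same way as sources and sinks. The following sketch of a taint-tracking configuration tracks function parameters into the first argument of eval and treats calls to a function named sanitize as sanitizers. The configuration name and the sanitize function are hypothetical, and the query uses CallExpr.getCalleeName(), CallExpr.getArgument(i) and DataFlow::ParameterNode, which are not described in this section.

```ql
import javascript

// Sketch: track parameter values into the first argument of `eval`,
// stopping at calls to a hypothetical `sanitize` function.
class ParameterToEval extends TaintTracking::Configuration {
  ParameterToEval() {
    // unique identifier for this configuration
    this = "ParameterToEval"
  }

  override predicate isSource(DataFlow::Node nd) {
    nd instanceof DataFlow::ParameterNode
  }

  override predicate isSink(DataFlow::Node nd) {
    exists(CallExpr call |
      call.getCalleeName() = "eval" and
      nd.asExpr() = call.getArgument(0)
    )
  }

  override predicate isSanitizer(DataFlow::Node nd) {
    // `sanitize` is a hypothetical function name used for illustration
    exists(CallExpr call |
      call.getCalleeName() = "sanitize" and
      nd.asExpr() = call
    )
  }
}

from ParameterToEval cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "A parameter value may flow into eval here."
```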
Syntax errors¶
JavaScript code that contains syntax errors cannot usually be analyzed. For such code, the lexical and syntactic representations are not available, and hence no name binding information, call graph or control and data flow. All that is available in this case is a value of class JSParseError representing the syntax error. It provides information about the syntax error location (JSParseError is a subclass of Locatable) and the error message through predicate JSParseError.getMessage.
Note that for some very simple syntax errors the parser can recover and continue parsing. If this happens, lexical and syntactic information is available in addition to the JSParseError values representing the (recoverable) syntax errors encountered during parsing.
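A minimal sketch that lists every syntax error recorded in a database, together with the parser's message, using only JSParseError and its getMessage predicate:

```ql
import javascript

// List all syntax errors recorded in the database, with the parser's message.
from JSParseError err
select err, "Syntax error: " + err.getMessage()
```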
Frameworks¶
AngularJS¶
The semmle.javascript.frameworks.AngularJS library provides support for working with AngularJS (Angular 1.x) code. Its most important classes are:
- AngularJS::AngularModule: an Angular module
- AngularJS::DirectiveDefinition, AngularJS::FactoryRecipeDefinition, AngularJS::FilterDefinition, AngularJS::ControllerDefinition: a definition of a directive, service, filter or controller, respectively
- AngularJS::InjectableFunction: a function that is subject to dependency injection
HTTP framework libraries¶
The library semmle.javascript.frameworks.HTTP provides classes modeling common concepts from various HTTP frameworks.
Currently supported frameworks are Express, the standard Node.js http and https modules, Connect, Koa, Hapi and Restify.
The most important classes include (all in module HTTP):
- ServerDefinition: an expression that creates a new HTTP server.
- RouteHandler: a callback for handling an HTTP request.
- RequestExpr: an expression that may contain an HTTP request object.
- ResponseExpr: an expression that may contain an HTTP response object.
- HeaderDefinition: an expression that sets one or more HTTP response headers.
- CookieDefinition: an expression that sets a cookie in an HTTP response.
- RequestInputAccess: an expression that accesses user-controlled request data.
For each framework library, there is a corresponding CodeQL library (for example semmle.javascript.frameworks.Express) that instantiates the above classes for that framework and adds framework-specific classes.
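Because each framework library instantiates the same classes, a query written against them applies to all supported frameworks at once. For example, this sketch reports all accesses to user-controlled request data, whichever framework handles the request; the alert message is illustrative.

```ql
import javascript

// Report every read of user-controlled request data, regardless of which
// supported HTTP framework the code uses.
from HTTP::RequestInputAccess input
select input, "User-controlled request data is read here."
```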
Node.js¶
The semmle.javascript.NodeJS library provides support for working with Node.js modules through the following classes:
- NodeModule: a top-level that defines a Node.js module; see the section on Modules for more information.
- Require: a call to the special require function that imports a module.
As an example of the use of these classes, here is a query that counts for every module how many other modules it imports:
import javascript
from NodeModule m
select m, count(m.getAnImportedModule())
When you analyze a project, for each module you can see how many other modules it imports.
NPM¶
The semmle.javascript.NPM library provides support for working with NPM packages through the following classes:
- PackageJSON: a package.json file describing an NPM package; various getter predicates are available for accessing detailed information about the package, which are described in the online API documentation.
- BugTrackerInfo, ContributorInfo, RepositoryInfo: these classes model parts of the package.json file providing information on bug tracking systems, contributors and repositories.
- PackageDependencies: models the dependencies of an NPM package; the predicate PackageDependencies.getADependency(pkg, v) binds pkg to the name and v to the version of a package required by a package.json file.
- NPMPackage: a subclass of Folder that models an NPM package; important member predicates include:
  - NPMPackage.getPackageName() returns the name of this package.
  - NPMPackage.getPackageJSON() returns the package.json file for this package.
  - NPMPackage.getNodeModulesFolder() returns the node_modules folder for this package.
  - NPMPackage.getAModule() returns a Node.js module belonging to this package (not including modules in the node_modules folder).
As an example of the use of these classes, here is a query that identifies unused dependencies, that is, module dependencies that are listed in the package.json file, but which are not imported by any require call:
import javascript
from NPMPackage pkg, PackageDependencies deps, string name
where deps = pkg.getPackageJSON().getDependencies() and
deps.getADependency(name, _) and
not exists (Require req | req.getTopLevel() = pkg.getAModule() | name = req.getImportedPath().getValue())
select deps, "Unused dependency '" + name + "'."
React¶
The semmle.javascript.frameworks.React library provides support for working with React code through the ReactComponent class, which models a React component defined either in the functional style or the class-based style (both ECMAScript 2015 classes and old-style React.createClass classes are supported).
Databases¶
The class SQL::SqlString represents an expression that is interpreted as a SQL command. Currently, we model SQL commands issued through the following npm packages: mysql, pg, pg-pool, sqlite3, mssql and sequelize.
Similarly, the class NoSQL::Query represents an expression that is interpreted as a NoSQL query by the mongodb or mongoose package.
Finally, the class DatabaseAccess contains all data flow nodes that perform a database access using any of the packages above.
For example, here is a query to find SQL queries that use string concatenation (instead of a templating-based solution, which is usually safer):
import javascript
from SQL::SqlString ss
where ss instanceof AddExpr
select ss, "Use templating instead of string concatenation."
Miscellaneous¶
Externs¶
The semmle.javascript.Externs library provides support for working with externs through the following classes:
- ExternalDecl: common superclass modeling all different kinds of externs declarations; it defines two member predicates:
  - ExternalDecl.getQualifiedName() returns the fully qualified name of the declared entity.
  - ExternalDecl.getName() returns the unqualified name of the declared entity.
- ExternalTypedef: a subclass of ExternalDecl representing type declarations; unlike other externs declarations, such declarations do not declare a function or object that is present at runtime, but simply introduce an alias for a type.
- ExternalVarDecl: a subclass of ExternalDecl representing a variable or function declaration; it defines two further member predicates, which are described in the API documentation.
Variables and functions declared in an externs file are either globals (represented by class ExternalGlobalDecl), or members (represented by class ExternalMemberDecl).
Members are further subdivided into static members (class ExternalStaticMemberDecl) and instance members (class ExternalInstanceMemberDecl).
For more details on these and other classes representing externs, see the API documentation.
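For instance, the following sketch lists the fully qualified names of all globals declared in externs files, relying only on ExternalGlobalDecl and the ExternalDecl predicates described above.

```ql
import javascript

// List the fully qualified names of all global variables and functions
// declared in externs files.
from ExternalGlobalDecl decl
select decl, decl.getQualifiedName()
```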
HTML¶
The semmle.javascript.HTML library provides support for working with HTML documents. They are represented as a tree of HTML::Element nodes, each of which may have zero or more attributes represented by class HTML::Attribute.
Similar to the abstract syntax tree representation, HTML::Element has member predicates getChild(i) and getParent() to navigate from an element to its ith child element and its parent element, respectively. Use predicate HTML::Element.getAttribute(i) to get the ith attribute of the element, and HTML::Element.getAttributeByName(n) to get the attribute with name n.
For HTML::Attribute, predicates getName() and getValue() provide access to the attribute’s name and value, respectively.
Both HTML::Element and HTML::Attribute have a predicate getRoot() that gets the root HTML::Element of the document to which they belong.
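As an illustration, the following sketch reports a elements that have target="_blank" but no rel attribute. In addition to the predicates described above, it assumes HTML::Element.getName() for the element's tag name; treat it as an example of the API rather than a complete security check.

```ql
import javascript

// Report `<a>` elements that open their target in a new window
// (target="_blank") without specifying a `rel` attribute.
from HTML::Element a
where
  a.getName() = "a" and
  a.getAttributeByName("target").getValue() = "_blank" and
  not exists(a.getAttributeByName("rel"))
select a, "Link with target=\"_blank\" but no rel attribute."
```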
JSDoc¶
The semmle.javascript.JSDoc library provides support for working with JSDoc comments. Documentation comments are parsed into an abstract syntax tree representation closely following the format employed by the Doctrine JSDoc parser.
A JSDoc comment as a whole is represented by an entity of class JSDoc, while individual tags are represented by class JSDocTag. Important member predicates of these two classes include:
- JSDoc.getDescription() returns the descriptive header of the JSDoc comment, if any.
- JSDoc.getComment() maps the JSDoc entity to its underlying Comment entity.
- JSDoc.getATag() returns a tag in this JSDoc comment.
- JSDocTag.getTitle() returns the title of this tag; for instance, an @param tag has title "param".
- JSDocTag.getName() returns the name of the parameter or variable documented by this tag.
- JSDocTag.getType() returns the type of the parameter or variable documented by this tag.
- JSDocTag.getDescription() returns the description associated with this tag.
Types in JSDoc comments are represented by the class JSDocTypeExpr and its subclasses, which again represent type expressions as abstract syntax trees. Examples of type expressions are JSDocAnyTypeExpr, representing the “any” type *, or JSDocNullTypeExpr, representing the null type.
As an example, here is a query that finds @param tags that do not specify the name of the documented parameter:
import javascript
from JSDocTag t
where t.getTitle() = "param" and
not exists(t.getName())
select t, "@param tag is missing name."
For full details on these and other classes representing JSDoc comments and type expressions, see the API documentation.
JSX¶
The semmle.javascript.JSX library provides support for working with JSX code.
Similar to the representation of HTML documents, JSX fragments are modeled as a tree of JSXElements, each of which may have zero or more JSXAttributes.
However, unlike HTML, JSX is interleaved with JavaScript, hence JSXElement is a subclass of Expr. Like HTML::Element, it has predicates getAttribute(i) and getAttributeByName(n) to look up attributes of a JSX element. Its body elements can be accessed by predicate getABodyElement(); note that the results of this predicate are arbitrary expressions, which may either be further JSXElements, or other expressions that are interpolated into the body of the outer element.
JSXAttribute, again not unlike HTML::Attribute, has predicates getName() and getValue() to access the attribute name and value.
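Attribute lookup works as for HTML. The following sketch reports JSX elements that set the React dangerouslySetInnerHTML attribute; the choice of attribute is just an example.

```ql
import javascript

// Report JSX elements that set the `dangerouslySetInnerHTML` attribute.
from JSXElement elt, JSXAttribute attr
where attr = elt.getAttributeByName("dangerouslySetInnerHTML")
select attr, "Use of dangerouslySetInnerHTML."
```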
JSON¶
The semmle.javascript.JSON library provides support for working with JSON files that were processed by the JavaScript extractor when building the CodeQL database.
JSON files are modeled as trees of JSON values. Each JSON value is represented by an entity of class JSONValue, which provides the following member predicates:
- JSONValue.getParent() returns the JSON object or array in which this value occurs.
- JSONValue.getChild(i) returns the ith child of this JSON object or array.
Note that JSONValue is a subclass of Locatable, so the usual member predicates of Locatable can be used to determine the file in which a JSON value appears, and its location within that file.
Class JSONValue has the following subclasses:
- JSONPrimitiveValue: a JSON-encoded primitive value; use JSONPrimitiveValue.getValue() to obtain a string representation of the value.
- JSONNull, JSONBoolean, JSONNumber, JSONString: subclasses of JSONPrimitiveValue representing the various kinds of primitive values.
- JSONArray: a JSON-encoded array; use JSONArray.getElementValue(i) to access the ith element of the array.
- JSONObject: a JSON-encoded object; use JSONObject.getValue(n) to access the value of property n of the object.
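The following sketch uses JSONArray.getElementValue(i) to report null elements of JSON arrays together with their index; it is meant only to show how the predicates fit together.

```ql
import javascript

// Report `null` elements of JSON arrays, together with their position in the array.
from JSONArray arr, JSONNull n, int i
where n = arr.getElementValue(i)
select n, "Element " + i + " of this JSON array is null."
```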
Regular expressions¶
The semmle.javascript.Regexp library provides support for working with regular expression literals. The syntactic structure of regular expression literals is represented as an abstract syntax tree of regular expression terms, modeled by the class RegExpTerm. Similar to ASTNode, class RegExpTerm provides member predicates getParent() and getChild(i) to navigate the structure of the syntax tree.
Various subclasses of RegExpTerm model different kinds of regular expression constructs and operators; see the API documentation for details.
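For example, assuming the subclass RegExpAlt, which models alternations such as a|b, the following sketch reports alternations with more than ten alternatives, counting them via getChild(i).

```ql
import javascript

// Report alternations (`a|b|...`) in regular expression literals that have
// more than ten alternatives.
from RegExpAlt alt
where strictcount(alt.getChild(_)) > 10
select alt, "Alternation with " + strictcount(alt.getChild(_)) + " alternatives."
```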
YAML¶
The semmle.javascript.YAML library provides support for working with YAML files that were processed by the JavaScript extractor when building the CodeQL database.
YAML files are modeled as trees of YAML nodes. Each YAML node is represented by an entity of class YAMLNode, which provides, among others, the following member predicates:
- YAMLNode.getParentNode() returns the YAML collection in which this node is syntactically nested.
- YAMLNode.getChildNode(i) returns the ith child node of this node.
- YAMLNode.getAChildNode() returns any child node of this node.
- YAMLNode.getTag() returns the tag of this YAML node.
- YAMLNode.getAnchor() returns the anchor associated with this YAML node, if any.
- YAMLNode.eval() returns the YAMLValue this YAML node evaluates to after resolving aliases and includes.
The various kinds of scalar values available in YAML are represented by classes YAMLInteger, YAMLFloat, YAMLTimestamp, YAMLBool, YAMLNull and YAMLString. Their common superclass is YAMLScalar, which has a member predicate getValue() to obtain the value of a scalar as a string.
YAMLMapping and YAMLSequence represent mappings and sequences, respectively, and are subclasses of YAMLCollection.
Alias nodes are represented by class YAMLAliasNode, while YAMLMergeKey and YAMLInclude represent merge keys and !include directives, respectively.
Predicate YAMLMapping.maps(key, value) models the key-value relation represented by a mapping, taking merge keys into account.
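As a sketch combining these classes, the following query reports string values stored under a key named password (compared case-insensitively) in YAML mappings. It assumes that YAMLScalar and YAMLString values can be passed directly to YAMLMapping.maps; the message wording is illustrative.

```ql
import javascript

// Report string values in YAML mappings whose key is "password"
// (compared case-insensitively); `maps` takes merge keys into account.
from YAMLMapping m, YAMLScalar key, YAMLString value
where
  m.maps(key, value) and
  key.getValue().toLowerCase() = "password"
select value, "Possibly hard-coded password in YAML file."
```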