CodeQL library for JavaScript

When you’re analyzing a JavaScript program, you can make use of the large collection of classes in the CodeQL library for JavaScript.

Overview

There is an extensive CodeQL library for analyzing JavaScript code. The classes in this library present the data from a CodeQL database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.

The library is implemented as a set of QL modules, that is, files with the extension .qll. The module javascript.qll imports most other standard library modules, so you can include the complete library by beginning your query with:

import javascript

The rest of this tutorial briefly summarizes the most important classes and predicates provided by this library, including references to the detailed API documentation where applicable.

Introducing the library

The CodeQL library for JavaScript presents information about JavaScript source code at different levels:

Textual — classes that represent source code as unstructured text files
Lexical — classes that represent source code as a series of tokens and comments
Syntactic — classes that represent source code as an abstract syntax tree
Name binding — classes that represent scopes and variables
Control flow — classes that represent the flow of control during execution
Data flow — classes that you can use to reason about data flow in JavaScript source code
Type inference — classes that you can use to approximate types for JavaScript expressions and variables
Call graph — classes that represent the caller-callee relationship between functions
Inter-procedural data flow — classes that you can use to define inter-procedural data flow and taint tracking analyses
Frameworks — classes that represent source code entities that have a special meaning to JavaScript tools and frameworks

Note that representations above the textual level (for example the lexical representation or the flow graphs) are only available for JavaScript code that does not contain fatal syntax errors. For code with such errors, the only information available is at the textual level, as well as information about the errors themselves.

Additionally, there is library support for working with HTML documents, JSON, and YAML data, JSDoc comments, and regular expressions.

Textual level

At its most basic level, a JavaScript code base can simply be viewed as a collection of files organized into folders, where each file is composed of zero or more lines of text.

Note that the textual content of a program is not included in the CodeQL database unless you specifically request it during extraction.

Files and folders

In the CodeQL libraries, files are represented as entities of class File, and folders as entities of class Folder, both of which are subclasses of class Container.

Class Container provides the following member predicates:

Container.getParentContainer() returns the parent folder of the file or folder.
Container.getAFile() returns a file within the folder.
Container.getAFolder() returns a folder nested within the folder.

Note that while getAFile and getAFolder are declared on class Container, they currently only have results for Folders.

Both files and folders have paths, which can be accessed by the predicate Container.getAbsolutePath(). For example, if f represents a file with the path /home/user/project/src/index.js, then f.getAbsolutePath() evaluates to the string "/home/user/project/src/index.js", while f.getParentContainer().getAbsolutePath() returns "/home/user/project/src".

These paths are absolute file system paths. If you want to obtain the path of a file relative to the source location in the CodeQL database, use Container.getRelativePath() instead. Note, however, that a database may contain files that are not located underneath the source location; for such files, getRelativePath() will not return anything.
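As a small illustration of the difference between the two path predicates, the following sketch lists files that are in the database but have no relative path, that is, files located outside the source location. It uses only the predicates described above; what it reports depends entirely on how the database was created.

```ql
import javascript

// Files for which getRelativePath() has no result lie outside
// the source location of the database.
from File f
where not exists(f.getRelativePath())
select f.getAbsolutePath()
```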

The following member predicates of class Container provide more information about the name of a file or folder:

Container.getBaseName() returns the base name of a file or folder, not including its parent folder, but including its extension. In the above example, f.getBaseName() would return the string "index.js".
Container.getStem() is similar to Container.getBaseName(), but it does not include the file extension; so f.getStem() returns "index".
Container.getExtension() returns the file extension, not including the dot; so f.getExtension() returns "js".

For example, the following query computes, for each folder, the number of JavaScript files (that is, files with extension js) contained in the folder:

import javascript

from Folder d
select d.getRelativePath(), count(File f | f = d.getAFile() and f.getExtension() = "js")

When you run the query on most projects, the results include folders that contain files with a js extension and folders that don’t.

Locations

Most entities in a CodeQL database have an associated source location. Locations are identified by five pieces of information: a file, a start line, a start column, an end line, and an end column. Line and column counts are 1-based (so the first character of a file is at line 1, column 1), and the end position is inclusive.

All entities associated with a source location belong to the class Locatable. The location itself is modeled by the class Location and can be accessed through the member predicate Locatable.getLocation(). The Location class provides the following member predicates:

Location.getFile(), Location.getStartLine(), Location.getStartColumn(), Location.getEndLine(), Location.getEndColumn() return detailed information about the location.
Location.getNumLines() returns the number of (whole or partial) lines covered by the location.
Location.startsBefore(Location) and Location.endsAfter(Location) determine whether one location starts before or ends after another location.
Location.contains(Location) indicates whether one location completely contains another location; l1.contains(l2) holds if, and only if, l1.startsBefore(l2) and l1.endsAfter(l2).
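The location predicates can be combined with any Locatable. As a minimal sketch, the query below reports functions (class Function, introduced later in this tutorial) whose location covers more than 100 lines; the threshold of 100 is an arbitrary value chosen for illustration.

```ql
import javascript

// Report functions whose source location spans more than 100 lines.
from Function f, Location loc
where loc = f.getLocation() and
    loc.getNumLines() > 100
select f, "This function spans " + loc.getNumLines().toString() + " lines, starting at line " +
    loc.getStartLine().toString() + "."
```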

Lines

Lines of text in files are represented by the class Line. This class offers the following member predicates:

Line.getText() returns the text of the line, excluding any terminating newline characters.
Line.getTerminator() returns the terminator character(s) of the line. The last line in a file may not have any terminator characters, in which case this predicate does not return anything; otherwise it returns either the two-character string "\r\n" (carriage-return followed by newline), or one of the one-character strings "\n" (newline), "\r" (carriage-return), "\u2028" (Unicode character LINE SEPARATOR), "\u2029" (Unicode character PARAGRAPH SEPARATOR).

Note that, as mentioned above, the textual representation of the program is not included in the CodeQL database by default.
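Assuming the database was extracted with textual content included, a query on the textual level might look like the following sketch, which flags very long lines. The limit of 200 characters is an illustrative choice, not a library default.

```ql
import javascript

// Flag lines whose text (excluding the terminator) exceeds 200 characters.
// This only has results if line text was included during extraction.
from Line l
where l.getText().length() > 200
select l, "This line is longer than 200 characters."
```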

Lexical level

A slightly more structured view of a JavaScript program is provided by the classes Token and Comment, which represent tokens and comments, respectively.

Tokens

The most important member predicates of class Token are as follows:

Token.getValue() returns the source text of the token.
Token.getIndex() returns the index of the token within its enclosing script.
Token.getNextToken() and Token.getPreviousToken() navigate between tokens.

The Token class has nine subclasses, each representing a particular kind of token:

EOFToken: a marker token representing the end of a script
NullLiteralToken, BooleanLiteralToken, NumericLiteralToken, StringLiteralToken and RegularExpressionToken: different kinds of literals
IdentifierToken and KeywordToken: identifiers and keywords (including reserved words) respectively
PunctuatorToken: operators and other punctuation symbols

As an example of a query operating entirely on the lexical level, consider the following query, which finds consecutive comma tokens arising from an omitted element in an array expression:

import javascript

class CommaToken extends PunctuatorToken {
    CommaToken() {
        this.getValue() = ","
    }
}

from CommaToken comma
where comma.getNextToken() instanceof CommaToken
select comma, "Omitted array elements are bad style."

If the query returns no results, this pattern isn’t used in the projects that you analyzed.

You can use the predicates Locatable.getFirstToken() and Locatable.getLastToken() to access the first and last token (if any) belonging to an element with a source location.
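For instance, getLastToken() can be used to check how a statement ends. The following sketch flags expression statements (class ExprStmt, described in the syntactic level) whose last token is not a semicolon; it is a simplified approximation of a semicolon-insertion check, not a complete one.

```ql
import javascript

// Expression statements whose final token is something other than ";",
// which suggests they rely on automatic semicolon insertion.
from ExprStmt s
where s.getLastToken().getValue() != ";"
select s, "This statement does not end with an explicit semicolon."
```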

Comments

The class Comment and its subclasses represent the different kinds of comments that can occur in JavaScript programs:

Comment: any comment
- LineComment: a single-line comment terminated by an end-of-line character
  - SlashSlashComment: a plain JavaScript single-line comment starting with //
  - HtmlLineComment: a (non-standard) HTML comment
    - HtmlCommentStart: an HTML comment starting with <!--
    - HtmlCommentEnd: an HTML comment starting with -->
- BlockComment: a block comment potentially spanning multiple lines
  - SlashStarComment: a plain JavaScript block comment surrounded with /*...*/
  - DocComment: a documentation block comment surrounded with /**...*/

The most important member predicates are as follows:

Comment.getText() returns the source text of the comment, not including delimiters.
Comment.getLine(i) returns the ith line of text within the comment (0-based).
Comment.getNumLines() returns the number of lines in the comment.
Comment.getNextToken() returns the token immediately following a comment. Note that such a token always exists: if a comment appears at the end of a file, its following token is an EOFToken.

As an example of a query using only lexical information, consider the following query for finding HTML comments, which are not a standard ECMAScript feature and should be avoided:

import javascript

from HtmlLineComment c
select c, "Do not use HTML comments."
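The member predicates of Comment can be used in the same way. As a rough sketch, the query below finds comments whose text mentions TODO; the marker string is an assumption chosen for illustration, and the pattern also matches words that merely contain it.

```ql
import javascript

// Comments whose text (without delimiters) contains the marker "TODO".
from Comment c
where c.getText().matches("%TODO%")
select c, "This comment contains a TODO marker."
```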

Syntactic level

The majority of classes in the JavaScript library are concerned with representing a JavaScript program as a collection of abstract syntax trees (ASTs).

The class ASTNode contains all entities representing nodes in the abstract syntax trees and defines generic tree traversal predicates:

ASTNode.getChild(i): returns the ith child of this AST node.
ASTNode.getAChild(): returns any child of this AST node.
ASTNode.getParent(): returns the parent node of this AST node, if any.

Note

These predicates should only be used to perform generic AST traversal. To access children of specific AST node types, the specialized predicates introduced below should be used instead. In particular, queries should not rely on the numeric indices of child nodes relative to their parent nodes: these are considered an implementation detail that may change between versions of the library.

Top-levels

From a syntactic point of view, each JavaScript program is composed of one or more top-level code blocks (or top-levels for short), which are blocks of JavaScript code that do not belong to a larger code block. Top-levels are represented by the class TopLevel and its subclasses:

TopLevel
- Script: a stand-alone file or HTML <script> element
  - ExternalScript: a stand-alone JavaScript file
  - InlineScript: code embedded inline in an HTML <script> tag
- CodeInAttribute: a code block originating from an HTML attribute value
  - EventHandlerCode: code from an event handler attribute such as onload
  - JavaScriptURL: code from a URL with the javascript: scheme
- Externs: a JavaScript file containing externs definitions

Every TopLevel is contained in a File, but a single File may contain more than one TopLevel. To go from a TopLevel tl to its File, use tl.getFile(); conversely, for a File f, predicate f.getATopLevel() returns a top-level contained in f. For every AST node, predicate ASTNode.getTopLevel() can be used to find the top-level it belongs to.

The TopLevel class additionally provides the following member predicates:

TopLevel.getNumberOfLines() returns the total number of lines (including code, comments and whitespace) in the top-level.
TopLevel.getNumberOfLinesOfCode() returns the number of lines of code, that is, lines that contain at least one token.
TopLevel.getNumberOfLinesOfComments() returns the number of lines containing or belonging to a comment.
TopLevel.isMinified() determines whether the top-level contains minified code, using a heuristic based on the average number of statements per line.
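As a sketch of how these metrics might be combined, the following query reports non-minified top-levels that contain code but no comment lines. Whether this is a useful check is a matter of taste; it is shown only to illustrate the predicates listed above.

```ql
import javascript

// Non-minified top-levels with at least one line of code and no comment lines.
from TopLevel tl
where not tl.isMinified() and
    tl.getNumberOfLinesOfCode() > 0 and
    tl.getNumberOfLinesOfComments() = 0
select tl, "This top-level has " + tl.getNumberOfLinesOfCode().toString() +
    " lines of code but no comments."
```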

Note

By default, GitHub code scanning filters out alerts in minified top-levels, since they are often hard to interpret. When you write your own queries in Visual Studio Code, this filtering is not done automatically, so you may want to explicitly add a condition of the form and not e.getTopLevel().isMinified() or similar to your query to exclude results in minified code.

Statements and expressions

The most important subclasses of ASTNode besides TopLevel are Stmt and Expr, which, together with their subclasses, represent statements and expressions, respectively. This section briefly discusses some of the more important classes and predicates. For a full reference of all the subclasses of Stmt and Expr and their API, see Stmt.qll and Expr.qll.

Stmt: use Stmt.getContainer() to access the innermost function or top-level in which the statement is contained.
- ControlStmt: a statement that controls the execution of other statements, that is, a conditional, loop, try or with statement; use ControlStmt.getAControlledStmt() to access the statements that it controls.
  - IfStmt: an if statement; use IfStmt.getCondition(), IfStmt.getThen() and IfStmt.getElse() to access its condition expression, “then” branch and “else” branch, respectively.
  - LoopStmt: a loop; use LoopStmt.getBody() and LoopStmt.getTest() to access its body and its test expression, respectively.
    - WhileStmt, DoWhileStmt: a “while” or “do-while” loop, respectively.
    - ForStmt: a “for” statement; use ForStmt.getInit() and ForStmt.getUpdate() to access the init and update expressions, respectively.
    - EnhancedForLoop: a “for-in” or “for-of” loop; use EnhancedForLoop.getIterator() to access the loop iterator (which may be an expression or variable declaration), and EnhancedForLoop.getIterationDomain() to access the expression being iterated over.
      - ForInStmt, ForOfStmt: a “for-in” or “for-of” loop, respectively.
  - WithStmt: a “with” statement; use WithStmt.getExpr() and WithStmt.getBody() to access the controlling expression and the body of the with statement, respectively.
  - SwitchStmt: a switch statement; use SwitchStmt.getExpr() to access the expression on which the statement switches; use SwitchStmt.getCase(int) and SwitchStmt.getACase() to access individual switch cases; each case is modeled by an entity of class Case, whose member predicates Case.getExpr() and Case.getBodyStmt(int) provide access to the expression checked by the switch case (which is undefined for default), and its body.
  - TryStmt: a “try” statement; use TryStmt.getBody(), TryStmt.getCatchClause() and TryStmt.getFinally() to access its body, “catch” clause and “finally” block, respectively.
- BlockStmt: a block of statements; use BlockStmt.getStmt(int) to access the individual statements in the block.
- ExprStmt: an expression statement; use ExprStmt.getExpr() to access the expression itself.
- JumpStmt: a statement that disrupts structured control flow, that is, one of break, continue, return and throw; use predicate JumpStmt.getTarget() to determine the target of the jump, which is either a statement or (for return and uncaught throw statements) the enclosing function.
  - BreakStmt: a “break” statement; use BreakStmt.getLabel() to access its (optional) target label.
  - ContinueStmt: a “continue” statement; use ContinueStmt.getLabel() to access its (optional) target label.
  - ReturnStmt: a “return” statement; use ReturnStmt.getExpr() to access its (optional) result expression.
  - ThrowStmt: a “throw” statement; use ThrowStmt.getExpr() to access its thrown expression.
- FunctionDeclStmt: a function declaration statement; see below for available member predicates.
- ClassDeclStmt: a class declaration statement; see below for available member predicates.
- DeclStmt: a declaration statement containing one or more declarators, which can be accessed by predicate DeclStmt.getDecl(int).
  - VarDeclStmt, ConstDeclStmt, LetStmt: a var, const or let declaration statement.
Expr: use Expr.getEnclosingStmt() to obtain the innermost statement to which this expression belongs; Expr.isPure() determines whether the expression is side-effect-free.
- Identifier: an identifier; use Identifier.getName() to obtain its name.
- Literal: a literal value; use Literal.getValue() to obtain a string representation of its value, and Literal.getRawValue() to obtain its raw source text (including surrounding quotes for string literals).
  - NullLiteral, BooleanLiteral, NumberLiteral, StringLiteral, RegExpLiteral: different kinds of literals.
- ThisExpr: a “this” expression.
- SuperExpr: a “super” expression.
- ArrayExpr: an array expression; use ArrayExpr.getElement(i) to obtain the ith element expression, and ArrayExpr.elementIsOmitted(i) to check whether the ith element is omitted.
- ObjectExpr: an object expression; use ObjectExpr.getProperty(i) to obtain the ith property in the object expression; properties are modeled by class Property, which is described in more detail below.
- FunctionExpr: a function expression; see below for available member predicates.
- ArrowFunctionExpr: an ECMAScript 2015-style arrow function expression; see below for available member predicates.
- ClassExpr: a class expression; see below for available member predicates.
- ParExpr: a parenthesized expression; use ParExpr.getExpression() to obtain the operand expression; for any expression, Expr.stripParens() can be used to recursively strip off any parentheses.
- SeqExpr: a sequence of two or more expressions connected by the comma operator; use SeqExpr.getOperand(i) to obtain the ith sub-expression.
- ConditionalExpr: a ternary conditional expression; member predicates ConditionalExpr.getCondition(), ConditionalExpr.getConsequent() and ConditionalExpr.getAlternate() provide access to the condition expression, the “then” expression and the “else” expression, respectively.
- InvokeExpr: a function call or a “new” expression; use InvokeExpr.getCallee() to obtain the expression specifying the function to be called, and InvokeExpr.getArgument(i) to obtain the ith argument expression.
  - CallExpr: a function call.
  - NewExpr: a “new” expression.
  - MethodCallExpr: a function call whose callee expression is a property access; use MethodCallExpr.getReceiver() to access the receiver expression of the method call, and MethodCallExpr.getMethodName() to get the method name (if it can be determined statically).
- PropAccess: a property access, that is, either a “dot” expression of the form e.f or an index expression of the form e[p]; use PropAccess.getBase() to obtain the base expression on which the property is accessed (e in the example), and PropAccess.getPropertyName() to determine the name of the accessed property; if the name cannot be statically determined, getPropertyName() does not return any value.
  - DotExpr: a “dot” expression.
  - IndexExpr: an index expression (also known as computed property access).
- UnaryExpr: a unary expression; use UnaryExpr.getOperand() to obtain the operand expression.
  - NegExpr (“-”), PlusExpr (“+”), LogNotExpr (“!”), BitNotExpr (“~”), TypeofExpr, VoidExpr, DeleteExpr, SpreadElement (“...”): various types of unary expressions.
- BinaryExpr: a binary expression; use BinaryExpr.getLeftOperand() and BinaryExpr.getRightOperand() to access the operand expressions.
  - Comparison: any comparison expression.
    - EqualityTest: any equality or inequality test.
      - EqExpr (“==”), NEqExpr (“!=”): non-strict equality and inequality tests.
      - StrictEqExpr (“===”), StrictNEqExpr (“!==”): strict equality and inequality tests.
    - LTExpr (“<”), LEExpr (“<=”), GTExpr (“>”), GEExpr (“>=”): numeric comparisons.
  - LShiftExpr (“<<”), RShiftExpr (“>>”), URShiftExpr (“>>>”): shift operators.
  - AddExpr (“+”), SubExpr (“-”), MulExpr (“*”), DivExpr (“/”), ModExpr (“%”), ExpExpr (“**”): arithmetic operators.
  - BitOrExpr (“|”), XOrExpr (“^”), BitAndExpr (“&”): bitwise operators.
  - InExpr: an in test.
  - InstanceofExpr: an instanceof test.
  - LogAndExpr (“&&”), LogOrExpr (“||”): short-circuiting logical operators.
- Assignment: assignment expressions, either simple or compound; use Assignment.getLhs() and Assignment.getRhs() to access the left- and right-hand side, respectively.
  - AssignExpr: a simple assignment expression.
  - CompoundAssignExpr: a compound assignment expression.
    - AssignAddExpr, AssignSubExpr, AssignMulExpr, AssignDivExpr, AssignModExpr, AssignLShiftExpr, AssignRShiftExpr, AssignURShiftExpr, AssignOrExpr, AssignXOrExpr, AssignAndExpr, AssignExpExpr: different kinds of compound assignment expressions.
- UpdateExpr: an increment or decrement expression; use UpdateExpr.getOperand() to obtain the operand expression.
  - PreIncExpr, PostIncExpr: an increment expression.
  - PreDecExpr, PostDecExpr: a decrement expression.
- YieldExpr: a “yield” expression; use YieldExpr.getOperand() to access the (optional) operand expression; use YieldExpr.isDelegating() to check whether this is a delegating yield*.
- TemplateLiteral: an ECMAScript 2015 template literal; TemplateLiteral.getElement(i) returns the ith element of the template, which may either be an interpolated expression or a constant template element.
- TaggedTemplateExpr: an ECMAScript 2015 tagged template literal; use TaggedTemplateExpr.getTag() to access the tagging expression, and TaggedTemplateExpr.getTemplate() to access the template literal being tagged.
- TemplateElement: a constant template element; as for literals, use TemplateElement.getValue() to obtain the value of the element, and TemplateElement.getRawValue() for its raw value.
- AwaitExpr: an “await” expression; use AwaitExpr.getOperand() to access the operand expression.

Stmt and Expr share a common superclass ExprOrStmt which is useful for queries that should operate either on statements or on expressions, but not on any other AST nodes.

As an example of how to use expression AST nodes, here is a query that finds expressions of the form e + f >> g; such expressions should be rewritten as (e + f) >> g to clarify operator precedence:

import javascript

from ShiftExpr shift, AddExpr add
where add = shift.getAnOperand()
select add, "This expression should be bracketed to clarify precedence rules."
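Statement classes can be queried in the same style. As a further sketch, this query uses IfStmt.getThen() and BlockStmt.getStmt(int) from the list above to find if statements whose “then” branch is an empty block:

```ql
import javascript

// An if statement whose "then" branch is a block containing no statements.
from IfStmt i, BlockStmt then
where then = i.getThen() and
    not exists(then.getStmt(_))
select i, "The 'then' branch of this if statement is empty."
```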

Functions

JavaScript provides several ways of defining functions: in ECMAScript 5, there are function declaration statements and function expressions, and ECMAScript 2015 adds arrow function expressions. These different syntactic forms are represented by the classes FunctionDeclStmt (a subclass of Stmt), FunctionExpr (a subclass of Expr) and ArrowFunctionExpr (also a subclass of Expr), respectively. All three are subclasses of Function, which provides common member predicates for accessing function parameters or the function body:

Function.getId() returns the Identifier naming the function, which may not be defined for function expressions.
Function.getParameter(i) and Function.getAParameter() access the ith parameter or any parameter, respectively; parameters are modeled by the class Parameter, which is a subclass of BindingPattern (see below).
Function.getBody() returns the body of the function, which is usually a Stmt, but may be an Expr for arrow function expressions and legacy expression closures.

As an example, here is a query that finds all expression closures:

import javascript

from FunctionExpr fe
where fe.getBody() instanceof Expr
select fe, "Use arrow expressions instead of expression closures."

As another example, this query finds functions that have two parameters that bind the same variable:

import javascript

from Function fun, Parameter p, Parameter q, int i, int j
where p = fun.getParameter(i) and
    q = fun.getParameter(j) and
    i < j and
    p.getAVariable() = q.getAVariable()
select fun, "This function has two parameters that bind the same variable."

Classes

Classes can be defined either by class declaration statements, represented by the CodeQL class ClassDeclStmt (which is a subclass of Stmt), or by class expressions, represented by the CodeQL class ClassExpr (which is a subclass of Expr). Both of these classes are also subclasses of ClassDefinition, which provides common member predicates for accessing the name of a class, its superclass, and its body:

ClassDefinition.getIdentifier() returns the Identifier naming the class, which may not be defined for class expressions.
ClassDefinition.getSuperClass() returns the Expr specifying the superclass, which may not be defined.
ClassDefinition.getMember(n) returns the definition of member n of this class.
ClassDefinition.getMethod(n) restricts ClassDefinition.getMember(n) to methods (as opposed to fields).
ClassDefinition.getField(n) restricts ClassDefinition.getMember(n) to fields (as opposed to methods).
ClassDefinition.getConstructor() gets the constructor of this class, possibly a synthetic default constructor.

Note that class fields are not a standard language feature yet, so details of their representation may change.

Method definitions are represented by the class MethodDefinition, which (like its counterpart FieldDefinition for fields) is a subclass of MemberDefinition. That class provides the following important member predicates:

MemberDefinition.isStatic(): holds if this is a static member.
MemberDefinition.isComputed(): holds if the name of this member is computed at runtime.
MemberDefinition.getName(): gets the name of this member if it can be determined statically.
MemberDefinition.getInit(): gets the initializer of this member; for methods, the initializer is a function expression; for fields, it may be an arbitrary expression, and may be undefined.

There are three classes for modeling special methods: ConstructorDefinition models constructors, while GetterMethodDefinition and SetterMethodDefinition model getter and setter methods, respectively.
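To illustrate how these classes fit together, here is a sketch of a query that finds getter methods without a matching setter of the same name in the same class. It ignores the static/non-static distinction and inherited members, so treat it as a starting point rather than a precise check.

```ql
import javascript

// A getter named `name` in class `c` for which `c` defines no setter
// with the same name.
from ClassDefinition c, GetterMethodDefinition getter, string name
where getter = c.getMember(name) and
    not exists(SetterMethodDefinition setter | setter = c.getMember(name))
select getter, "Class member " + name + " has a getter but no setter."
```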

Declarations and binding patterns

Variables are declared by declaration statements (class DeclStmt), which come in three flavors: var statements (represented by class VarDeclStmt), const statements (represented by class ConstDeclStmt), and let statements (represented by class LetStmt). Every declaration statement has one or more declarators, represented by class VariableDeclarator.

Each declarator consists of a binding pattern, returned by predicate VariableDeclarator.getBindingPattern(), and an optional initializing expression, returned by VariableDeclarator.getInit().

Often, the binding pattern is a simple identifier, as in var x = 42. In ECMAScript 2015 and later, however, it can also be a more complex destructuring pattern, as in var [x, y] = arr.

The various kinds of binding patterns are represented by class BindingPattern and its subclasses:

VarRef: a simple identifier in an l-value position, for example the x in var x or in x = 42
Parameter: a function or catch clause parameter
ArrayPattern: an array pattern, for example, the left-hand side of [x, y] = arr
ObjectPattern: an object pattern, for example, the left-hand side of {x, y: z} = o

Here is an example of a query to find declaration statements that declare the same variable more than once, excluding results in minified code:

import javascript

from DeclStmt ds, VariableDeclarator d1, VariableDeclarator d2, Variable v, int i, int j
where d1 = ds.getDecl(i) and
    d2 = ds.getDecl(j) and
    i < j and
    v = d1.getBindingPattern().getAVariable() and
    v = d2.getBindingPattern().getAVariable() and
    not ds.getTopLevel().isMinified()
select ds, "Variable " + v.getName() + " is declared both $@ and $@.", d1, "here", d2, "here"

This is not a common problem, so you may not find any results in your own projects.

Notice the use of not ... isMinified() here and in the next few queries. This excludes any results found in minified code. If you delete and not ds.getTopLevel().isMinified() and re-run the query, two results in minified code in the meteor/meteor project are reported.

Properties

Properties in object literals are represented by class Property, which is also a subclass of ASTNode, but of neither Expr nor Stmt.

Class Property has two subclasses ValueProperty and PropertyAccessor, which represent, respectively, normal value properties and getter/setter properties. Class PropertyAccessor, in turn, has two subclasses PropertyGetter and PropertySetter representing getters and setters, respectively.

The predicates Property.getName() and Property.getInit() provide access to the defined property’s name and its initial value. For PropertyAccessor and its subclasses, getInit() is overridden to return the getter/setter function.

As an example of a query involving properties, consider the following query that flags object expressions containing two identically named properties, excluding results in minified code:

import javascript

from ObjectExpr oe, Property p1, Property p2, int i, int j
where p1 = oe.getProperty(i) and
    p2 = oe.getProperty(j) and
    i < j and
    p1.getName() = p2.getName() and
    not oe.getTopLevel().isMinified()
select oe, "Property " + p1.getName() + " is defined both $@ and $@.", p1, "here", p2, "here"

Modules

The JavaScript library has support for working with ECMAScript 2015 modules, as well as legacy CommonJS modules (still commonly employed by Node.js code bases) and AMD-style modules. The classes ES2015Module, NodeModule, and AMDModule represent these three types of modules, and all three extend the common superclass Module.

The most important member predicates defined by Module are:

Module.getName(): gets the name of the module, which is just the stem (that is, the basename without extension) of the enclosing file.
Module.getAnImportedModule(): gets another module that is imported (through import or require) by this module.
Module.getAnExportedSymbol(): gets the name of a symbol that this module exports.

Moreover, there is a class Import that models both ECMAScript 2015-style import declarations and CommonJS/AMD-style require calls; its member predicate Import.getImportedModule() provides access to the module the import refers to, if it can be determined statically.
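As a minimal sketch of working with imports, the following query lists imports for which no imported module could be determined statically. Such results commonly include imports of packages that are not part of the database, so they are not necessarily errors.

```ql
import javascript

// Imports (ES2015 import declarations or require calls) whose target
// module could not be resolved statically.
from Import i
where not exists(i.getImportedModule())
select i, "The module referred to by this import could not be resolved."
```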

Name binding

Name binding is modeled in the JavaScript libraries using four concepts: scopes, variables, variable declarations, and variable accesses, represented by the classes Scope, Variable, VarDecl and VarAccess, respectively.

Scopes

In ECMAScript 5, there are three kinds of scopes: the global scope (one per program), function scopes (one per function), and catch clause scopes (one per catch clause). These three kinds of scopes are represented by the classes GlobalScope, FunctionScope and CatchScope. ECMAScript 2015 adds block scopes for let-bound variables (which are also represented by class Scope), class expression scopes (ClassExprScope), and module scopes (ModuleScope).

Class Scope provides the following API:

Scope.getScopeElement() returns the AST node inducing this scope; undefined for GlobalScope.
Scope.getOuterScope() returns the lexically enclosing scope of this scope.
Scope.getAnInnerScope() returns a scope lexically nested inside this scope.
Scope.getVariable(name), Scope.getAVariable() return a variable declared (implicitly or explicitly) in this scope.
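The scope API can be used, for example, to look for shadowing. The sketch below reports explicitly declared variables that have the same name as a variable in the directly enclosing non-global scope. It relies on Variable.getName() and Variable.getADeclaration(), which are described in the following sections, and deliberately looks only one scope outwards.

```ql
import javascript

// A declared variable `v` in scope `inner` that shares its name with a
// variable of the directly enclosing (non-global) scope.
from Scope inner, Variable v, Variable shadowed, VarDecl d
where v = inner.getAVariable() and
    d = v.getADeclaration() and
    shadowed = inner.getOuterScope().getVariable(v.getName()) and
    not inner.getOuterScope() instanceof GlobalScope
select d, "This declaration of " + v.getName() + " shadows a variable in the enclosing scope."
```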

Variables

The Variable class models all variables in a JavaScript program, including global variables, local variables, and parameters (both of functions and catch clauses), whether explicitly declared or not.

It is important not to confuse variables and their declarations: local variables may have more than one declaration, while global variables and the implicitly declared local arguments variable need not have a declaration at all.

Variable declarations and accesses¶

Variables may be declared by variable declarators, by function declaration statements and expressions, by class declaration statements or expressions, or by parameters of functions and catch clauses. While these declarations differ in their syntactic form, in each case there is an identifier naming the declared variable. We consider that identifier to be the declaration proper, and assign it the class VarDecl. Identifiers that reference a variable, on the other hand, are given the class VarAccess.

The most important predicates involving variables, their declarations, and their accesses are as follows:

Variable.getName(), VarDecl.getName(), VarAccess.getName() return the name of the variable.
Variable.getScope() returns the scope to which the variable belongs.
Variable.isGlobal(), Variable.isLocal(), Variable.isParameter() determine whether the variable is a global variable, a local variable, or a parameter variable, respectively.
Variable.getAnAccess() maps a Variable to all VarAccesses that refer to it.
Variable.getADeclaration() maps a Variable to all VarDecls that declare it (of which there may be none, one, or more than one).
Variable.isCaptured() determines whether the variable is ever accessed in a scope that is lexically nested within the scope where it is declared.
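
As an illustration of what it means for a variable to be captured, consider this small JavaScript sketch (names are invented). Going by the description of isCaptured() above, count would be expected to count as captured, while start would not.

```javascript
// Hypothetical example of a captured and a non-captured local variable.
function makeCounter() {
  const start = 0;        // only accessed in the scope where it is declared
  let count = start;      // declared in the scope of makeCounter
  return function () {
    count += 1;           // accessed from a nested function scope
    return count;
  };
}
```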

As an example, consider the following query which finds distinct function declarations that declare the same variable, that is, two conflicting function declarations within the same scope (again excluding minified code):

import javascript

from FunctionDeclStmt f, FunctionDeclStmt g
where f != g and f.getVariable() = g.getVariable() and
    not f.getTopLevel().isMinified() and
    not g.getTopLevel().isMinified()
select f, g

Some projects declare conflicting functions of the same name and rely on platform-specific behavior to disambiguate the two declarations.
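
As a hypothetical illustration, the following JavaScript contains the kind of conflicting declarations the query above looks for. The two declarations of greet are placed inside an enclosing function (demo, an invented name) so that the snippet is valid in both script and module code; they declare the same variable, and the later one wins.

```javascript
// Hypothetical example: two function declarations for the same variable in one scope.
function demo() {
  function greet() {
    return "hello";
  }
  function greet() {       // conflicts with the declaration above
    return "goodbye";
  }
  return greet();
}
```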

Control flow¶

A different program representation in terms of intraprocedural control flow graphs (CFGs) is provided by the classes in library CFG.qll.

Class ControlFlowNode represents a single node in the control flow graph, which is either an expression, a statement, or a synthetic control flow node. Note that Expr and Stmt do not inherit from ControlFlowNode at the CodeQL level, although their entity types are compatible, so you can explicitly cast from one to the other if you need to map between the AST-based and the CFG-based program representations.

There are two kinds of synthetic control flow nodes: entry nodes (class ControlFlowEntryNode), which represent the beginning of a top-level or function, and exit nodes (class ControlFlowExitNode), which represent their end. They do not correspond to any AST nodes, but simply serve as the unique entry point and exit point of a control flow graph. Entry and exit nodes can be accessed through the predicates StmtContainer.getEntry() and StmtContainer.getExit().

Most, but not all, top-levels and functions have another distinguished CFG node, the start node. This is the CFG node at which execution begins. Unlike the entry node, which is a synthetic construct, the start node corresponds to an actual program element: for top-levels, it is the first CFG node of the first statement; for functions, it is the CFG node corresponding to their first parameter or, if there are no parameters, the first CFG node of the body. Empty top-levels do not have a start node.

For most purposes, using start nodes is preferable to using entry nodes.

The structure of the control flow graph is reflected in the member predicates of ControlFlowNode:

ControlFlowNode.getASuccessor() returns a ControlFlowNode that is a successor of this ControlFlowNode in the control flow graph.
ControlFlowNode.getAPredecessor() is the inverse of getASuccessor().
ControlFlowNode.isBranch() determines whether this node has more than one successor.
ControlFlowNode.isJoin() determines whether this node has more than one predecessor.
ControlFlowNode.isStart() determines whether this node is a start node.

Many control-flow-based analyses are phrased in terms of basic blocks rather than single control flow nodes, where a basic block is a maximal sequence of control flow nodes without branches or joins. The class BasicBlock from BasicBlocks.qll represents all such basic blocks. Similar to ControlFlowNode, it provides member predicates getASuccessor() and getAPredecessor() to navigate the control flow graph at the level of basic blocks, and member predicates getANode(), getNode(int), getFirstNode() and getLastNode() to access individual control flow nodes within a basic block. The predicate Function.getEntryBB() returns the entry basic block in a function, that is, the basic block containing the function’s entry node. Similarly, Function.getStartBB() provides access to the start basic block, which contains the function’s start node. As for CFG nodes, getStartBB() should normally be preferred over getEntryBB().

As an example of an analysis using basic blocks, BasicBlock.isLiveAtEntry(v, u) determines whether variable v is live at the entry of the given basic block, and if so binds u to a use of v that refers to its value at the entry. We can use it to find global variables that are used in a function where they are not live (that is, every read of the variable is preceded by a write), suggesting that the variable was meant to be declared as a local variable instead:

import javascript

from Function f, GlobalVariable gv
where gv.getAnAccess().getEnclosingFunction() = f and
    not f.getStartBB().isLiveAtEntry(gv, _)
select f, "This function uses " + gv + " like a local variable."

Many projects have some variables which look as if they were intended to be local.

Data flow¶

Definitions and uses¶

Library DefUse.qll provides classes and predicates to determine def-use relationships between definitions and uses of variables.

Classes VarDef and VarUse contain all expressions that define and use a variable, respectively. For the former, you can use predicate VarDef.getAVariable() to find out which variables are defined by a given variable definition (recall that destructuring assignments in ECMAScript 2015 define several variables at the same time). Similarly, predicate VarUse.getVariable() returns the (single) variable being accessed by a variable use.

The def-use information itself is provided by predicate VarUse.getADef(), which connects a use of a variable to a definition of the same variable, where the definition may reach the use.

As an example, the following query finds definitions of local variables that are not used anywhere; that is, the variable is either not referenced at all after the definition, or its value is overwritten:

import javascript

from VarDef def, LocalVariable v
where v = def.getAVariable() and
    not exists (VarUse use | def = use.getADef())
select def, "Dead store of local variable."
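
For illustration, here is a small JavaScript function (with invented names) containing the kind of definition the query above is meant to report: the initial value of result is overwritten before it is ever read.

```javascript
// Hypothetical example of a dead store.
function area(width, height) {
  let result = 0;            // this value is never used: it is overwritten below
  result = width * height;   // this definition reaches the use in the return statement
  return result;
}
```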

SSA¶

A more fine-grained representation of a program’s data flow based on Static Single Assignment Form (SSA) is provided by the library semmle.javascript.SSA.

In SSA form, each use of a local variable has exactly one (SSA) definition that reaches it. SSA definitions are represented by class SsaDefinition. They are not AST nodes, since not every SSA definition corresponds to an explicit element in the source code.

Altogether, there are five kinds of SSA definitions:

Explicit definitions (SsaExplicitDefinition): these simply wrap a VarDef, that is, a definition like x = 1 appearing explicitly in the source code.
Implicit initializations (SsaImplicitInit): these represent the implicit initialization of local variables with undefined at the beginning of their scope.
Phi nodes (SsaPhiNode): these are pseudo-definitions that merge two or more SSA definitions where necessary; see the Wikipedia page on static single assignment form for an explanation.
Variable captures (SsaVariableCapture): these are pseudo-definitions appearing at places in the code where the value of a captured variable may change without there being an explicit assignment, for example due to a function call.
Refinement nodes (SsaRefinementNode): these are pseudo-definitions appearing at places in the code where something becomes known about a variable; for example, a conditional if (x === null) induces a refinement node at the beginning of its “then” branch recording the fact that x is known to be null there. (In the literature, these are sometimes known as “pi nodes.”)
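
The following JavaScript sketch (with invented names) marks where some of these kinds of SSA definitions would be expected to arise, going only by the descriptions above. It is an informal reading aid, not a statement of exactly which definitions the library creates.

```javascript
// Hypothetical example; the comments follow the descriptions of the SSA definition kinds.
function describe(x) {
  var label;                 // label is implicitly undefined from the start of its scope
  if (x === null) {          // inside this branch, x is known to be null (refinement)
    label = "null";          // explicit definition of label
  } else {
    label = "non-null";      // another explicit definition of label
  }
  return label;              // two definitions may reach this use, so they are merged (phi)
}
```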

Data flow nodes¶

Moving beyond just variable definitions and uses, library semmle.javascript.dataflow.DataFlow provides a representation of the program as a data flow graph. Its nodes are values of class DataFlow::Node, which has two subclasses ValueNode and SsaDefinitionNode. Nodes of the former kind wrap an expression or a statement that is considered to produce a value (specifically, a function or class declaration statement, or a TypeScript namespace or enum declaration). Nodes of the latter kind wrap SSA definitions.

You can use the predicate DataFlow::valueNode to convert an expression, function or class into its corresponding ValueNode, and similarly DataFlow::ssaDefinitionNode to map an SSA definition to its corresponding SsaDefinitionNode.

There is also an auxiliary predicate DataFlow::parameterNode that maps a parameter to its corresponding data flow node. (This is really just a convenience wrapper around DataFlow::ssaDefinitionNode, since parameters are also considered to be SSA definitions.)

Going in the other direction, there is a predicate ValueNode.getAstNode() for mapping from ValueNodes to ASTNodes, and SsaDefinitionNode.getSsaVariable() for mapping from SsaDefinitionNodes to SsaVariables. There is also a utility predicate Node.asExpr() that gets the underlying expression for a ValueNode, and is undefined for all nodes that do not correspond to an expression. (Note in particular that this predicate is not defined for ValueNodes wrapping function or class declaration statements!)

You can use the predicate DataFlow::Node.getAPredecessor() to find other data flow nodes from which values may flow into this node, and DataFlow::Node.getASuccessor() for the other direction.

For example, here is a query that finds all invocations of a method called send on a value that comes from a parameter named res, indicating that it is perhaps sending an HTTP response:

import javascript

from SimpleParameter res, DataFlow::Node resNode, MethodCallExpr send
where res.getName() = "res" and
      resNode = DataFlow::parameterNode(res) and
      resNode.getASuccessor+() = DataFlow::valueNode(send.getReceiver()) and
      send.getMethodName() = "send"
select send

Note that the data flow modeling in this library is intraprocedural, that is, flow across function calls and returns is not modeled. Likewise, flow through object properties and global variables is not modeled.
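
As a rough illustration of this limitation, consider the following JavaScript sketch. The handler and helper names are invented, and the response object is a minimal stand-in rather than a real HTTP framework object. The call out.send(...) is reached from the parameter res through local flow only, whereas the call inside forward could only be connected to res by following the value across a function call.

```javascript
// Hypothetical example: local flow from res versus flow across a call.
function handler(req, res) {
  const out = res;           // the parameter value flows into the local variable out
  out.send("ok");            // receiver is reached from res within the same function
  forward(res);              // res is passed on to another function
}

function forward(target) {
  target.send("forwarded");  // target is a different parameter, not named res
}

// Minimal stand-in for a response object, used only to run the example.
const sent = [];
handler({}, { send: (body) => sent.push(body) });
```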

Type inference¶

The library semmle.javascript.dataflow.TypeInference implements a simple type inference for JavaScript based on intraprocedural, heap-insensitive flow analysis. Basically, the inference algorithm approximates the possible concrete runtime values of variables and expressions as sets of abstract values (represented by the class AbstractValue), each of which stands for a set of concrete values.

For example, there is an abstract value representing all non-zero numbers, and another representing all non-empty strings except for those that can be converted to a number. Both of these abstract values are fairly coarse approximations that represent very large sets of concrete values.

Other abstract values are more precise, to the point where they represent single concrete values: for example, there is an abstract value representing the concrete null value, and another representing the number zero.

There is a special group of abstract values called indefinite abstract values that represent all concrete values. The analysis uses these to handle expressions for which it cannot infer a more precise value, such as function parameters (as mentioned above, the analysis is intraprocedural and hence does not model argument passing) or property reads (the analysis does not model property values either).

Each indefinite abstract value is associated with a string value describing the cause of imprecision. In the above examples, the indefinite value for the parameter would have cause "call", while the indefinite value for the property would have cause "heap".

To check whether an abstract value is indefinite, you can use the isIndefinite member predicate. Its single argument describes the cause of imprecision.

Each abstract value has one or more associated types (CodeQL class InferredType), corresponding roughly to the type tags computed by the typeof operator. The types are null, undefined, boolean, number, string, function, class, date and object.

To access the results of the type inference, use class DataFlow::AnalyzedNode: any DataFlow::Node can be cast to this class, and additionally there is a convenience predicate Expr.analyze() that maps expressions directly to their corresponding AnalyzedNodes.

Once you have an AnalyzedNode, you can use predicate AnalyzedNode.getAValue() to access the abstract values inferred for it, and getAType() to get the inferred types.

For example, here is a query that looks for null checks on expressions that cannot, in fact, be null:

import javascript

from StrictEqualityTest eq, DataFlow::AnalyzedNode nd, NullLiteral null
where eq.hasOperands(nd.asExpr(), null) and
      not nd.getAValue().isIndefinite(_) and
      not nd.getAValue() instanceof AbstractNull
select eq, "Spurious null check."

To paraphrase, the query looks for equality tests eq where one operand is a null literal and the other some expression that we convert to an AnalyzedNode. If the type inference results for that node are precise (that is, none of the inferred values is indefinite) and (the abstract representation of) null is not among them, we flag eq.
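
For illustration, here is a JavaScript function (with invented names) containing the kind of check this query is aimed at, assuming the type inference determines that prefix can only hold one of the two string constants assigned to it.

```javascript
// Hypothetical example of a null check on a value that cannot be null.
function greeting(name) {
  let prefix = "Hello";
  if (name.length > 10) {
    prefix = "Hi";
  }
  if (prefix === null) {     // prefix only ever holds a string constant
    return name;
  }
  return prefix + ", " + name;
}
```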

You can add custom type inference rules by defining new subclasses of DataFlow::AnalyzedNode and overriding getAValue. You can also introduce new abstract values by extending the abstract class CustomAbstractValueTag, which is a subclass of string: each string belonging to that class induces a corresponding abstract value of type CustomAbstractValue. You can use the predicate CustomAbstractValue.getTag() to map from the abstract value to its tag. By implementing the abstract predicates of class CustomAbstractValueTag you can define the semantics of your custom abstract values, such as what primitive value they coerce to and what type they have.

Call graph¶

The JavaScript library implements a simple call graph construction algorithm to statically approximate the possible call targets of function calls and new expressions. Due to the dynamically typed nature of JavaScript and its support for higher-order functions and reflective language features, building static call graphs is quite difficult. Simple call graph algorithms tend to be incomplete, that is, they often fail to resolve all possible call targets. More sophisticated algorithms can suffer from the opposite problem of imprecision, that is, they may infer many spurious call targets.
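
To see why resolving call targets statically is hard, consider this small JavaScript sketch (all names are invented). In apply, the callee is selected by a string that is only known at runtime; in twice, the callee is whatever function the caller passes in.

```javascript
// Hypothetical example of calls whose targets are hard to determine statically.
const operations = {
  add: (a, b) => a + b,
  mul: (a, b) => a * b,
};

function apply(name, a, b) {
  return operations[name](a, b);   // callee depends on the runtime value of name
}

function twice(f, x) {
  return f(f(x));                  // callee is a function value passed as an argument
}
```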

The call graph is represented by the member predicate getACallee() of class DataFlow::InvokeNode, which computes possible callees of the given invocation, that is, functions that may at runtime be invoked by this expression.

Furthermore, there are three member predicates that indicate the quality of the callee information for this invocation:

DataFlow::InvokeNode.isImprecise(): holds for invocations where the call graph builder might infer spurious call targets.
DataFlow::InvokeNode.isIncomplete(): holds for invocations where the call graph builder might fail to infer possible call targets.
DataFlow::InvokeNode.isUncertain(): holds if either isImprecise() or isIncomplete() holds.

As an example of a call-graph-based query, here is a query to find invocations for which the call graph builder could not find any callees, despite the analysis being complete for this invocation:

import javascript

from DataFlow::InvokeNode invk
where not invk.isIncomplete() and
      not exists(invk.getACallee())
select invk, "Unable to find a callee for this invocation."

Inter-procedural data flow¶

The data flow graph-based analyses described so far are all intraprocedural: they do not take flow from function arguments to parameters or from a return to the function’s caller into account. The data flow library also provides a framework for constructing custom inter-procedural analyses.

We distinguish here between data flow proper, and taint tracking: the latter not only considers value-preserving flow (such as from variable definitions to uses), but also cases where one value influences (“taints”) another without determining it entirely. For example, in the assignment s2 = s1.substring(i), the value of s1 influences the value of s2, because s2 is assigned a substring of s1. In general, s2 will not be assigned s1 itself, so there is no data flow from s1 to s2, but s1 still taints s2.
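
The distinction can be seen in a few lines of JavaScript (the variable names and the string value are invented for illustration): alias holds exactly the value of s1, whereas s2 holds a different value that is derived from s1.

```javascript
// Hypothetical example contrasting value-preserving flow with taint.
const s1 = "user-input";
const alias = s1;              // value-preserving: alias is the same value as s1
const s2 = s1.substring(5);    // derived: s2 is computed from s1 but is not s1 itself
```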

It is a common pattern that we wish to specify a data flow or taint analysis in terms of its sources (where flow starts), sinks (where it should be tracked), and barriers or sanitizers (where flow is interrupted). Sanitizers are very common in security analyses: for example, an analysis that tracks the flow of untrusted user input into, say, a SQL query has to keep track of code that validates the input, thereby making it safe to use. Such a validation step is an example of a sanitizer.

The classes DataFlow::Configuration and TaintTracking::Configuration allow specifying a data flow or taint analysis, respectively, by overriding the following predicates:

isSource(DataFlow::Node nd) selects all nodes nd from where flow tracking starts.
isSink(DataFlow::Node nd) selects all nodes nd to which the flow is tracked.
isBarrier(DataFlow::Node nd) selects all nodes nd that act as a barrier for data flow; isSanitizer is the corresponding predicate for taint tracking configurations.
isBarrierEdge(DataFlow::Node src, DataFlow::Node trg) is a variant of isBarrier(nd) that allows specifying barrier edges in addition to barrier nodes; again, isSanitizerEdge is the corresponding predicate for taint tracking.
isAdditionalFlowStep(DataFlow::Node src, DataFlow::Node trg) allows specifying custom additional flow steps for this analysis; isAdditionalTaintStep is the corresponding predicate for taint tracking configurations.

Since for technical reasons both Configuration classes are subtypes of string, you have to choose a unique name for each flow configuration and equate this with it in the characteristic predicate (as in the example below).

The predicate Configuration.hasFlow performs the actual flow tracking, starting at a source and looking for flow to a sink that does not pass through a barrier node or edge.

For example, suppose that we are developing an analysis to find hard-coded passwords. We might write a simple query that looks for string constants flowing into variables named "password".

import javascript

class PasswordTracker extends DataFlow::Configuration {
    PasswordTracker() {
        // unique identifier for this configuration
        this = "PasswordTracker"
    }

    override predicate isSource(DataFlow::Node nd) {
        nd.asExpr() instanceof StringLiteral
    }

    override predicate isSink(DataFlow::Node nd) {
        passwordVarAssign(_, nd)
    }

    predicate passwordVarAssign(Variable v, DataFlow::Node nd) {
        v.getAnAssignedExpr() = nd.asExpr() and
        v.getName().toLowerCase() = "password"
    }
}

Now we can rephrase our query to use Configuration.hasFlow:

from PasswordTracker pt, DataFlow::Node source, DataFlow::Node sink, Variable v
where pt.hasFlow(source, sink) and pt.passwordVarAssign(v, sink)
select sink, "Password variable " + v + " is assigned a constant string."
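
As a hypothetical illustration, the following JavaScript contains the kind of flow this configuration describes: a string literal (a source) is first stored in one variable and then assigned to a variable named password (a sink). The function name and the values are invented.

```javascript
// Hypothetical example of a hard-coded password reaching a variable named "password".
function connect() {
  const secret = "hunter2";     // string literal: a source
  const password = secret;      // assignment to a variable named password: a sink
  return { user: "admin", password };
}
```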

Syntax errors¶

JavaScript code that contains syntax errors cannot usually be analyzed. For such code, the lexical and syntactic representations are not available, and hence no name binding information, call graph or control and data flow. All that is available in this case is a value of class JSParseError representing the syntax error. It provides information about the syntax error location (JSParseError is a subclass of Locatable) and the error message through predicate JSParseError.getMessage.

Note that for some very simple syntax errors the parser can recover and continue parsing. If this happens, lexical and syntactic information is available in addition to the JSParseError values representing the (recoverable) syntax errors encountered during parsing.

Frameworks¶

AngularJS¶

The semmle.javascript.frameworks.AngularJS library provides support for working with AngularJS (Angular 1.x) code. Its most important classes are:

AngularJS::AngularModule: an Angular module
AngularJS::DirectiveDefinition, AngularJS::FactoryRecipeDefinition, AngularJS::FilterDefinition, AngularJS::ControllerDefinition: a definition of a directive, service, filter or controller, respectively
AngularJS::InjectableFunction: a function that is subject to dependency injection

HTTP framework libraries¶

The library semmle.javascript.frameworks.HTTP provides classes modeling common concepts from various HTTP frameworks.

Currently supported frameworks are Express, the standard Node.js http and https modules, Connect, Koa, Hapi and Restify.

The most important classes include (all in module HTTP):

ServerDefinition: an expression that creates a new HTTP server.
RouteHandler: a callback for handling an HTTP request.
RequestExpr: an expression that may contain an HTTP request object.
ResponseExpr: an expression that may contain an HTTP response object.
HeaderDefinition: an expression that sets one or more HTTP response headers.
CookieDefinition: an expression that sets a cookie in an HTTP response.
RequestInputAccess: an expression that accesses user-controlled request data.

For each framework library, there is a corresponding CodeQL library (for example semmle.javascript.frameworks.Express) that instantiates the above classes for that framework and adds framework-specific classes.

Node.js¶

The semmle.javascript.NodeJS library provides support for working with Node.js modules through the following classes:

NodeModule: a top-level that defines a Node.js module; see the section on Modules for more information.
Require: a call to the special require function that imports a module.

As an example of the use of these classes, here is a query that counts for every module how many other modules it imports:

import javascript

from NodeModule m
select m, count(m.getAnImportedModule())

When you analyze a project, for each module you can see how many other modules it imports.

NPM¶

The semmle.javascript.NPM library provides support for working with NPM packages through the following classes:

PackageJSON: a package.json file describing an NPM package; various getter predicates are available for accessing detailed information about the package, which are described in the online API documentation.
BugTrackerInfo, ContributorInfo, RepositoryInfo: these classes model parts of the package.json file providing information on bug tracking systems, contributors and repositories.
PackageDependencies: models the dependencies of an NPM package; the predicate PackageDependencies.getADependency(pkg, v) binds pkg to the name and v to the version of a package required by a package.json file.
NPMPackage: a subclass of Folder that models an NPM package; important member predicates include:
- NPMPackage.getPackageName() returns the name of this package.
- NPMPackage.getPackageJSON() returns the package.json file for this package.
- NPMPackage.getNodeModulesFolder() returns the node_modules folder for this package.
- NPMPackage.getAModule() returns a Node.js module belonging to this package (not including modules in the node_modules folder).

As an example of the use of these classes, here is a query that identifies unused dependencies, that is, module dependencies that are listed in the package.json file, but which are not imported by any require call:

import javascript

from NPMPackage pkg, PackageDependencies deps, string name
where deps = pkg.getPackageJSON().getDependencies() and
      deps.getADependency(name, _) and
      not exists (Require req | req.getTopLevel() = pkg.getAModule() |
                  name = req.getImportedPath().getValue())
select deps, "Unused dependency '" + name + "'."

React¶

The semmle.javascript.frameworks.React library provides support for working with React code through the ReactComponent class, which models a React component defined either in the functional style or the class-based style (both ECMAScript 2015 classes and old-style React.createClass classes are supported).

Databases¶

The class SQL::SqlString represents an expression that is interpreted as a SQL command. Currently, we model SQL commands issued through the following npm packages: mysql, pg, pg-pool, sqlite3, mssql and sequelize.

Similarly, the class NoSQL::Query represents an expression that is interpreted as a NoSQL query by the mongodb or mongoose package.

Finally, the class DatabaseAccess contains all data flow nodes that perform a database access using any of the packages above.

For example, here is a query to find SQL queries that use string concatenation (instead of a templating-based solution, which is usually safer):

import javascript

from SQL::SqlString ss
where ss instanceof AddExpr
select ss, "Use templating instead of string concatenation."
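
The following JavaScript sketch contrasts the two styles. Note that fakeDb is an invented stand-in used only to make the snippet runnable; it is not one of the packages listed above, so the library would not be expected to recognize its query method. The ? placeholder syntax is likewise an assumption and varies between database packages.

```javascript
// Hypothetical example: string concatenation versus a parameterized query.
function findUserByConcat(db, name) {
  // The user-supplied name becomes part of the SQL text itself.
  return db.query("SELECT * FROM users WHERE name = '" + name + "'");
}

function findUserByPlaceholder(db, name) {
  // The SQL text is constant; the name is passed separately as a parameter.
  return db.query("SELECT * FROM users WHERE name = ?", [name]);
}

// Invented stand-in that simply echoes what it was given.
const fakeDb = { query: (sql, params = []) => ({ sql, params }) };
const input = "x' OR '1'='1";
const risky = findUserByConcat(fakeDb, input);
const safer = findUserByPlaceholder(fakeDb, input);
```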

Miscellaneous¶

Externs¶

The semmle.javascript.Externs library provides support for working with externs through the following classes:

ExternalDecl: common superclass modeling all different kinds of externs declarations; it defines two member predicates:
- ExternalDecl.getQualifiedName() returns the fully qualified name of the declared entity.
- ExternalDecl.getName() returns the unqualified name of the declared entity.
ExternalTypedef: a subclass of ExternalDecl representing type declarations; unlike other externs declarations, such declarations do not declare a function or object that is present at runtime, but simply introduce an alias for a type.
ExternalVarDecl: a subclass of ExternalDecl representing a variable or function declaration; it defines two member predicates:
- ExternalVarDecl.getInit() returns the initializer associated with this declaration, if any; this can either be a Function or an Expr.
- ExternalVarDecl.getDocumentation() returns the JSDoc comment associated with this declaration.

Variables and functions declared in an externs file are either globals (represented by class ExternalGlobalDecl), or members (represented by class ExternalMemberDecl).

Members are further subdivided into static members (class ExternalStaticMemberDecl) and instance members (class ExternalInstanceMemberDecl).

For more details on these and other classes representing externs, see the API documentation.

HTML¶

The semmle.javascript.HTML library provides support for working with HTML documents. They are represented as a tree of HTML::Element nodes, each of which may have zero or more attributes represented by class HTML::Attribute.

Similar to the abstract syntax tree representation, HTML::Element has member predicates getChild(i) and getParent() to navigate from an element to its ith child element and its parent element, respectively. Use predicate HTML::Element.getAttribute(i) to get the ith attribute of the element, and HTML::Element.getAttributeByName(n) to get the attribute with name n.

For HTML::Attribute, predicates getName() and getValue() provide access to the attribute’s name and value, respectively.

Both HTML::Element and HTML::Attribute have a predicate getRoot() that gets the root HTML::Element of the document to which they belong.

JSDoc¶

The semmle.javascript.JSDoc library provides support for working with JSDoc comments. Documentation comments are parsed into an abstract syntax tree representation closely following the format employed by the Doctrine JSDoc parser.

A JSDoc comment as a whole is represented by an entity of class JSDoc, while individual tags are represented by class JSDocTag. Important member predicates of these two classes include:

JSDoc.getDescription() returns the descriptive header of the JSDoc comment, if any.
JSDoc.getComment() maps the JSDoc entity to its underlying Comment entity.
JSDoc.getATag() returns a tag in this JSDoc comment.
JSDocTag.getTitle() returns the title of this tag; for instance, an @param tag has title "param".
JSDocTag.getName() returns the name of the parameter or variable documented by this tag.
JSDocTag.getType() returns the type of the parameter or variable documented by this tag.
JSDocTag.getDescription() returns the description associated with this tag.

Types in JSDoc comments are represented by the class JSDocTypeExpr and its subclasses, which again represent type expressions as abstract syntax trees. Examples of type expressions are JSDocAnyTypeExpr, representing the “any” type *, or JSDocNullTypeExpr, representing the null type.

As an example, here is a query that finds @param tags that do not specify the name of the documented parameter:

import javascript

from JSDocTag t
where t.getTitle() = "param" and
      not exists(t.getName())
select t, "@param tag is missing name."
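
For illustration, here is a JavaScript function (with invented names) whose JSDoc comment contains one complete @param tag and one that gives a type but no parameter name, which is the kind of tag the query above is looking for.

```javascript
/**
 * Adds two numbers.
 *
 * @param {number} a the first summand
 * @param {number}
 * @returns {number} the sum of both arguments
 */
function add(a, b) {
  return a + b;
}
```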

For full details on these and other classes representing JSDoc comments and type expressions, see the API documentation.

JSX¶

The semmle.javascript.JSX library provides support for working with JSX code.

Similar to the representation of HTML documents, JSX fragments are modeled as a tree of JSXElements, each of which may have zero or more JSXAttributes.

However, unlike HTML, JSX is interleaved with JavaScript, hence JSXElement is a subclass of Expr. Like HTML::Element, it has predicates getAttribute(i) and getAttributeByName(n) to look up attributes of a JSX element. Its body elements can be accessed by predicate getABodyElement(); note that the results of this predicate are arbitrary expressions, which may either be further JSXElements, or other expressions that are interpolated into the body of the outer element.

JSXAttribute, again not unlike HTML::Attribute, has predicates getName() and getValue() to access the attribute name and value.

JSON¶

The semmle.javascript.JSON library provides support for working with JSON files that were processed by the JavaScript extractor when building the CodeQL database.

JSON files are modeled as trees of JSON values. Each JSON value is represented by an entity of class JSONValue, which provides the following member predicates:

JSONValue.getParent() returns the JSON object or array in which this value occurs.
JSONValue.getChild(i) returns the ith child of this JSON object or array.

Note that JSONValue is a subclass of Locatable, so the usual member predicates of Locatable can be used to determine the file in which a JSON value appears, and its location within that file.

Class JSONValue has the following subclasses:

JSONPrimitiveValue: a JSON-encoded primitive value; use JSONPrimitiveValue.getValue() to obtain a string representation of the value.
- JSONNull, JSONBoolean, JSONNumber, JSONString: subclasses of JSONPrimitiveValue representing the various kinds of primitive values.
JSONArray: a JSON-encoded array; use JSONArray.getElementValue(i) to access the ith element of the array.
JSONObject: a JSON-encoded object; use JSONObject.getValue(n) to access the value of property n of the object.

Regular expressions¶

The semmle.javascript.Regexp library provides support for working with regular expression literals. The syntactic structure of regular expression literals is represented as an abstract syntax tree of regular expression terms, modeled by the class RegExpTerm. Similar to ASTNode, class RegExpTerm provides member predicates getParent() and getChild(i) to navigate the structure of the syntax tree.

Various subclasses of RegExpTerm model different kinds of regular expression constructs and operators; see the API documentation for details.

YAML¶

The semmle.javascript.YAML library provides support for working with YAML files that were processed by the JavaScript extractor when building the CodeQL database.

YAML files are modeled as trees of YAML nodes. Each YAML node is represented by an entity of class YAMLNode, which provides, among others, the following member predicates:

YAMLNode.getParentNode() returns the YAML collection in which this node is syntactically nested.
YAMLNode.getChildNode(i) returns the ith child node of this node; YAMLNode.getAChildNode() returns any child node of this node.
YAMLNode.getTag() returns the tag of this YAML node.
YAMLNode.getAnchor() returns the anchor associated with this YAML node, if any.
YAMLNode.eval() returns the YAMLValue this YAML node evaluates to after resolving aliases and includes.

The various kinds of scalar values available in YAML are represented by classes YAMLInteger, YAMLFloat, YAMLTimestamp, YAMLBool, YAMLNull and YAMLString. Their common superclass is YAMLScalar, which has a member predicate getValue() to obtain the value of a scalar as a string.

YAMLMapping and YAMLSequence represent mappings and sequences, respectively, and are subclasses of YAMLCollection.

Alias nodes are represented by class YAMLAliasNode, while YAMLMergeKey and YAMLInclude represent merge keys and !include directives, respectively.

Predicate YAMLMapping.maps(key, value) models the key-value relation represented by a mapping, taking merge keys into account.