Creating CodeQL databases

Before you analyze your code using CodeQL, you need to create a CodeQL database containing all the data required to run queries on your code.

CodeQL analysis relies on extracting relational data from your code, and using it to build a CodeQL database. CodeQL databases contain all of the important information about a codebase, which can be analyzed by executing CodeQL queries against it. Before you generate a CodeQL database, you need to:

  • Install and set up the CodeQL CLI. For more information, see “Getting started with the CodeQL CLI.”
  • Check out the version of your codebase you want to analyze. The directory should be ready to build, with all dependencies already installed.

For information about using the CodeQL CLI in a third-party CI system to create results to display in GitHub as code scanning alerts, see Configuring CodeQL CLI in your CI system in the GitHub documentation. For information about enabling CodeQL code scanning using GitHub Actions, see Setting up code scanning for a repository in the GitHub documentation.

Running codeql database create

CodeQL databases are created by running the following command from the checkout root of your project:

codeql database create <database> --language=<language-identifier>

You must specify:

  • <database>: a path to the new database to be created. This directory will be created when you execute the command—you cannot specify an existing directory.

  • --language: the identifier for the language to create a database for. When used with --db-cluster, the option accepts a comma-separated list, or can be specified more than once. CodeQL supports creating databases for the following languages:

    Language                Identifier
    C/C++                   cpp
    C#                      csharp
    Go                      go
    Java                    java
    JavaScript/TypeScript   javascript
    Python                  python
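
Because <database> must be a path that does not exist yet, a script that creates databases can check the path before it calls the CLI. The guard below is a minimal sketch; the function name ensure_new_db_path is hypothetical and is not part of the CodeQL CLI.

```shell
# Hypothetical guard: fail early if the database path is already taken.
# $1 = path that will be passed to codeql as <database>.
ensure_new_db_path() {
  if [ -e "$1" ]; then
    echo "error: '$1' already exists; choose a new database path" >&2
    return 1
  fi
}
```

It could be chained in front of the real command, for example: ensure_new_db_path my-database && codeql database create my-database --language=python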

You can specify additional options depending on the location of your source files, whether the code needs to be compiled, and whether you want to create CodeQL databases for more than one language:

  • --source-root: the root folder for the primary source files used in database creation. By default, the command assumes that the current directory is the source root—use this option to specify a different location.
  • --db-cluster: use for multi-language codebases when you want to create databases for more than one language.
  • --command: use when you create a database for one or more compiled languages; omit it if the only languages requested are Python and JavaScript. This specifies the build commands needed to invoke the compiler. Commands are run from the current folder, or from --source-root if specified. If you don’t include a --command, CodeQL will attempt to detect the build system automatically, using a built-in autobuilder.
  • --no-run-unnecessary-builds: used with --db-cluster to suppress the build command for languages where the CodeQL CLI does not need to monitor the build (for example, Python and JavaScript/TypeScript).

For full details of all the options you can use when creating databases, see the database create reference documentation.
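
As a rough sketch of how these options combine, the wrapper below creates a database cluster for a codebase that contains both Java and Python. The function name create_cluster, the two positional arguments, and the Maven build command are assumptions chosen for illustration; substitute the languages and build command that match your project.

```shell
# Hypothetical helper: create one database per language in a single run.
# $1 = source root, $2 = output directory for the database cluster.
create_cluster() {
  # --db-cluster lets --language take a comma-separated list.
  # --no-run-unnecessary-builds skips the build command for Python,
  # which does not need a monitored build.
  codeql database create "$2" \
    --db-cluster \
    --language=java,python \
    --source-root "$1" \
    --command='mvn clean install' \
    --no-run-unnecessary-builds
}
```

Calling create_cluster src codeql-dbs would be expected to produce one subdirectory per language under codeql-dbs.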

Progress and results

Errors are reported if there are any problems with the options you have specified. For interpreted languages, the extraction progress is displayed in the console: for each source file, it reports whether extraction succeeded or failed. For compiled languages, the console will display the output of the build system.

When the database is successfully created, you’ll find a new directory at the path specified in the command. If you used the --db-cluster option to create more than one database, a subdirectory is created for each language. Each CodeQL database directory contains a number of subdirectories, including the relational data (required for analysis) and a source archive—a copy of the source files made at the time the database was created—which is used for displaying analysis results.

Creating databases for non-compiled languages

The CodeQL CLI includes extractors to create databases for non-compiled languages—specifically, JavaScript (and TypeScript) and Python. These extractors are automatically invoked when you specify JavaScript or Python as the --language option when executing database create. When creating databases for these languages you must ensure that all additional dependencies are available.

Important

When you run database create for JavaScript, TypeScript, or Python, do not specify a --command option. A --command overrides the normal extractor invocation, and the result is an empty database. If you create databases for multiple languages and one of them is a compiled language, use the --no-run-unnecessary-builds option to skip the command for the languages that don’t need to be compiled.

JavaScript and TypeScript

Creating databases for JavaScript requires no additional dependencies, but if the project includes TypeScript files, you must install Node.js 6.x or later. In the command line you can specify --language=javascript to extract both JavaScript and TypeScript files:

codeql database create --language=javascript --source-root <folder-to-extract> <output-folder>/javascript-database

Here, we have specified a --source-root path, which is the location where database creation is executed, but is not necessarily the checkout root of the codebase.

Python

When creating databases for Python you must ensure:

  • You have all of the required versions of Python installed.
  • You have access to the pip packaging management system and can install any packages that the codebase depends on.
  • You have installed the virtualenv pip module.
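
The prerequisites above can be checked from a script before database creation starts. The sketch below assumes the interpreter is available as python3 and that pip and virtualenv are importable as modules of that interpreter; the function name check_python_prereqs is hypothetical.

```shell
# Hypothetical pre-flight check for the Python prerequisites.
# Assumes the interpreter is invoked as "python3"; adjust if yours differs.
check_python_prereqs() {
  python3 --version >/dev/null 2>&1 ||
    { echo "python3 not found on PATH" >&2; return 1; }
  python3 -m pip --version >/dev/null 2>&1 ||
    { echo "pip is not available for python3" >&2; return 1; }
  python3 -m virtualenv --version >/dev/null 2>&1 ||
    { echo "virtualenv is not installed for python3" >&2; return 1; }
}
```

The check only confirms that the tools respond. It does not verify that every Python version the codebase needs is installed.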

In the command line you must specify --language=python. For example

codeql database create --language=python <output-folder>/python-database

executes the database create subcommand from the code’s checkout root, generating a new Python database at <output-folder>/python-database.

Creating databases for compiled languages

For compiled languages, CodeQL needs to invoke the required build system to generate a database, so the build method must be available to the CLI.

Detecting the build system

The CodeQL CLI includes autobuilders for C/C++, C#, Go, and Java code. CodeQL autobuilders allow you to build projects for compiled languages without specifying any build commands. When an autobuilder is invoked, CodeQL examines the source for evidence of a build system and attempts to run the optimal set of commands required to extract a database.

An autobuilder is invoked automatically when you execute codeql database create for a compiled --language if you don’t include a --command option. For example, for a Java codebase, you would simply run:

codeql database create --language=java <output-folder>/java-database

If a codebase uses a standard build system, relying on an autobuilder is often the simplest way to create a database. For sources that require non-standard build steps, you may need to explicitly define each step in the command line.
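
The choice between the autobuilder and an explicit build command can be captured in a small wrapper. This is a sketch only: the function name create_java_db and its optional second argument are hypothetical conveniences, not CLI features.

```shell
# Hypothetical wrapper: use the autobuilder unless a build command is given.
# $1 = database path, $2 = optional build command (quoted as one argument).
create_java_db() {
  if [ -n "${2:-}" ]; then
    # Explicit build: pass the whole command as a single --command value.
    codeql database create "$1" --language=java --command="$2"
  else
    # No --command: CodeQL falls back to its Java autobuilder.
    codeql database create "$1" --language=java
  fi
}
```

For example, create_java_db java-database relies on the autobuilder, while create_java_db java-database 'mvn clean install' runs the given Maven build.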

Creating databases for Go

For Go, install the Go toolchain (version 1.11 or later) and, if there are dependencies, the appropriate dependency manager (such as dep).

The Go autobuilder attempts to automatically detect code written in Go in a repository, and only runs build scripts in an attempt to fetch dependencies. To force CodeQL to limit extraction to the files compiled by your build script, set the environment variable CODEQL_EXTRACTOR_GO_BUILD_TRACING=on or use the --command option to specify a build command.

Specifying build commands

The following examples are designed to give you an idea of some of the build commands that you can specify for compiled languages.

Important

The --command option accepts a single argument—if you need to use more than one command, specify --command multiple times.

If you need to pass subcommands and options, the whole argument needs to be quoted to be interpreted correctly.

  • C/C++ project built using make:

    codeql database create cpp-database --language=cpp --command=make
    
  • C# project built using dotnet build (.NET Core 3.0 or later):

    codeql database create csharp-database --language=csharp --command='dotnet build /t:rebuild'
    

    On Linux and macOS (but not Windows), you need to disable shared compilation when building C# projects with .NET Core 2 or earlier, so expand the command to:

    codeql database create csharp-database --language=csharp --command='dotnet build /p:UseSharedCompilation=false /t:rebuild'
    
  • Go project built using the CODEQL_EXTRACTOR_GO_BUILD_TRACING=on environment variable:

    CODEQL_EXTRACTOR_GO_BUILD_TRACING=on codeql database create go-database --language=go
    
  • Go project built using a custom build script:

    codeql database create go-database --language=go --command='./scripts/build.sh'
    
  • Java project built using Gradle:

    codeql database create java-database --language=java --command='gradle clean test'
    
  • Java project built using Maven:

    codeql database create java-database --language=java --command='mvn clean install'
    
  • Java project built using Ant:

    codeql database create java-database --language=java --command='ant -f build.xml'
    
  • Project built using a custom build script:

    codeql database create new-database --language=<language> --command='./scripts/build.sh'
    

    This command runs a custom script that contains all of the commands required to build the project.
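
The note above about repeating --command and quoting each argument can be illustrated with a two-step C/C++ build. The build steps (./configure followed by make -j4) and the function name create_cpp_db are assumptions for illustration; this is a sketch of the documented option usage, not a tested recipe for any particular project.

```shell
# Hypothetical two-step build: one --command option per build step.
# $1 = database path.
create_cpp_db() {
  # Each step is quoted so that it reaches the CLI as a single argument,
  # including its own options (for example, "make -j4").
  codeql database create "$1" --language=cpp \
    --command='./configure' \
    --command='make -j4'
}
```

If the build has many steps, wrapping them in one script and passing that script as a single --command, as in the last example above, keeps the command line shorter.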

Obtaining databases from LGTM.com

LGTM.com analyzes thousands of open-source projects using CodeQL. For each project on LGTM.com, you can download an archived CodeQL database corresponding to the most recently analyzed revision of the code. These databases can also be analyzed using the CodeQL CLI or used with the CodeQL extension for Visual Studio Code.

To download a database from LGTM.com:

  1. Log in to LGTM.com.
  2. Find a project you’re interested in and display the Integrations tab (for example, Apache Kafka).
  3. Scroll to the CodeQL databases for local analysis section at the bottom of the page.
  4. Download databases for the languages that you want to explore.

Before running an analysis, unzip the databases and try upgrading the unzipped databases to ensure they are compatible with your local copy of the CodeQL queries and libraries.
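
The unzip-then-upgrade step might be scripted as follows. This is a sketch under assumptions: the function name prepare_downloaded_db is hypothetical, and the location of the database directory inside the archive depends on the download, so it is passed in explicitly rather than guessed.

```shell
# Hypothetical helper: unpack a downloaded database and upgrade it.
# $1 = downloaded .zip archive
# $2 = directory to unpack into
# $3 = path of the unpacked database directory (depends on the archive layout)
prepare_downloaded_db() {
  unzip -q "$1" -d "$2" || return 1
  # Bring the database in line with the local CodeQL queries and libraries.
  codeql database upgrade "$3"
}
```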

Note

The CodeQL CLI currently extracts data from additional, external files differently from the legacy QL tools. For example, when you run codeql database create the CodeQL CLI extracts data from some relevant XML files for Java and C#, but not for the other supported languages, such as JavaScript. This means that CodeQL databases created using the CodeQL CLI may be slightly different from those obtained from LGTM.com or created using the legacy QL command-line tools. As such, analysis results generated from databases created using the CodeQL CLI may also differ from those generated from databases obtained from elsewhere.