CodeQL documentation

Using API graphs in Ruby

API graphs are a uniform interface for referring to functions, classes, and methods defined in external libraries.

About this article

This article describes how you can use API graphs to reference classes and functions defined in library code. API graphs are particularly useful when you want to model the remote flow sources available from external library functions.

Module and class references

The most common entry point into the API graph is when a top-level module or class is accessed. For example, you can access the API graph node corresponding to the ::Regexp class by using the API::getTopLevelMember method defined in the codeql.ruby.ApiGraphs module, as the following snippet demonstrates.

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("Regexp")

The example above finds references to a top-level class. For nested modules and classes, you can use the getMember method. For example the following query selects references to the Net::HTTP class.

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("Net").getMember("HTTP")

Note that you should specify module names without :: symbols. If you write API::getTopLevelMember("Net::HTTP"), it will not do what you expect. Instead, you need to decompose this name into an access of the HTTP member of the API graph node for Net, as shown in the example above.

Calls and class instantiations

To track the calls of externally defined functions, you can use the getMethod method. The following snippet finds all calls of Regexp.compile:

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("Regexp").getMethod("compile")

The example above is for a call to a class method. Tracking calls to instance methods, is a two-step process, first you need to find instances of the class before you can find the calls to methods on those instances. The following snippet finds instantiations of the Regexp class:

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("Regexp").getInstance()

Note that the getInstance method also includes subclasses. For example if there is a class SpecialRegexp < Regexp then getInstance also finds SpecialRegexp.new.

The following snippet builds on the above to find calls of the Regexp#match? instance method:

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("Regexp").getInstance().getMethod("match?")

Subclasses

Many libraries are used by extending one or more library classes. To track this in the API graph, you can use the getASubclass method to get the API graph node corresponding to the immediate subclasses of a node. To find all subclasses, use * or + to apply the method repeatedly. You can see an example where all subclasses are identified using getASubclass* below.

Note that getASubclass can only return subclasses that are extracted as part of the CodeQL database that you are analyzing. When libraries have predefined subclasses, you will need to explicitly include them in your model. For example, the ActionController::Base class has a predefined subclass Rails::ApplicationController. To find all subclasses of ActionController::Base, you must explicitly include the subclasses of Rails::ApplicationController as well.

import codeql.ruby.ApiGraphs


API::Node actionController() {
  result =
    [
      API::getTopLevelMember("ActionController").getMember("Base"),
      API::getTopLevelMember("Rails").getMember("ApplicationController")
    ].getASubclass*()
}

select actionController()

Using the API graph in dataflow queries

Dataflow queries often search for points where data from external sources enters the code base as well as places where data leaves the code base. API graphs provide a convenient way to refer to external API components such as library functions and their inputs and outputs. However, you do not use API graph nodes directly in dataflow queries.

  • API graph nodes model entities that are defined outside your code base.
  • Dataflow nodes model entities defined within the current code base.

You bridge the gap between the entities outside and inside your code base using the API node class methods: asSource() and asSink().

The asSource() method is used to select dataflow nodes where a value from an external source enters the current code base. A typical example is the return value of a library function such as File.read(path):

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("File").getMethod("read").getReturn().asSource()

The asSink() method is used to select dataflow nodes where a value leaves the current code base and flows into an external library. For example the second parameter of the File.write(path, value) method.

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("File").getMethod("write").getParameter(1).asSink()

A more complex example is a call to File.open with a block argument. This function creates a File instance and passes it to the supplied block. In this case, we are interested in the first parameter of the block because this is where an externally created value enters the code base, that is, the |file| in the Ruby example below:

File.open("/my/file.txt", "w") { |file| file << "Hello world" }

The following snippet of CodeQL finds parameters of blocks of File.open method calls:

import codeql.ruby.ApiGraphs

select API::getTopLevelMember("File").getMethod("open").getBlock().getParameter(0).asSource()

The following example is a dataflow query that that uses API graphs to find cases where data that is read flows into a call to File.write.

import codeql.ruby.DataFlow
import codeql.ruby.ApiGraphs

module Configuration implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source = API::getTopLevelMember("File").getMethod("read").getReturn().asSource()
  }

  predicate isSink(DataFlow::Node sink) {
    sink = API::getTopLevelMember("File").getMethod("write").getParameter(1).asSink()
  }
}

module Flow = DataFlow::Global<Configuration>;

from DataFlow::Node src, DataFlow::Node sink
where Flow::flow(src, sink)
select src, "The data read here flows into a $@ call.", sink, "File.write"
  • © GitHub, Inc.
  • Terms
  • Privacy