Showing content from https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-python/ below:

Customizing Library Models for Python

Beta Notice - Unstable API

Library customization using data extensions is currently in beta and subject to change.

Breaking changes to this format may occur while in beta.

Python analysis can be customized by adding library models in data extension files.

A data extension for Python is a YAML file of the form:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...
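As a rough illustration of this layout, the sketch below renders one data-extension entry as text. The helper name render_extension is hypothetical and not part of CodeQL; the sample row is the sink tuple used in the first example below.

```python
import json


def render_extension(extensible, rows, pack="codeql/python-all"):
    """Render one data-extension entry in the YAML layout shown above.

    Hypothetical helper for illustration only; it is not a CodeQL API.
    """
    lines = [
        "extensions:",
        "  - addsTo:",
        f"      pack: {pack}",
        f"      extensible: {extensible}",
        "    data:",
    ]
    for row in rows:
        # json.dumps produces a double-quoted flow sequence: ["a", "b", "c"]
        lines.append(f"      - {json.dumps(list(row))}")
    return "\n".join(lines) + "\n"


text = render_extension(
    "sinkModel",
    [("fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection")],
)
print(text)
```

Each tuple becomes one list item under data, and the number of columns in a tuple depends on the extensible predicate it is added to.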

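The CodeQL library for Python exposes the following extensible predicates:

- sourceModel(type, path, kind)
- sinkModel(type, path, kind)
- typeModel(type1, type2, path)
- summaryModel(type, path, input, output, kind)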

We’ll explain how to use these using a few examples, and provide some reference material at the end of this article.

Example: Taint sink in the ‘fabric’ package

In this example, we’ll show how to add the following argument, passed to sudo from the fabric package, as a command-line injection sink:

from fabric.operations import sudo
sudo(cmd) # <-- add 'cmd' as a taint sink

Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["fabric", "Member[operations].Member[sudo].Argument[0]", "command-injection"]

Example: Taint sink in the ‘invoke’ package

Often sinks are found as arguments to methods rather than functions. In this example, we’ll show how to add the following argument, passed to run from the invoke package, as a command-line injection sink:

import invoke
c = invoke.Context()
c.run(cmd) # <-- add 'cmd' as a taint sink

Note that this sink is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke", "Member[Context].Instance.Member[run].Argument[0]", "command-injection"]

Note that the Instance component is used to select instances of a class, including instances of its subclasses. Since methods on instances are common targets, we have a more compact syntax for selecting them. The first column, the type, is allowed to contain a dotted path ending in a class name. This will begin the search at instances of that class. Using this syntax, the previous example could be written as:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["invoke.Context", "Member[run].Argument[0]", "command-injection"]

Continued example: Multiple ways to obtain a type

The invoke package provides multiple ways to obtain a Context instance. The following example shows how to add a new way to obtain a Context instance:

from invoke import context
c = context.Context()
c.run(cmd) # <-- add 'cmd' as a taint sink

Compared with the previous Python snippet, the Context class is now found as invoke.context.Context instead of invoke.Context. We could add a data extension similar to the previous one, but with the type invoke.context.Context. However, we can also use the typeModel extensible predicate to describe how to reach invoke.Context from invoke.context.Context:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: typeModel
    data:
      - ["invoke.Context", "invoke.context.Context", ""]

Combined with the sink model we added earlier, this type model lets the analysis detect the sink in the example.

Example: Taint sources from Django ‘upload_to’ argument

This example is a bit more advanced, involving both a callback function and a class constructor. The Django web framework allows you to specify a function that determines the path where uploaded files are stored (see the Django documentation). This function is passed as an argument to the FileField constructor. The function is called with two arguments: the instance of the model and the filename of the uploaded file. This filename is what we want to mark as a taint source. An example use looks as follows:

from django.db import models

def user_directory_path(instance, filename): # <-- add 'filename' as a taint source
  # file will be uploaded to MEDIA_ROOT/user_<id>/<filename>
  return "user_{0}/{1}".format(instance.user.id, filename)

class MyModel(models.Model):
  upload = models.FileField(upload_to=user_directory_path) # <-- the 'upload_to' parameter defines our custom function
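To make the callback shape concrete, here is a minimal sketch that runs without Django. The function fake_save and the SimpleNamespace objects are hypothetical stand-ins for the framework and a model instance; they only mimic the call described above and do not reproduce Django's actual handling of uploads.

```python
from types import SimpleNamespace


def user_directory_path(instance, filename):
    # Parameter index 0 is the model instance, index 1 is the filename.
    return "user_{0}/{1}".format(instance.user.id, filename)


def fake_save(upload_to, instance, client_filename):
    """Hypothetical stand-in: call the callback the way the text describes."""
    return upload_to(instance, client_filename)


# Stand-in for a model instance that has a related user with id 7.
instance = SimpleNamespace(user=SimpleNamespace(id=7))

# The filename is chosen by whoever uploads the file.
path = fake_save(user_directory_path, instance, "report final.pdf")
print(path)
```

The second parameter of the callback receives the client-supplied name, which is why the model targets the parameter at index 1 of the function passed as upload_to.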

Note that this source is already known by the CodeQL Python analysis, but for this example, you could use the following data extension:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sourceModel
    data:
      - [
          "django.db.models.FileField!",
          "Call.Argument[0,upload_to:].Parameter[1]",
          "remote",
        ]

Example: Adding flow through ‘re.compile’

In this example, we’ll show how to add flow through calls to re.compile. re.compile returns a compiled regular expression for efficient evaluation, but the pattern to be compiled is stored in the pattern attribute of the resulting object.

import re

y = re.compile(pattern=x)  # add value flow from 'x' to 'y.pattern'
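A quick runtime check shows why this is modeled as value flow: the string passed in is available unchanged on the compiled object. The sample pattern is an arbitrary choice for illustration.

```python
import re

x = r"\d+"  # arbitrary sample pattern
y = re.compile(pattern=x)

# The compiled object keeps the original pattern string.
print(y.pattern == x)
```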

Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "re",
          "Member[compile]",
          "Argument[0,pattern:]",
          "ReturnValue.Attribute[pattern]",
          "value",
        ]

Example: Adding flow through ‘sorted’

In this example, we’ll show how to add flow through calls to the built-in function sorted:

y = sorted(x) # add taint flow from 'x' to 'y'

Note that this flow is already recognized by the CodeQL Python analysis, but for this example, you could use the following data extension:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "builtins",
          "Member[sorted]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]

We might also provide a summary stating that the elements of the input list are preserved in the output list:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "builtins",
          "Member[sorted]",
          "Argument[0].ListElement",
          "ReturnValue.ListElement",
          "value",
        ]

The tracking of list elements is imprecise in that the analysis does not know where in the list the tracked value is found. So this summary simply states that if the value is found somewhere in the input list, it will also be found somewhere in the output list, unchanged.
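The runtime behaviour that this summary approximates can be seen in a small sketch. The variable tainted and the sample strings are made up for illustration.

```python
tainted = "user-input"  # stands in for a value from an untrusted source
x = [tainted, "zeta", "alpha"]
y = sorted(x)

# The very same object is in the output, but at a different index.
same_object_present = any(element is tainted for element in y)
print(same_object_present, x.index(tainted), y.index(tainted))
```

The element survives unchanged while its position moves, which matches a summary that relates ListElement to ListElement without tracking indices.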

Reference material

The following sections provide reference material for extensible predicates, access paths, types, and kinds.

Extensible predicates

sourceModel(type, path, kind)

Adds a new taint source. Most taint-tracking queries will use the new source.

Example:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sourceModel
    data:
      - ["flask", "Member[request]", "remote"]

sinkModel(type, path, kind)

Adds a new taint sink. Sinks are query-specific and will typically affect one or two queries.

Example:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      - ["builtins", "Member[exec].Argument[0]", "code-injection"]

summaryModel(type, path, input, output, kind)

Adds flow through a function call.

Example:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      - [
          "builtins",
          "Member[reversed]",
          "Argument[0]",
          "ReturnValue",
          "taint",
        ]

typeModel(type1, type2, path)

A description of how to reach type1 from type2. If this is the only way to reach type1, for instance if type1 is a name we made up to represent the inner workings of a library, we think of this as a definition of type1. In the context of instances, this describes how to obtain an instance of type1 from an instance of type2.

Example:

extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: typeModel
    data:
      - [
          "flask.Response",
          "flask",
          "Member[jsonify].ReturnValue",
        ]

Types

A type is a string that identifies a set of values. In each of the extensible predicates described in the previous section, the first column is always the name of a type. A type can be defined by adding typeModel tuples for that type. Additionally, the following built-in types are available:

Access paths

The path, input, and output columns consist of a .-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values.
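To illustrate the left-to-right structure, the sketch below splits an access path into its components and operands. It is a simplified reading of the syntax used in this article, not CodeQL's own parser, and parse_access_path is a hypothetical name.

```python
def parse_access_path(path):
    """Split an access path into (name, operands) pairs, left to right.

    Simplified sketch: components are separated by '.' outside brackets,
    and operands inside [...] are separated by ','.
    """
    components = []
    current = ""
    depth = 0
    for ch in path:
        if ch == "." and depth == 0:
            components.append(current)
            current = ""
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        current += ch
    if current:
        components.append(current)

    parsed = []
    for component in components:
        name, bracket, rest = component.partition("[")
        # Drop the closing ']' before splitting the operand list.
        operands = rest[:-1].split(",") if bracket else []
        parsed.append((name, operands))
    return parsed


steps = parse_access_path("Call.Argument[0,upload_to:].Parameter[1]")
for name, operands in steps:
    print(name, operands)
```

Read in order, each pair narrows the previous set of values: the call, then its first positional or upload_to keyword argument, then the parameter at index 1 of that argument.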

The following components are supported:

Additional notes about the syntax of operands:

Kinds

Source kinds

See the Threat models section below.

Sink kinds

Unlike sources, sinks tend to be highly query-specific, rarely affecting more than one or two queries. Not every query supports customizable sinks. If the following sinks are not suitable for your use case, you should add a new query.

Summary kinds

Threat models

Note

Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C#, Python and JavaScript/TypeScript analysis.

A threat model is a named class of dataflow sources that can be enabled or disabled independently. Threat models allow you to control the set of dataflow sources that you want to consider unsafe. For example, one codebase may only consider remote HTTP requests to be tainted, whereas another may also consider data from local files to be unsafe. You can use threat models to ensure that the relevant taint sources are used in a CodeQL analysis.

The kind property of the sourceModel determines which threat model a source is associated with. There are two main categories:

Note that subcategories can be included or excluded separately, so you can specify local without database, or just commandargs and environment without the rest of local.
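The include and exclude behaviour can be pictured as set arithmetic. The sketch below is a toy model and not how CodeQL implements threat models: the SUBKINDS table lists only the local subcategories named in the text above, and active_kinds is a hypothetical helper.

```python
# Toy hierarchy (assumption): only the subcategories named in the text.
SUBKINDS = {"local": {"commandargs", "environment", "database"}}


def expand(kind):
    """Return the leaf kinds covered by a category or a single kind."""
    return set(SUBKINDS.get(kind, {kind}))


def active_kinds(include=(), exclude=()):
    """Compute the enabled source kinds; 'remote' is on by default."""
    active = {"remote"}
    for kind in include:
        active |= expand(kind)
    for kind in exclude:
        active -= expand(kind)
    return active


local_without_database = active_kinds(include=["local"], exclude=["database"])
only_two = active_kinds(include=["commandargs", "environment"])
print(sorted(local_without_database))
print(sorted(only_two))
```

With this reduced table both selections happen to enable the same kinds; with further local subcategories they would differ.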

The less commonly used categories are:

When running a CodeQL analysis, the remote threat model is included by default. You can optionally include other threat models as appropriate when using the CodeQL CLI and in GitHub code scanning. For more information, see Analyzing your code with CodeQL queries and Customizing your advanced setup for code scanning.
