A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from http://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-detect-anomalies below:

The ML.DETECT_ANOMALIES function | BigQuery

Stay organized with collections Save and categorize content based on your preferences.

The ML.DETECT_ANOMALIES function

This document describes the ML.DETECT_ANOMALIES function, which lets you perform anomaly detection in BigQuery ML.

You can use the following types of models with ML.DETECT_ANOMALIES, depending on the type of input data you want to analyze:

Syntax
# ARIMA_PLUS and ARIMA_PLUS_XREG models:
ML.DETECT_ANOMALIES(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`
  [, STRUCT(ANOMALY_PROB_THRESHOLD AS anomaly_prob_threshold)]
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }]
)

#Autoencoder, k-means, or PCA models:
ML.DETECT_ANOMALIES(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  STRUCT(CONTAMINATION AS contamination),
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }
)
Arguments

ML.DETECT_ANOMALIES takes the following arguments:

Input

The input requirements for the ML.DETECT_ANOMALIES function depend upon the input model type.

Time series model input

Anomaly detection with ARIMA_PLUS and ARIMA_PLUS_XREG models has the following requirements:

For a list of supported data types, see TIME_SERIES_TIMESTAMP_COL and TIME_SERIES_DATA_COL.

Autoencoder, k-means, or PCA model input

Anomaly detection with autoencoder, k-means, or PCA models has the following requirements:

Output

ML.DETECT_ANOMALIES always returns the is_anomaly column that contains the anomaly detection results. Other output columns differ based upon the input model type and input data table.

Time series model output

ARIMA_PLUS and ARIMA_PLUS_XREG model output includes the following columns, followed by the input table columns, if present. Output can include the following:

ML.DETECT_ANOMALIES output for time series models has the following properties:

Autoencoder and PCA model output

Autoencoder and PCA model output includes the following columns, followed by the input table columns:

K-means model output

K-means model output includes the following, followed by the input table columns:

Examples

The following examples show how to use ML.DETECT_ANOMALIES with different input models and settings.

ARIMA_PLUS model without specified settings

The following example detects anomalies using an ARIMA_PLUS model that has the DECOMPOSE_TIME_SERIES training option set to its default value of TRUE, without specifying the anomaly_prob_threshold argument.

SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_arima_plus_model`)

If the time series input column names are ts_timestamp and ts_data, then this query returns results similar to the following:

+-------------------------+----------+------------+-------------+-------------+---------------------+
|      ts_timestamp       | ts_data  | is_anomaly | lower_bound | upper_bound | anomaly_probability |
+-------------------------+----------+------------+-------------+-------------+---------------------+
| 2021-01-01 00:00:01 UTC |  125.3   |   FALSE    |  123.5      |  139.1      |  0.93               |
| 2021-01-02 00:00:01 UTC |  145.3   |   TRUE     |  128.5      |  143.1      |  0.96               |
+-------------------------+----------+------------+-------------+-------------+---------------------+
ARIMA_PLUS model with a custom anomaly_prob_threshold value

The following example detects anomalies using an ARIMA_PLUS model that has the DECOMPOSE_TIME_SERIES training option set to its default value of TRUE, using a custom anomaly_prob_threshold value of 0.8:

SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_arima_plus_model`,
    STRUCT(0.8 AS anomaly_prob_threshold))

If the time series input column names are ts_timestamp and ts_data, then this query returns results similar to the following:

+-------------------------+----------+------------+-------------+-------------+---------------------+
|      ts_timestamp       | ts_data  | is_anomaly | lower_bound | upper_bound | anomaly_probability |
+-------------------------+----------+------------+-------------+-------------+---------------------+
| 2021-01-01 00:00:01 UTC |  125.3   |    TRUE    |  129.5      |  133.6      |  0.93               |
| 2021-01-02 00:00:01 UTC |  145.3   |    TRUE    |  131.5      |  136.6      |  0.96               |
+-------------------------+----------+------------+-------------+-------------+---------------------+
ARIMA_PLUS model with input data as a query statement

The following example detects anomalies using an ARIMA_PLUS model, using a custom anomaly_prob_threshold value of 0.9 and passing an input data table into the query:

SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_arima_plus_model`,
    STRUCT(0.9 AS anomaly_prob_threshold),
    (
      SELECT
        state, city, date, temperature, weather
      FROM
        `mydataset.my_time_series_data_table`))

This example uses the following column values:

This example returns results similar to the following:

+-------+------------+-------------------------+-------------+------------+-------------+-------------+---------------------+---------+
| state |   city     |           date          | temperature | is_anomaly | lower_bound | upper_bound | anomaly_probability | weather |
+-------+------------+-------------------------+-------------+------------+-------------+-------------+---------------------+---------+
| "WA"  | "Kirkland" | 2021-01-01 00:00:00 UTC |   38.1      |   FALSE    |     36.4    |    42.0     |        0.8293       | "sunny" |
| "WA"  | "Kirkland" | 2021-01-02 00:00:00 UTC |   37.1      |   TRUE     |     37.4    |    43.3     |        0.9124       | "rainy" |
+-------+------------+-------------------------+-------------+------------+-------------+-------------+---------------------+---------+
ARIMA_PLUS model with input data as a table

The following example detects anomalies using an ARIMA_PLUS model, using a custom anomaly_prob_threshold value of 0.9 and passing an input data table into the query:

SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_arima_plus_model`,
    STRUCT(0.9 AS anomaly_prob_threshold),
    TABLE `mydataset.my_time_series_data_table`)

If the TIME_SERIES_ID_COL column names are state, city, and TIME_SERIES_TIMESTAMP_COL, and the TIME_SERIES_DATA_COL column names are date and temperature, and one additional column weather is in the input data table, then this query returns results similar to the following:

+-------+------------+-------------------------+-------------+------------+-------------+-------------+---------------------+---------+
| state |   city     |           date          | temperature | is_anomaly | lower_bound | upper_bound | anomaly_probability | weather |
+-------+------------+-------------------------+-------------+------------+-------------+-------------+---------------------+---------+
| "WA"  | "Kirkland" | 2021-01-01 00:00:00 UTC |   38.1      |   FALSE    |     36.4    |    42.0     |        0.8293       | "sunny" |
| "WA"  | "Kirkland" | 2021-01-02 00:00:00 UTC |   37.1      |   TRUE     |     37.4    |    43.3     |        0.9124       | "rainy" |
+-------+------------+-------------------------+-------------+------------+-------------+-------------+---------------------+---------+
ARIMA_PLUS_XREG model with a custom anomaly_prob_threshold value

The following example detects anomalies using an ARIMA_PLUS_XREG model that uses a custom anomaly_prob_threshold value of 0.6:

SELECT
  *
FROM
  ML.DETECT_ANOMALIES (
   MODEL `mydataset.my arima_plus_xreg_model`,
   STRUCT(0.6 AS anomaly_prob_threshold)
  )
ORDER BY
  date ASC;

If the time series input column names are date and temperature, then this query returns results similar to the following:

+-------------------------+-------------+------------+---------------------+---------------------+----------------------+
|      date               | temperature | is_anomaly | lower_bound         | upper_bound         | anomaly_probability  |
+-------------------------+-------------+------------+---------------------+---------------------+----------------------+
| 2009-08-11 00:00:00 UTC |  70.1       |    false   |  67.65879917809896  |  72.541200821901029 |  0.0                 |
| 2009-08-12 00:00:00 UTC |  73.4       |    false   |  71.714971312549849 |  76.597372956351919 |  0.20573021642489953 |
| 2009-08-13 00:00:00 UTC |  64.6       |    true    |  67.7428898975034   |  72.625291541305472 |  0.94632610424009034 |
+-------------------------+-------------+------------+---------------------+---------------------+----------------------+
Autoencoder model

The following example detects anomalies using an autoencoder model and a contamination value of 0.1.

SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_autoencoder_model`,
    STRUCT(0.1 AS contamination),
    TABLE `mydataset.mytable`)

If the feature column names are f1 and f2, then this query returns results similar to the following:

+------------+--------------------+---------+--------+
| is_anomaly | mean_squared_error |    f1   |   f2   |
+------------+--------------------+---------+--------+
|   FALSE    |     0.63456        |   120   |  "a"   |
|   TRUE     |     11.342         |  15000  |  "b"   |
+------------+--------------------+---------+--------+
K-means model

The following example detects anomalies using a k-means model and a contamination value of 0.2.

SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_kmeans_model`,
    STRUCT(0.2 AS contamination),
    (
      SELECT
        f1,
        f2
      FROM
        `mydataset.mytable`))

This query returns results similar to the following:

+------------+---------------------+-------------+--------+--------+
| is_anomaly | normalized_distance | centroid_id |   f1   |   f2   |
+------------+---------------------+-------------+--------+--------+
|   FALSE    |     0.63456         |     1       |  120   |  "a"   |
|   TRUE     |     6.3243          |     2       | 15000  |  "b"   |
+------------+---------------------+-------------+--------+--------+
PCA model

The following example detects anomalies using a PCA model and a contamination value of 0.1.

SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_pca_model`,
    STRUCT(0.1 AS contamination),
    TABLE `mydataset.mytable`)

If the feature column names are f1, f2 and f3, then this query returns results similar to the following:

+------------+--------------------+---------+--------+------+
| is_anomaly | mean_squared_error |    f1   |   f2   |  f3  |
+------------+--------------------+---------+--------+------+
|   FALSE    |     0.63456        |   120   |  "a"   |  0.9 |
|   TRUE     |     11.342         |  15000  |  "b"   |  25  |
+------------+--------------------+---------+--------+------+
Pricing

All queries that use the ML.DETECT_ANOMALIES function are billable, regardless of the pricing model.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-07 UTC.

[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["The `ML.DETECT_ANOMALIES` function in BigQuery ML enables anomaly detection for various model types, including time series (`ARIMA_PLUS`, `ARIMA_PLUS_XREG`) and independent and identically distributed (IID) data (`K-means`, `PCA`, `Autoencoder`)."],["The syntax for `ML.DETECT_ANOMALIES` differs slightly depending on the model type, with time series models allowing an `anomaly_prob_threshold` and IID models requiring a `contamination` value to specify the anomaly detection parameters."],["Input requirements for `ML.DETECT_ANOMALIES` vary by model type, including column name matching and specific training option requirements, such as `DECOMPOSE_TIME_SERIES` set to true for time-series models."],["The output of `ML.DETECT_ANOMALIES` always includes an `is_anomaly` column, with additional columns like `anomaly_probability`, `lower_bound`, `upper_bound` for time series models, and `mean_squared_error` or `normalized_distance` for IID models."],["`ML.DETECT_ANOMALIES` is billable for all usage, and various examples are demonstrated for how to use this function with `ARIMA_PLUS`, `ARIMA_PLUS_XREG`, `Autoencoder`, `K-means`, and `PCA` models, including different thresholds and input methods."]]],[]]


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4