The ML.EVALUATE function

This document describes the ML.EVALUATE function, which lets you evaluate model metrics.

Supported models

You can use the ML.EVALUATE function with all model types except for the following:

Syntax

The ML.EVALUATE function syntax differs depending on the type of model that you use the function with. Choose the option appropriate for your use case.

Time series
ML.EVALUATE(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }],
    STRUCT(
      [PERFORM_AGGREGATION AS perform_aggregation]
      [, HORIZON AS horizon]
      [, CONFIDENCE_LEVEL AS confidence_level])
)
Arguments

ML.EVALUATE takes the following arguments:

Note:

For ARIMA_PLUS and ARIMA_PLUS_XREG models, the output columns differ depending on whether input data is provided. Support for ML.EVALUATE without input data is deprecated; if you don't provide input data, use ML.ARIMA_EVALUATE instead.
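
Because evaluation without input data is deprecated for these models, the following is a minimal sketch of the recommended ML.ARIMA_EVALUATE call instead; the model name mydataset.my_arima_model is a placeholder for illustration:

SELECT
  *
FROM
  ML.ARIMA_EVALUATE(MODEL `mydataset.my_arima_model`)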

Classification & regression
ML.EVALUATE(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }],
    STRUCT(
      [THRESHOLD AS threshold]
      [, TRIAL_ID AS trial_id])
)
Arguments

ML.EVALUATE takes the following arguments:

Remote over Gemini
ML.EVALUATE(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }],
    STRUCT(
      [TASK_TYPE AS task_type]
      [, MAX_OUTPUT_TOKENS AS max_output_tokens]
      [, TEMPERATURE AS temperature]
      [, TOP_P AS top_p])
)
Arguments

ML.EVALUATE takes the following arguments:

Remote over Claude
ML.EVALUATE(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }],
    STRUCT(
      [TASK_TYPE AS task_type]
      [, MAX_OUTPUT_TOKENS AS max_output_tokens]
      [, TOP_K AS top_k]
      [, TOP_P AS top_p])
)
Arguments

ML.EVALUATE takes the following arguments:

Remote over Llama or Mistral AI
ML.EVALUATE(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }],
    STRUCT(
      [TASK_TYPE AS task_type]
      [, MAX_OUTPUT_TOKENS AS max_output_tokens]
      [, TEMPERATURE AS temperature]
      [, TOP_P AS top_p])
)
Arguments

ML.EVALUATE takes the following arguments:

Remote over open models
ML.EVALUATE(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }],
    STRUCT(
      [TASK_TYPE AS task_type]
      [, MAX_OUTPUT_TOKENS AS max_output_tokens]
      [, TEMPERATURE AS temperature]
      [, TOP_K AS top_k]
      [, TOP_P AS top_p])
)
Arguments

ML.EVALUATE takes the following arguments:

All other models
ML.EVALUATE(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }],
    STRUCT(
      [THRESHOLD AS threshold]
      [, TRIAL_ID AS trial_id])
)
Arguments

ML.EVALUATE takes the following arguments:

Output

ML.EVALUATE returns a single row of metrics applicable to the type of model specified.

For models that return them, the precision, recall, f1_score, log_loss, and roc_auc metrics are macro-averaged for all of the class labels. For a macro-average, metrics are calculated for each label and then an unweighted average is taken of those values.
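
For example, for a hypothetical three-class model with per-label precision values of 0.9, 0.6, and 0.3, the macro-averaged precision is (0.9 + 0.6 + 0.3) / 3 = 0.6, regardless of how many rows belong to each class.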

Time series

ML.EVALUATE returns the following columns for ARIMA_PLUS or ARIMA_PLUS_XREG models when input data is provided and perform_aggregation is FALSE:

Notes:

The following apply to time series models when input data is provided and perform_aggregation is FALSE:

ML.EVALUATE returns the following columns for ARIMA_PLUS or ARIMA_PLUS_XREG models when input data is provided and perform_aggregation is TRUE:

Notes:

The following apply to time series models when input data is provided and perform_aggregation is TRUE:

ML.EVALUATE returns the following columns for an ARIMA_PLUS model when input data isn't provided:

Note: Support for ML.EVALUATE without input data is deprecated. Use ML.ARIMA_EVALUATE instead.

Classification

The following types of models are classification models:

ML.EVALUATE returns the following columns for classification models:

Regression

The following types of models are regression models:

ML.EVALUATE returns the following columns for regression models:

K-means

ML.EVALUATE returns the following columns for k-means models:

Matrix factorization

ML.EVALUATE returns the following columns for matrix factorization models with implicit feedback:

ML.EVALUATE returns the following columns for matrix factorization models with explicit feedback:

Remote over pre-trained models

This section describes the output for the following types of models:

ML.EVALUATE returns different columns depending on the task_type value that you specify.

When you specify the TEXT_GENERATION task type, the following columns are returned:

When you specify the CLASSIFICATION task type, the following columns are returned:

When you specify the SUMMARIZATION task type, the following columns are returned:

When you specify the QUESTION_ANSWERING task type, the following columns are returned:
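
For example, the following is a minimal sketch of evaluating a question-answering task. The model name mydataset.my_llm and the input_text and output_text column names are placeholders for this sketch; use the column names that your evaluation table actually contains:

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_llm`,
    (
    SELECT
      input_text,
      output_text
    FROM
      `mydataset.mytable`),
    STRUCT('question_answering' AS task_type))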

Remote over custom models

ML.EVALUATE returns the following column for remote models over custom models deployed to Vertex AI:

PCA

ML.EVALUATE returns the following column for PCA models:

Autoencoder

ML.EVALUATE returns the following columns for autoencoder models:

Limitations

ML.EVALUATE is subject to the following limitations:

Costs

When used with remote models over Vertex AI LLMs, ML.EVALUATE costs are calculated based on the following:

Examples

The following examples show how to use ML.EVALUATE.

ML.EVALUATE with no input data specified

The following query evaluates a model with no input data specified:

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.mymodel`)
ML.EVALUATE with a custom threshold and input data

The following query evaluates a model with input data and a custom threshold of 0.55:

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.mymodel`,
    (
    SELECT
      custom_label,
      column1,
      column2
    FROM
      `mydataset.mytable`),
    STRUCT(0.55 AS threshold))
ML.EVALUATE to calculate forecasting accuracy of a time series

The following query evaluates the 30-point forecasting accuracy for a time series model:

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_arima_model`,
    (
    SELECT
      timeseries_date,
      timeseries_metric
    FROM
      `mydataset.mytable`),
    STRUCT(TRUE AS perform_aggregation, 30 AS horizon))
ML.EVALUATE to calculate ARIMA_PLUS forecasting accuracy for each forecasted timestamp

The following query evaluates the forecasting accuracy for each of the 30 forecasted points of a time series model. It also computes the prediction interval based on a confidence level of 0.9.

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_arima_model`,
    (
    SELECT
      timeseries_date,
      timeseries_metric
    FROM
      `mydataset.mytable`),
    STRUCT(FALSE AS perform_aggregation, 0.9 AS confidence_level,
    30 AS horizon))
ML.EVALUATE to calculate ARIMA_PLUS_XREG forecasting accuracy for each forecasted timestamp

The following query evaluates the forecasting accuracy for each of the 30 forecasted points of a time series model. It also computes the prediction interval based on a confidence level of 0.9. Note that you must include the side features in the evaluation data.

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_arima_xreg_model`,
    (
    SELECT
      timeseries_date,
      timeseries_metric,
      feature1,
      feature2
    FROM
      `mydataset.mytable`),
    STRUCT(FALSE AS perform_aggregation, 0.9 AS confidence_level,
    30 AS horizon))
ML.EVALUATE to calculate LLM text generation accuracy

The following query evaluates the LLM text generation accuracy for the classification task type for each label from the evaluation table.

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_llm`,
    (
    SELECT
      prompt,
      label
    FROM
      `mydataset.mytable`),
    STRUCT('classification' AS task_type))