A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from http://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict below:

The ML.PREDICT function | BigQuery

Stay organized with collections Save and categorize content based on your preferences.

The ML.PREDICT function

This document describes the ML.PREDICT function, which you can use to predict outcomes by using a model. ML.PREDICT works with the following models:

For PCA and autoencoder models, you can use the ML.GENERATE_EMBEDDING function as an alternative to the ML.PREDICT function. ML.GENERATE_EMBEDDING generates the same embedding data as ML.PREDICT as an array in a single column, rather than in a series of columns. Having all of the embeddings in a single column lets you directly use the VECTOR_SEARCH function on theML.GENERATE_EMBEDDING output.

You can run prediction during model creation, after model creation, or after a failure (as long as at least one iteration is finished). ML.PREDICT always uses the model weights from the last successful iteration.

Syntax
ML.PREDICT(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }
  STRUCT(
    [THRESHOLD AS threshold]
    [, KEEP_ORIGINAL_COLUMNS AS keep_original_columns]
    [, TRIAL_ID AS trial_id])
)
Arguments

ML.PREDICT takes the following arguments:

Output

The output of the ML.PREDICT function has as many rows as the input table, and it includes all columns from the input table and all output columns from the model. The output column names for the model are predicted_<label_column_name> and, for classification models, predicted_<label_column_name>_probs. In both columns, label_column_name is the name of the input label column that's used during training.

Regression models

For the following types of regression models:

The following column is returned:

Classification models

For the following types of binary-class classification models:

The following columns are returned:

For the following types of multiclass classification models:

The following columns are returned:

K-means models

For k-means models, the following columns are returned:

PCA models

For PCA models, the following columns are returned:

The original input columns are appended if the keep_original_columns argument is set to TRUE.

Autoencoder models

For autoencoder models, the following columns are returned:

The original input columns are appended after the latent space columns.

Imported models

For TensorFlow Lite models, the output is the output of the TensorFlow Lite model's predict method.

For ONNX models, the output is the output of the ONNX model's predict method.

For XGBoost models, the output is the output of the XGBoost model's predict method.

Remote models

For remote models, the output columns contain all Vertex AI endpoint output fields, and also a remote_model_status field that contains status messages from Vertex AI endpoint.

Missing data imputation

In statistics, imputation is used to replace missing data with substituted values. When you train a model in BigQuery ML, NULL values are treated as missing data. When you predict outcomes in BigQuery ML, missing values can occur when BigQuery ML encounters a NULL value or a previously unseen value. BigQuery ML handles missing data differently, based on the type of data in the column.

Column type Imputation method Numeric In both training and prediction, NULL values in numeric columns are replaced with the mean value of the given column, as calculated by the feature column in the original input data. One-hot/Multi-hot encoded In both training and prediction, NULL values in the encoded columns are mapped to an additional category that is added to the data. Previously unseen data is assigned a weight of 0 during prediction. TIMESTAMP TIMESTAMP columns use a mixture of imputation methods from both standardized and one-hot encoded columns. For the generated Unix time column, BigQuery ML replaces values with the mean Unix time across the original columns. For other generated values, BigQuery ML assigns them to the respective NULL category for each extracted feature. STRUCT In both training and prediction, each field of the STRUCT is imputed according to its type. Permissions

You must have the bigquery.models.getData Identity and Access Management (IAM) permission in order to run ML.PREDICT.

Examples

The following examples assume your model and input table are in your default project.

Predict an outcome

The following example predicts an outcome and returns the following columns:

SELECT
  *
FROM
  ML.PREDICT(MODEL `mydataset.mymodel`,
    (
    SELECT
      label,
      column1,
      column2
    FROM
      `mydataset.mytable`))
Compare predictions from two different models

The following example creates two models and then compares their output:

  1. Create the first model:

    CREATE MODEL
      `mydataset.mymodel1`
    OPTIONS
      (model_type='linear_reg',
        input_label_cols=['label'],
      ) AS
    SELECT
      label,
      input_column1
    FROM
      `mydataset.mytable`
  2. Create the second model:

    CREATE MODEL
      `mydataset.mymodel2`
    OPTIONS
      (model_type='linear_reg',
        input_label_cols=['label'],
      ) AS
    SELECT
      label,
      input_column2
    FROM
      `mydataset.mytable`
  3. Compare the output of the two models:

    SELECT
      label,
      predicted_label1,
      predicted_label AS predicted_label2
    FROM
      ML.PREDICT(MODEL `mydataset.mymodel2`,
        (
        SELECT
          * EXCEPT (predicted_label),
              predicted_label AS predicted_label1
        FROM
          ML.PREDICT(MODEL `mydataset.mymodel1`,
            TABLE `mydataset.mytable`)))
Specify a custom threshold

The following example runs prediction with input data and a custom threshold of 0.55:

SELECT
  *
FROM
  ML.PREDICT(MODEL `mydataset.mymodel`,
    (
    SELECT
      custom_label,
      column1,
      column2
    FROM
      `mydataset.mytable`),
    STRUCT(0.55 AS threshold))
Predict an outcome from structured data with an imported TensorFlow model

The following query predicts outcomes using an imported TensorFlow model. The input_data table contains inputs in the schema expected by my_model. See the CREATE MODEL statement for TensorFlow models for more information.

SELECT *
FROM ML.PREDICT(MODEL `my_project.my_dataset.my_model`,
  (SELECT * FROM input_data))
Predict an outcome from image data with an imported TensorFlow model

If you are running inference on image data from an object table, you must use the ML.DECODE_IMAGE function to convert image bytes to a multi-dimensional ARRAY representation. You can use ML.DECODE_IMAGE output directly in an ML.PREDICT function, or you can write the results from ML.DECODE_IMAGE to a table column and reference that column when you call ML.PREDICT. You can also pass ML.DECODE_IMAGE output to another image processing function for additional preprocessing during either of these procedures.

You can join the object table to standard BigQuery tables to limit the data used in inference, or to provide additional input to the model.

The following examples show different ways you can use the ML.PREDICT function with image data.

Example 1

The following example uses the ML.DECODE_IMAGE function directly in the ML.PREDICT function. It returns the inference results for all images in the object table, for a model with an input field of input and an output field of feature:

SELECT * FROM
ML.PREDICT(
  MODEL `my_dataset.vision_model`,
  (SELECT uri, ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), 480, 480, FALSE) AS input
  FROM `my_dataset.object_table`)
);

Example 2

The following example uses the ML.DECODE_IMAGE function directly in the ML.PREDICT function, and uses the ML.CONVERT_COLOR_SPACE function in the ML.PREDICT function to convert the image color space from RBG to YIQ. It also shows how to use object table fields to filter the objects included in inference. It returns the inference results for all JPG images in the object table, for a model with an input field of input and an output field of feature:

SELECT * FROM
  ML.PREDICT(
    MODEL `my_dataset.vision_model`,
    (SELECT uri, ML.CONVERT_COLOR_SPACE(ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), 224, 280, TRUE), 'YIQ') AS input
    FROM `my_dataset.object_table`
    WHERE content_type = 'image/jpeg')
  );

Example 3

The following example uses results from ML.DECODE_IMAGE that have been written to a table column but not processed any further. It uses ML.RESIZE_IMAGE and ML.CONVERT_IMAGE_TYPE in the ML.PREDICT function to process the image data. It returns the inference results for all images in the decoded images table, for a model with an input field of input and an output field of feature.

Create the decoded images table:

CREATE OR REPLACE TABLE `my_dataset.decoded_images`
  AS (SELECT ML.DECODE_IMAGE(data) AS decoded_image
  FROM `my_dataset.object_table`);

Run inference on the decoded images table:

SELECT * FROM
ML.PREDICT(
  MODEL`my_dataset.vision_model`,
  (SELECT uri, ML.CONVERT_IMAGE_TYPE(ML.RESIZE_IMAGE(decoded_image, 480, 480, FALSE)) AS input
  FROM `my_dataset.decoded_images`)
);

Example 4

The following example uses results from ML.DECODE_IMAGE that have been written to a table column and preprocessed using ML.RESIZE_IMAGE. It returns the inference results for all images in the decoded images table, for a model with an input field of input and an output field of feature.

Create the table:

CREATE OR REPLACE TABLE `my_dataset.decoded_images`
  AS (SELECT ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data) 480, 480, FALSE) AS decoded_image
  FROM `my_dataset.object_table`);

Run inference on the decoded images table:

SELECT * FROM
ML.PREDICT(
  MODEL `my_dataset.vision_model`,
  (SELECT uri, decoded_image AS input
  FROM `my_dataset.decoded_images`)
);

Example 5

The following example uses the ML.DECODE_IMAGE function directly in the ML.PREDICT function. In this example, the model has an output field of embeddings and two input fields: one that expects an image, f_img, and one that expects a string, f_txt. The image input comes from the object table and the string input comes from a standard BigQuery table that is joined with the object table by using the uri column.

SELECT * FROM
  ML.PREDICT(
    MODEL `my_dataset.mixed_model`,
    (SELECT uri, ML.RESIZE_IMAGE(ML.DECODE_IMAGE(my_dataset.my_object_table.data), 224, 224, FALSE) AS f_img,
      my_dataset.image_description.description AS f_txt
    FROM `my_dataset.object_table`
    JOIN `my_dataset.image_description`
    ON object_table.uri = image_description.uri)
  );
Predict an outcome with a model trained with the TRANSFORM clause

The following example trains a model using the TRANSFORM clause:

CREATE MODEL `mydataset.mymodel`
  TRANSFORM(f1 + f2 as c, label)
  OPTIONS(...)
AS SELECT f1, f2, f3, label FROM t;

Because the f3 column doesn't appear in the TRANSFORM clause, the following prediction query omits that column in the QUERY_STATEMENT:

SELECT * FROM ML.PREDICT(
  MODEL `mydataset.mymodel`, (SELECT f1, f2 FROM t1));

If f3 is provided in the SELECT statement, it isn't used for calculating predictions but is instead passed through for use in the rest of the SQL statement.

Predict dimensionality reduction results (latent space) with an autoencoder model

The following example runs prediction against a previously built autoencoder model, where the input was 4 dimensional (4 input columns) and the dimensionality reduction had 2 dimensions (2 output columns):

SELECT * FROM ML.PREDICT(
  MODEL `mydataset.mymodel`, (SELECT f1, f2, f3, f4 FROM t1));
What's next

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4