Stay organized with collections Save and categorize content based on your preferences.
The CREATE MODEL statement for AutoML modelsThis document describes the CREATE MODEL
statement for creating AutoML classification and regression models in BigQuery by using SQL. Alternatively, you can use the Google Cloud console user interface to create a model by using a UI (Preview) instead of constructing the SQL statement yourself. AutoML lets you quickly build large-scale machine learning models on tabular data.
You can use AutoML regressor models with the ML.PREDICT
function to perform regression, and you can use AutoML classifier models with the ML.PREDICT
function to perform classification. You can use both types of AutoML models with the ML.PREDICT
function to perform anomaly detection.
BigQuery ML uses the default values for AutoML training options, including data splitting and optimization functions.
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.
CREATE MODEL
syntax
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL} model_name OPTIONS(model_option_list) AS query_statement model_option_list: MODEL_TYPE = { 'AUTOML_REGRESSOR' | 'AUTOML_CLASSIFIER' } [, BUDGET_HOURS = float64_value ] [, OPTIMIZATION_OBJECTIVE = { string_value | struct_value } ] [, INPUT_LABEL_COLS = string_array ] [, DATA_SPLIT_COL = string_value ] [, KMS_KEY_NAME = string_value ]
CREATE MODEL
Creates and trains a new model in the specified dataset. If the model name exists, CREATE MODEL
returns an error.
CREATE MODEL IF NOT EXISTS
Creates and trains a new model only if the model doesn't exist in the specified dataset.
CREATE OR REPLACE MODEL
Creates and trains a model and replaces an existing model with the same name in the specified dataset.
model_name
The name of the model you're creating or replacing. The model name must be unique in the dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table. A model name can:
model_name
is not case-sensitive.
If you don't have a default project configured, then you must prepend the project ID to the model name in the following format, including backticks:
`[PROJECT_ID].[DATASET].[MODEL]`
For example, `myproject.mydataset.mymodel`.
MODEL_TYPE
Syntax
MODEL_TYPE = { 'AUTOML_REGRESSOR' | 'AUTOML_CLASSIFIER' }
Description
Specifies the model type. This option is required.
Arguments
This option accepts the following values:
AUTOML_REGRESSOR
: This creates a regression model that uses a label column with a numeric data type.AUTOML_CLASSIFIER
: This creates a classification model that uses a label column with either a string or a numeric data type.BUDGET_HOURS
Syntax
BUDGET_HOURS = float64_value
Description
Sets the training budget in hours for AutoML training.
After training an AutoML model, BigQuery ML compresses the model to ensure it is small enough to import, which can take up to 50% of the training time. The time to compress the model is not included in the training budget time.
Arguments
A FLOAT64
value between 1.0
and 72.0
. The default value is 1.0
.
OPTIMIZATION_OBJECTIVE
Syntax
OPTIMIZATION_OBJECTIVE = { string_value | struct_value }
Description
Sets the optimization objective function to use for AutoML training.
For more details on the optimization objective functions, see the AutoML documentation.
Arguments
This option can be specified as a STRING
or STRUCT
value.
This option accepts the following string values for optimization objective functions:
MINIMIZE_RMSE
(default)MINIMIZE_MAE
MINIMIZE_RMSLE
MAXIMIZE_AU_ROC
(default)MINIMIZE_LOG_LOSS
MAXIMIZE_AU_PRC
MAXIMIZE_PRECISION_AT_RECALL
MAXIMIZE_RECALL_AT_PRECISION
MINIMIZE_LOG_LOSS
For example:
OPTIMIZATION_OBJECTIVE = 'MAXIMIZE_AU_ROC'
For binary classification models, you can alternatively specify a struct value for this option. The struct must contain a STRING
value and a FLOAT64
value in one of the following combinations:
The string value is MAXIMIZE_PRECISION_AT_RECALL
and the float value specifies the fixed recall value, which must be in the range of [0,1]
.
The string value is MAXIMIZE_RECALL_AT_PRECISION
and the float value specifies the fixed precision value, which must be in the range of [0,1]
.
For example:
OPTIMIZATION_OBJECTIVE = STRUCT('MAXIMIZE_PRECISION_AT_RECALL', 0.3)
INPUT_LABEL_COLS
Syntax
INPUT_LABEL_COLS = string_array
Description
The name of the label column in the training data. The label column contains the expected machine learning result for the given record. For example, in a spam detection dataset, the label column value would probably be either spam
or not spam
. In a rainfall dataset, the label column value might be the amount of rain that fell during a certain period.
Arguments
A one-element ARRAY
of string values. Defaults to label
.
Supported data types for input_label_cols
include the following:
DATA_SPLIT_COL
Syntax
DATA_SPLIT_COL = string_value
Description
The name of the column to use to sort input data into the training, validation, or test set. Defaults to random splitting.
Arguments
The string value must be the name of one of the columns in the training data. This column must have either a timestamp or string data type. This column is passed directly to AutoML.
If you use a string column, rows are assigned to the appropriate dataset based on the column's value, which must be one of the following options:
TRAIN
VALIDATE
TEST
UNASSIGNED
For more information about how to use these values, see Manual split.
Timestamp columns are used to perform a chronological split.
KMS_KEY_NAME
Syntax
KMS_KEY_NAME = string_value
Description
The Cloud Key Management Service customer-managed encryption key (CMEK) to use to encrypt the model.
Arguments
A STRING
value containing the fully-qualified name of the CMEK. For example,
'projects/my_project/locations/my_location/keyRings/my_ring/cryptoKeys/my_key'
Supported data types for input columns
For columns other than the label column, any groupable data type is supported. The BigQuery column type is used to determine the feature column type in AutoML.
To force a numeric column to be treated as categorical, use the CAST
function to cast it to a BigQuery string. Arrays of supported types are allowed and remain arrays during AutoML training.
For information about supported locations, see Locations for non-remote models.
LimitationsAutoML models have the following limitations:
Global
region customer-managed encryption keys (CMEKs) and multi-region CMEKs, for example eu
or us
, are not supported.Too many AutoML training queries have been issued within a short period of time
, you can submit a request to raise the maximum number of concurrent training jobs. To request an increase, email bqml-feedback@google.com with your project ID and the details of your request.AUTOML_CLASSIFIER
models, the label
column can contain up to 1,000 unique values; that is, the number of classes is less than or equal to 1,000. If you need to classify into more than 1,000 labels, contact bqml-feedback@google.com.CREATE MODEL
example
The following example creates a model named mymodel
in mydataset
in your default project. It uses the public nyc-tlc.yellow.trips
taxi trip data available in BigQuery. The job takes approximately 3 hours to complete, including training, model compression, temporary data movement (to AutoML), and setup tasks.
Create the model:
CREATE OR REPLACE MODEL `project_id.mydataset.mymodel` OPTIONS(model_type='AUTOML_REGRESSOR', input_label_cols=['fare_amount'], budget_hours=1.0) AS SELECT (tolls_amount + fare_amount) AS fare_amount, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count FROM `nyc-tlc.yellow.trips` WHERE ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 100000)) = 1 AND trip_distance > 0 AND fare_amount >= 2.5 AND fare_amount <= 100.0 AND pickup_longitude > -78 AND pickup_longitude < -70 AND dropoff_longitude > -78 AND dropoff_longitude < -70 AND pickup_latitude > 37 AND pickup_latitude < 45 AND dropoff_latitude > 37 AND dropoff_latitude < 45 AND passenger_count > 0
Run predictions:
SELECT * FROM ML.PREDICT(MODEL `project_id.mydataset.mymodel`, ( SELECT * FROM `nyc-tlc.yellow.trips` LIMIT 100))
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-11 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-11 UTC."],[[["The `CREATE MODEL` statement in BigQuery is used to build AutoML classification and regression models on tabular data for large-scale machine learning."],["AutoML models, created using `CREATE MODEL`, can be either `AUTOML_REGRESSOR` for regression tasks or `AUTOML_CLASSIFIER` for classification, both compatible with the `ML.PREDICT` function for various predictions, including anomaly detection."],["The `CREATE MODEL` syntax includes several options like `MODEL_TYPE`, `BUDGET_HOURS`, and `OPTIMIZATION_OBJECTIVE`, allowing customization of the training process, such as setting training time and choosing optimization goals."],["There are specific limitations to using AutoML models, including input data size constraints, and restrictions on CMEK regions."],["The example provided demonstrates the creation of an `AUTOML_REGRESSOR` model for predicting taxi fares, detailing how to specify options like `model_type`, `input_label_cols`, and `budget_hours` within the `CREATE MODEL` statement."]]],[]]
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4