The CREATE MODEL statement
To create a model in BigQuery, use the BigQuery ML CREATE MODEL statement. This statement is similar to the CREATE TABLE DDL statement. When you run a query that contains a CREATE MODEL statement, a query job is generated for you that processes the query. You can also create a model by using the Google Cloud console UI (Preview).
For information about the model types that each SQL statement and function supports, and about all of the SQL statements and functions that each model type supports, see End-to-end user journey for each model.
Required permissions
To create a dataset to store the model, you need the bigquery.datasets.create IAM permission.
To create a model, you need the following permissions:
bigquery.jobs.create
bigquery.models.create
bigquery.models.getData
bigquery.models.updateData
bigquery.connections.delegate (for remote models)
Predefined IAM roles that grant these permissions include BigQuery Studio Admin and BigQuery Admin.
For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.
CREATE MODEL syntax
Note: This syntax statement provides a comprehensive list of model types with their model options. When you create a model, use the model-specific CREATE MODEL statement for convenience. You can view the model-specific CREATE MODEL statements by clicking the MODEL_TYPE name in the following list, in the table of contents in the left panel, or in the create model link in the End-to-end user journey for each model.
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL}
model_name
[TRANSFORM (select_list)]
[INPUT (field_name field_type)
 OUTPUT (field_name field_type)]
[REMOTE WITH CONNECTION {`connection_name` | DEFAULT}]
[OPTIONS(model_option_list)]
[AS {query_statement |
  (
    training_data AS (query_statement),
    custom_holiday AS (holiday_statement)
  )}]

model_option_list:
MODEL_TYPE = { 'LINEAR_REG' | 'LOGISTIC_REG' | 'KMEANS' |
  'MATRIX_FACTORIZATION' | 'PCA' | 'AUTOENCODER' |
  'AUTOML_CLASSIFIER' | 'AUTOML_REGRESSOR' |
  'BOOSTED_TREE_CLASSIFIER' | 'BOOSTED_TREE_REGRESSOR' |
  'RANDOM_FOREST_CLASSIFIER' | 'RANDOM_FOREST_REGRESSOR' |
  'DNN_CLASSIFIER' | 'DNN_REGRESSOR' |
  'DNN_LINEAR_COMBINED_CLASSIFIER' | 'DNN_LINEAR_COMBINED_REGRESSOR' |
  'ARIMA_PLUS' | 'ARIMA_PLUS_XREG' |
  'TENSORFLOW' | 'TENSORFLOW_LITE' | 'ONNX' | 'XGBOOST' |
  'CONTRIBUTION_ANALYSIS' }
[, MODEL_REGISTRY = { 'VERTEX_AI' } ]
[, VERTEX_AI_MODEL_ID = string_value ]
[, VERTEX_AI_MODEL_VERSION_ALIASES = string_array ]
[, INPUT_LABEL_COLS = string_array ]
[, MAX_ITERATIONS = int64_value ]
[, EARLY_STOP = { TRUE | FALSE } ]
[, MIN_REL_PROGRESS = float64_value ]
[, DATA_SPLIT_METHOD = { 'AUTO_SPLIT' | 'RANDOM' | 'CUSTOM' | 'SEQ' | 'NO_SPLIT' } ]
[, DATA_SPLIT_EVAL_FRACTION = float64_value ]
[, DATA_SPLIT_TEST_FRACTION = float64_value ]
[, DATA_SPLIT_COL = string_value ]
[, OPTIMIZE_STRATEGY = { 'AUTO_STRATEGY' | 'BATCH_GRADIENT_DESCENT' | 'NORMAL_EQUATION' } ]
[, L1_REG = float64_value ]
[, L2_REG = float64_value ]
[, LEARN_RATE_STRATEGY = { 'LINE_SEARCH' | 'CONSTANT' } ]
[, LEARN_RATE = float64_value ]
[, LS_INIT_LEARN_RATE = float64_value ]
[, WARM_START = { TRUE | FALSE } ]
[, AUTO_CLASS_WEIGHTS = { TRUE | FALSE } ]
[, CLASS_WEIGHTS = struct_array ]
[, INSTANCE_WEIGHT_COL = string_value ]
[, NUM_CLUSTERS = int64_value ]
[, KMEANS_INIT_METHOD = { 'RANDOM' | 'KMEANS++' | 'CUSTOM' } ]
[, KMEANS_INIT_COL = string_value ]
[, DISTANCE_TYPE = { 'EUCLIDEAN' | 'COSINE' } ]
[, STANDARDIZE_FEATURES = { TRUE | FALSE } ]
[, MODEL_PATH = string_value ]
[, BUDGET_HOURS = float64_value ]
[, OPTIMIZATION_OBJECTIVE = { string_value | struct_value } ]
[, FEEDBACK_TYPE = { 'EXPLICIT' | 'IMPLICIT' } ]
[, NUM_FACTORS = int64_value ]
[, USER_COL = string_value ]
[, ITEM_COL = string_value ]
[, RATING_COL = string_value ]
[, WALS_ALPHA = float64_value ]
[, BOOSTER_TYPE = { 'gbtree' | 'dart' } ]
[, NUM_PARALLEL_TREE = int64_value ]
[, DART_NORMALIZE_TYPE = { 'tree' | 'forest' } ]
[, TREE_METHOD = { 'auto' | 'exact' | 'approx' | 'hist' } ]
[, MIN_TREE_CHILD_WEIGHT = float64_value ]
[, COLSAMPLE_BYTREE = float64_value ]
[, COLSAMPLE_BYLEVEL = float64_value ]
[, COLSAMPLE_BYNODE = float64_value ]
[, MIN_SPLIT_LOSS = float64_value ]
[, MAX_TREE_DEPTH = int64_value ]
[, SUBSAMPLE = float64_value ]
[, ACTIVATION_FN = { 'RELU' | 'RELU6' | 'CRELU' | 'ELU' | 'SELU' | 'SIGMOID' | 'TANH' } ]
[, BATCH_SIZE = int64_value ]
[, DROPOUT = float64_value ]
[, HIDDEN_UNITS = int_array ]
[, OPTIMIZER = { 'ADAGRAD' | 'ADAM' | 'FTRL' | 'RMSPROP' | 'SGD' } ]
[, TIME_SERIES_TIMESTAMP_COL = string_value ]
[, TIME_SERIES_DATA_COL = string_value ]
[, TIME_SERIES_ID_COL = { string_value | string_array } ]
[, HORIZON = int64_value ]
[, AUTO_ARIMA = { TRUE | FALSE } ]
[, AUTO_ARIMA_MAX_ORDER = int64_value ]
[, AUTO_ARIMA_MIN_ORDER = int64_value ]
[, NON_SEASONAL_ORDER = (int64_value, int64_value, int64_value) ]
[, DATA_FREQUENCY = { 'AUTO_FREQUENCY' | 'PER_MINUTE' | 'HOURLY' | 'DAILY' | 'WEEKLY' | ... } ]
[, FORECAST_LIMIT_LOWER_BOUND = float64_value ]
[, FORECAST_LIMIT_UPPER_BOUND = float64_value ]
[, INCLUDE_DRIFT = { TRUE | FALSE } ]
[, HOLIDAY_REGION = { 'GLOBAL' | 'NA' | 'JAPAC' | 'EMEA' | 'LAC' | 'AE' | ... } ]
[, CLEAN_SPIKES_AND_DIPS = { TRUE | FALSE } ]
[, ADJUST_STEP_CHANGES = { TRUE | FALSE } ]
[, DECOMPOSE_TIME_SERIES = { TRUE | FALSE } ]
[, HIERARCHICAL_TIME_SERIES_COLS = { string_array } ]
[, ENABLE_GLOBAL_EXPLAIN = { TRUE | FALSE } ]
[, APPROX_GLOBAL_FEATURE_CONTRIB = { TRUE | FALSE } ]
[, INTEGRATED_GRADIENTS_NUM_STEPS = int64_value ]
[, CALCULATE_P_VALUES = { TRUE | FALSE } ]
[, FIT_INTERCEPT = { TRUE | FALSE } ]
[, CATEGORY_ENCODING_METHOD = { 'ONE_HOT_ENCODING' | 'DUMMY_ENCODING' | 'LABEL_ENCODING' | 'TARGET_ENCODING' } ]
[, ENDPOINT = string_value ]
[, REMOTE_SERVICE_TYPE = { 'CLOUD_AI_VISION_V1' | 'CLOUD_AI_NATURAL_LANGUAGE_V1' | 'CLOUD_AI_TRANSLATE_V3' } ]
[, XGBOOST_VERSION = { '0.9' | '1.1' } ]
[, TF_VERSION = { '1.15' | '2.8.0' } ]
[, NUM_TRIALS = int64_value ]
[, MAX_PARALLEL_TRIALS = int64_value ]
[, HPARAM_TUNING_ALGORITHM = { 'VIZIER_DEFAULT' | 'RANDOM_SEARCH' | 'GRID_SEARCH' } ]
[, HPARAM_TUNING_OBJECTIVES = { 'R2_SCORE' | 'ROC_AUC' | ... } ]
[, NUM_PRINCIPAL_COMPONENTS = int64_value ]
[, PCA_EXPLAINED_VARIANCE_RATIO = float64_value ]
[, SCALE_FEATURES = { TRUE | FALSE } ]
[, PCA_SOLVER = { 'FULL' | 'RANDOMIZED' | 'AUTO' } ]
[, TIME_SERIES_LENGTH_FRACTION = float64_value ]
[, MIN_TIME_SERIES_LENGTH = int64_value ]
[, MAX_TIME_SERIES_LENGTH = int64_value ]
[, TREND_SMOOTHING_WINDOW_SIZE = int64_value ]
[, SEASONALITIES = string_array ]
[, PROMPT_COL = string_value ]
[, LEARNING_RATE_MULTIPLIER = float64_value ]
[, ACCELERATOR_TYPE = { 'GPU' | 'TPU' } ]
[, EVALUATION_TASK = { 'TEXT_GENERATION' | 'CLASSIFICATION' | 'SUMMARIZATION' | 'QUESTION_ANSWERING' | 'UNSPECIFIED' } ]
[, DOCUMENT_PROCESSOR = string_value ]
[, SPEECH_RECOGNIZER = string_value ]
[, KMS_KEY_NAME = string_value ]
[, CONTRIBUTION_METRIC = string_value ]
[, DIMENSION_ID_COLS = string_array ]
[, IS_TEST_COL = string_value ]
[, MIN_APRIORI_SUPPORT = float64_value ]
[, PRUNING_METHOD = { 'NO_PRUNING' | 'PRUNE_REDUNDANT_INSIGHTS' } ]
[, TOP_K_INSIGHTS_BY_APRIORI_SUPPORT = int64_value ]
CREATE MODEL
Creates and trains a new model in the specified dataset. If a model with the same name already exists, CREATE MODEL returns an error.
CREATE MODEL IF NOT EXISTS
Creates and trains a new model only if the model does not exist in the specified dataset.
CREATE OR REPLACE MODEL
Creates and trains a model and replaces an existing model with the same name in the specified dataset.
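The behavior of the three statement forms can be modeled with a small in-memory registry. The following Python sketch is illustrative only. `Dataset`, `Model`, and `create_model` are hypothetical names and are not part of any BigQuery client library. "Training" is reduced to storing the options.

The sketch also reflects two rules stated elsewhere on this page:
- Model names are not case-sensitive.
- CREATE MODEL IF NOT EXISTS always updates the last modified timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical mode names standing in for the three statement forms.
MODES = {"CREATE", "IF_NOT_EXISTS", "OR_REPLACE"}


@dataclass
class Model:
    name: str
    options: dict
    last_modified: datetime


@dataclass
class Dataset:
    # Hypothetical stand-in for the models stored in one BigQuery dataset.
    models: dict = field(default_factory=dict)

    def create_model(self, name: str, options: dict, mode: str = "CREATE",
                     now: Optional[datetime] = None) -> Model:
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        now = now or datetime.now(timezone.utc)
        key = name.lower()  # model names are not case-sensitive
        existing = self.models.get(key)
        if existing is not None:
            if mode == "CREATE":
                # CREATE MODEL: an existing model with this name is an error.
                raise ValueError(f"model {name} already exists")
            if mode == "IF_NOT_EXISTS":
                # IF NOT EXISTS: keep the existing model but bump its timestamp.
                existing.last_modified = now
                return existing
        # New name, or OR_REPLACE: "train" a fresh model and store it.
        model = Model(name, dict(options), now)
        self.models[key] = model
        return model
```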
model_name
model_name is the name of the model you're creating or replacing. The model name must be unique per dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table.
model_name is not case-sensitive.
If you don't have a default project configured, prepend the project ID to the model name in the following format, including backticks: `[PROJECT_ID].[DATASET].[MODEL]`; for example, `myproject.mydataset.mymodel`.
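The following is a minimal Python sketch of the reference format above. `qualified_model_name` is a hypothetical helper. It only assembles the string for use in SQL text; it does not check the BigQuery table naming rules.

```python
from typing import Optional


def qualified_model_name(model: str, dataset: str, project: Optional[str] = None) -> str:
    """Build a backtick-quoted model reference for SQL text.

    With a project: `project.dataset.model`. Without one, the default
    project is implied and the reference is `dataset.model`.
    """
    for part in (project, dataset, model):
        # Reject empty parts and parts that would break the backtick quoting.
        if part is not None and (not part or "`" in part):
            raise ValueError(f"invalid name part: {part!r}")
    parts = [dataset, model] if project is None else [project, dataset, model]
    return "`" + ".".join(parts) + "`"
```

For example, `qualified_model_name("mymodel", "mydataset", "myproject")` yields the same reference as the example above.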
TRANSFORM
TRANSFORM lets you specify all preprocessing during model creation and have it automatically applied during prediction and evaluation.
For example, you can create the following model:
CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
TRANSFORM(ML.FEATURE_CROSS(STRUCT(f1, f2)) as cross_f,
ML.QUANTILE_BUCKETIZE(f3) OVER() as buckets,
label_col)
OPTIONS(model_type='linear_reg', input_label_cols=['label_col'])
AS SELECT * FROM t
During prediction, you don't need to preprocess the input again, and the same transformations are automatically restored:
SELECT * FROM ML.PREDICT(MODEL `myproject.mydataset.mymodel`, (SELECT f1, f2, f3 FROM table))
When the TRANSFORM clause is present, only output columns from the TRANSFORM clause are used in training. Any results from query_statement that don't appear in the TRANSFORM clause are ignored.
The input columns of the TRANSFORM clause are the result of query_statement. So, the final input used in training is the set of columns generated by the following query:
SELECT (select_list) FROM (query_statement);
Input columns of the TRANSFORM clause can be of any SIMPLE type or ARRAY of SIMPLE type. SIMPLE types are non-STRUCT and non-ARRAY data types.
In prediction (ML.PREDICT), you only need to pass in the original columns from query_statement that are used inside the TRANSFORM clause. Columns dropped in TRANSFORM don't need to be provided during prediction. TRANSFORM is automatically applied to the input data during prediction, including the statistics used in ML analytic functions (for example, ML.QUANTILE_BUCKETIZE).
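The TRANSFORM behavior described above can be sketched in Python. Two properties matter:
- Only TRANSFORM outputs reach training.
- Statistics learned during training are reused at prediction time.

This is not how BigQuery implements TRANSFORM. `fit_transform` and `apply_transform` are hypothetical names, and the sketch makes these simplifications:
- The feature cross is approximated by string concatenation.
- `statistics.quantiles` stands in for ML.QUANTILE_BUCKETIZE, whose exact boundaries and output format may differ.

The column names f1, f2, f3, and label_col follow the example model above.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_transform(rows, num_buckets=4):
    """Training: learn the statistics the TRANSFORM needs (f3 quantile cut points)."""
    return {"f3_boundaries": quantiles([r["f3"] for r in rows], n=num_buckets)}


def apply_transform(row, state):
    """Apply the stored TRANSFORM to one row from query_statement.

    Only the columns built here are kept; anything else in `row` is dropped.
    """
    out = {
        # Rough stand-in for ML.FEATURE_CROSS(STRUCT(f1, f2)) AS cross_f.
        "cross_f": f"{row['f1']}_{row['f2']}",
        # 1-based bucket computed with the boundaries learned at training time.
        "buckets": bisect_right(state["f3_boundaries"], row["f3"]) + 1,
    }
    if "label_col" in row:  # the label passes through untransformed
        out["label_col"] = row["label_col"]
    return out


# Training input is effectively SELECT (select_list) FROM (query_statement).
training_rows = [
    {"f1": "a", "f2": "x", "f3": v, "label_col": v * 2.0, "unused": 0}
    for v in range(1, 9)
]
state = fit_transform(training_rows)
training_input = [apply_transform(r, state) for r in training_rows]

# Prediction: pass only the raw columns; the stored boundaries are reused.
prediction = apply_transform({"f1": "b", "f2": "y", "f3": 100}, state)
```

The key design point is that `state` is computed once from the training rows and never recomputed from prediction data.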
To learn more about feature preprocessing, see Feature preprocessing overview, or try the Feature Engineering Functions notebook.
To try the TRANSFORM clause, see the Use the BigQuery ML TRANSFORM clause for feature engineering tutorial or the Create Model With Inline Transpose notebook.
select_list
You can pass columns from query_statement through to model training without transformation by using *, * EXCEPT(), or by listing the column names directly.
Not all columns from query_statement are required to appear in the TRANSFORM clause, so you can drop columns that appear in query_statement by omitting them from the TRANSFORM clause.
You can transform inputs from query_statement by using expressions in select_list. select_list is similar to a normal SELECT statement. select_list supports the following syntax:
*
* EXCEPT()
* REPLACE()
expression
expression.*
The following cannot appear inside select_list:
Aggregation functions
UDFs
Subqueries
Anonymous columns. For example, a + b as c is allowed, while a + b isn't.
The output columns of select_list can be of any BigQuery supported data type.
If present, the following columns must appear in select_list without transformation:
label
data_split_col
kmeans_init_col
instance_weight_col
If these columns are returned by query_statement, you must reference them in select_list by column name outside of any expression, or by using *. You can't use aliases with these columns.
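The select_list rules above lend themselves to a lint-style check. The following Python sketch assumes a hypothetical pre-parsed form of select_list: a list of (expression, alias) pairs. It uses a simple regular expression to recognize bare column names, so it ignores backtick-quoted identifiers.

The sketch covers only two rules:
- Anonymous columns.
- Untransformed, unaliased required columns.

It does not detect aggregation functions, UDFs, or subqueries. It treats only a plain * as covering the required columns.

```python
import re

# Simplified: unquoted identifiers only.
BARE_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_select_list(items, required_cols, query_cols):
    """Return rule violations for a TRANSFORM select_list.

    items: (expression, alias) pairs; alias is None when absent.
    required_cols: names used as label, data_split_col, kmeans_init_col,
        or instance_weight_col.
    query_cols: columns returned by query_statement.
    """
    errors = []
    exprs = [(e.strip(), a) for e, a in items]
    has_star = any(e == "*" for e, _ in exprs)
    for expr, alias in exprs:
        is_bare = BARE_COLUMN.fullmatch(expr) is not None
        is_star = expr.startswith("*") or expr.endswith(".*")
        if not (is_bare or is_star) and alias is None:
            errors.append(f"anonymous column: {expr}")
    for col in required_cols:
        if col not in query_cols:
            continue  # the rule applies only if the column is present
        refs = [a for e, a in exprs if e == col]
        if not refs and not has_star:
            errors.append(f"{col} must appear untransformed")
        elif any(a is not None for a in refs):
            errors.append(f"{col} cannot be aliased")
    return errors
```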
INPUT and OUTPUT
The INPUT and OUTPUT clauses specify the input and output format for remote models or XGBoost models.
field_name
For remote models, INPUT and OUTPUT field names must be identical to the field names of the Vertex AI endpoint request and response. See examples in remote model INPUT and OUTPUT clause.
For XGBoost models, INPUT field names must be identical to the names in the feature_names field if the feature_names field is populated in the XGBoost model file. See XGBoost INPUT OUTPUT clause for more details.
field_type
Remote models support the following BigQuery data types for the INPUT and OUTPUT clauses:
XGBoost models support the following BigQuery data types for the INPUT field type:
XGBoost models only support FLOAT64 for the OUTPUT field type.
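The XGBoost rules above can be checked before you write the statement. The following Python sketch uses `check_xgboost_io`, a hypothetical helper that takes (field_name, field_type) pairs.

The sketch rests on two assumptions:
- INPUT names are compared with feature_names as sets, so field order is not checked.
- INPUT field types are not checked at all.

```python
def check_xgboost_io(input_fields, output_fields, feature_names=None):
    """Check XGBoost INPUT/OUTPUT clauses against the rules above.

    input_fields, output_fields: (field_name, field_type) pairs.
    feature_names: names stored in the model file, or None if unpopulated.
    """
    errors = []
    if feature_names:
        # Assumption: compared as sets; this sketch does not check order.
        input_names = {name for name, _ in input_fields}
        if input_names != set(feature_names):
            errors.append("INPUT field names must match feature_names")
    for name, field_type in output_fields:
        # XGBoost models only support FLOAT64 for OUTPUT fields.
        if field_type.upper() != "FLOAT64":
            errors.append(f"OUTPUT field {name} must be FLOAT64")
    return errors
```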
connection_name
BigQuery uses a CLOUD_RESOURCE connection to interact with your Vertex AI endpoint. You need to grant the Vertex AI User role to the connection's service account in your Vertex AI endpoint's project.
See examples in remote model CONNECTION statement.
To use a default connection, specify DEFAULT instead of the connection name.
model_option_list
CREATE MODEL supports the following options:
MODEL_TYPE
Syntax
MODEL_TYPE = { 'LINEAR_REG' | 'LOGISTIC_REG' | 'KMEANS' | 'PCA' |
'MATRIX_FACTORIZATION' | 'AUTOENCODER' | 'AUTOML_REGRESSOR' |
'AUTOML_CLASSIFIER' | 'BOOSTED_TREE_CLASSIFIER' | 'BOOSTED_TREE_REGRESSOR' |
'RANDOM_FOREST_CLASSIFIER' | 'RANDOM_FOREST_REGRESSOR' |
'DNN_CLASSIFIER' | 'DNN_REGRESSOR' | 'DNN_LINEAR_COMBINED_CLASSIFIER' |
'DNN_LINEAR_COMBINED_REGRESSOR' | 'ARIMA_PLUS' | 'ARIMA_PLUS_XREG' |
'TENSORFLOW' | 'TENSORFLOW_LITE' | 'ONNX' | 'XGBOOST' | 'CONTRIBUTION_ANALYSIS'}
Description
Specify the model type. This argument is required.
Arguments
The argument is in the model type column.
| Model category | Model type | Description | Model-specific CREATE MODEL statement |
|---|---|---|---|
| Regression | 'LINEAR_REG' | Linear regression for real-valued label prediction; for example, the sales of an item on a given day. | CREATE MODEL statement for generalized linear models |
| Regression | 'BOOSTED_TREE_REGRESSOR' | Create a boosted tree regressor model using the XGBoost library. | CREATE MODEL statement for boosted tree models |
| Regression | 'RANDOM_FOREST_REGRESSOR' | Create a random forest regressor model using the XGBoost library. | CREATE MODEL statement for random forest models |
| Regression | 'DNN_REGRESSOR' | Create a Deep Neural Network Regressor model. | CREATE MODEL statement for DNN models |
| Regression | 'DNN_LINEAR_COMBINED_REGRESSOR' | Create a Wide-and-Deep Regressor model. | CREATE MODEL statement for Wide-and-Deep models |
| Regression | 'AUTOML_REGRESSOR' | Create a regression model using AutoML. | CREATE MODEL statement for AutoML models |
| Classification | 'LOGISTIC_REG' | Logistic regression for binary-class or multi-class classification; for example, determining whether a customer will make a purchase. | CREATE MODEL statement for generalized linear models |
| Classification | 'BOOSTED_TREE_CLASSIFIER' | Create a boosted tree classifier model using the XGBoost library. | CREATE MODEL statement for boosted tree models |
| Classification | 'RANDOM_FOREST_CLASSIFIER' | Create a random forest classifier model using the XGBoost library. | CREATE MODEL statement for random forest models |
| Classification | 'DNN_CLASSIFIER' | Create a Deep Neural Network Classifier model. | CREATE MODEL statement for DNN models |
| Classification | 'DNN_LINEAR_COMBINED_CLASSIFIER' | Create a Wide-and-Deep Classifier model. | CREATE MODEL statement for Wide-and-Deep models |
| Classification | 'AUTOML_CLASSIFIER' | Create a classification model using AutoML. | CREATE MODEL statement for AutoML models |
| Clustering | 'KMEANS' | K-means clustering for data segmentation; for example, identifying customer segments. | CREATE MODEL statement for K-means models |
| Collaborative filtering | 'MATRIX_FACTORIZATION' | Matrix factorization for recommendation systems. For example, given a set of users, items, and some ratings for a subset of the items, creates a model to predict a user's rating for items they have not rated. | CREATE MODEL statement for matrix factorization models |
| Dimensionality reduction | 'PCA' | Principal component analysis for dimensionality reduction. | CREATE MODEL statement for PCA models |
| Dimensionality reduction | 'AUTOENCODER' | Create an Autoencoder model for anomaly detection, dimensionality reduction, and embedding purposes. | CREATE MODEL statement for Autoencoder model |
| Time series forecasting | 'ARIMA_PLUS' (previously 'ARIMA') | Univariate time-series forecasting with many modeling components under the hood, such as an ARIMA model for the trend, STL and ETS for seasonality, and holiday effects. | CREATE MODEL statement for time series models |
| Time series forecasting | 'ARIMA_PLUS_XREG' | Multivariate time-series forecasting using linear regression and ARIMA_PLUS as the underlying techniques. | CREATE MODEL statement for time series models |
| Augmented analytics | 'CONTRIBUTION_ANALYSIS' | Create a contribution analysis model to find key drivers of a change. | CREATE MODEL statement for Contribution Analysis |
| Importing models | 'TENSORFLOW' | Create a model by importing a TensorFlow model into BigQuery. | CREATE MODEL statement for TensorFlow models |
| Importing models | 'TENSORFLOW_LITE' | Create a model by importing a TensorFlow Lite model into BigQuery. | CREATE MODEL statement for TensorFlow Lite models |
| Importing models | 'ONNX' | Create a model by importing an ONNX model into BigQuery. | CREATE MODEL statement for ONNX models |
| Importing models | 'XGBOOST' | Create a model by importing an XGBoost model into BigQuery. | CREATE MODEL statement for XGBoost models |
| Remote models | N/A | Create a model by specifying a Cloud AI service, or the endpoint for a Vertex AI model. | CREATE MODEL statement for remote models over Google models in Vertex AI; CREATE MODEL statement for remote models over hosted models in Vertex AI; CREATE MODEL statement for remote models over Cloud AI services |
Note: We are deprecating ARIMA as the model type. While the model training pipelines of ARIMA and ARIMA_PLUS are the same, ARIMA_PLUS supports more capabilities, including a new training option, DECOMPOSE_TIME_SERIES, and table-valued functions such as ML.ARIMA_EVALUATE and ML.EXPLAIN_FORECAST.
Other model options
The following table provides a comprehensive list of model options, with a brief description of each option and the model types it applies to. You can find a detailed description in the model-specific CREATE MODEL statement by clicking the model type in the "Applied model types" column.
For supervised learning models, a model option applies to both the regressor and the classifier unless "regressor" or "classifier" is explicitly listed. For example, "boosted tree" means that the option applies to both the boosted tree regressor and the boosted tree classifier, while "boosted tree classifier" means that it applies only to the classifier.
| Name | Description | Applied model types |
|---|---|---|
| MODEL_REGISTRY | The MODEL_REGISTRY option specifies the Model Registry destination. | All model types are supported. |
| VERTEX_AI_MODEL_ID | The Vertex AI model ID to register the model with. | All model types are supported. |
| VERTEX_AI_MODEL_VERSION_ALIASES | The Vertex AI model alias to register the model with. | All model types are supported. |
| INPUT_LABEL_COLS | The label column names in the training data. | Linear & logistic regression, … |
| … | … | … |
| CLASS_WEIGHTS | … It takes an ARRAY of STRUCTs; each STRUCT is a (STRING, FLOAT64) pair representing a class label and the corresponding weight. A weight must be present for every class label. The weights are not required to add up to one. For example: CLASS_WEIGHTS = [STRUCT('example_label', .2)]. | Logistic regression, … |
| … | … CLOUD_AI_DOCUMENT_V1. | Remote models over Cloud AI services |
| SPEECH_RECOGNIZER | Identifies the speech recognizer to use when the REMOTE_SERVICE_TYPE option value is CLOUD_AI_SPEECH_TO_TEXT_V2. | Remote models over Cloud AI services |
| KMS_KEY_NAME | Specifies the Cloud Key Management Service customer-managed encryption key (CMEK) to use to encrypt the model. | Linear & logistic regression, … |
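The CLASS_WEIGHTS rules in the table can be checked before you write the OPTIONS clause. The following Python sketch mirrors the ARRAY of (STRING, FLOAT64) STRUCTs as a list of (label, weight) pairs. `check_class_weights` and `class_weights_option` are hypothetical helpers.

The renderer assumes that labels contain no quote characters.

```python
def check_class_weights(class_weights, labels):
    """Validate (label, weight) pairs against the set of class labels.

    Every label needs a weight; the weights are not required to sum to one.
    """
    weighted = {label for label, _ in class_weights}
    missing = sorted(set(labels) - weighted)
    if missing:
        raise ValueError(f"missing weights for labels: {missing}")
    return dict(class_weights)


def class_weights_option(class_weights):
    """Render the pairs as an OPTIONS entry (assumes labels contain no quotes)."""
    structs = ", ".join(f"STRUCT('{label}', {weight})" for label, weight in class_weights)
    return f"CLASS_WEIGHTS = [{structs}]"
```

Note that the weights 0.2 and 0.9 in the test below sum to 1.1, which the rule above permits.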
AS
All model types support the following AS clause syntax for specifying the training data:
AS query_statement
For time series forecasting models that have a DATA_FREQUENCY value of either DAILY or AUTO_FREQUENCY, you can optionally use the following AS clause syntax to perform custom holiday modeling in addition to specifying the training data:
AS (
  training_data AS (query_statement),
  custom_holiday AS (holiday_statement)
)
query_statement
The query_statement argument specifies the query that is used to generate the training data. For information about the supported SQL syntax of the query_statement clause, see GoogleSQL query syntax.
holiday_statement
The holiday_statement argument specifies the query that provides custom holiday modeling information for time series forecast models. This query must return 50,000 rows or fewer and must contain the following columns:
region: Required. A STRING value that identifies the region to target for holiday modeling. Use one of the following options:
An existing holiday region code. To see the holidays for a region, run SELECT * FROM `bigquery-public-data.ml_datasets.holidays_and_events_for_forecasting` WHERE region = 'REGION', replacing REGION with the region code.
A custom value; for example, London if you are only modeling holidays for that city. Be sure not to use an existing holiday region code when you are trying to model for a custom region. For example, if you want to model a holiday in California and specify CA as the region value, the service recognizes that as the holiday region code for Canada and targets that region. Because the argument is case-sensitive, you could specify ca, California, or some other value that isn't a holiday region code.
holiday_name: Required. A STRING value that identifies the holiday to target for holiday modeling. Use one of the following options:
The name of a holiday as it appears in the bigquery-public-data.ml_datasets.holidays_and_events_for_forecasting public table, including case. Use this option to overwrite or supplement the specified holiday.
A custom holiday name. The name is used in ML.EXPLAIN_FORECAST output, so it must be a valid column name. For example, it cannot contain spaces. For more information on column naming, see Column names.
primary_date: Required. A DATE value that specifies the date the holiday falls on.
preholiday_days: Optional. An INT64 value that specifies the start of the holiday window around the holiday that is taken into account when modeling. Must be greater than or equal to 1. Defaults to 1.
postholiday_days: Optional. An INT64 value that specifies the end of the holiday window around the holiday that is taken into account when modeling. Must be greater than or equal to 1. Defaults to 1.
The preholiday_days and postholiday_days arguments together describe the holiday window around the holiday that is taken into account when modeling. The holiday window is defined as [primary_date - preholiday_days, primary_date + postholiday_days] and is inclusive of the pre- and post-holiday days. The value for each holiday window must be less than or equal to 30 and must be the same across the given holiday. For example, if you are modeling Arbor Day for several different years, you must specify the same holiday window for all of those years.
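The holiday window arithmetic and the holiday_statement constraints can be sketched in Python. `holiday_window` and `check_custom_holidays` are hypothetical helpers, and each holiday_statement row is modeled as a dict.

The sketch makes two labeled assumptions:
- The limit of 30 applies to preholiday_days and postholiday_days separately.
- A holiday is identified by its (region, holiday_name) pair.

With Halloween, 5 pre-holiday days, and 1 post-holiday day, the window for 2022 runs from October 26 through November 1.

```python
from datetime import date, timedelta


def holiday_window(primary_date, preholiday_days=1, postholiday_days=1):
    """Inclusive window [primary_date - preholiday_days, primary_date + postholiday_days].

    Assumption: the limit of 30 applies to each of the two values separately.
    """
    for name, value in (("preholiday_days", preholiday_days),
                        ("postholiday_days", postholiday_days)):
        if not 1 <= value <= 30:
            raise ValueError(f"{name} must be between 1 and 30, got {value}")
    return (primary_date - timedelta(days=preholiday_days),
            primary_date + timedelta(days=postholiday_days))


def check_custom_holidays(rows):
    """Validate holiday_statement rows; return the number of distinct holidays."""
    if len(rows) > 50_000:
        raise ValueError("holiday_statement must return 50,000 rows or fewer")
    windows = {}
    for row in rows:
        pre = row.get("preholiday_days", 1)  # both default to 1
        post = row.get("postholiday_days", 1)
        holiday_window(row["primary_date"], pre, post)  # validates the bounds
        # Assumption: a holiday is identified by (region, holiday_name).
        key = (row["region"], row["holiday_name"])
        if windows.setdefault(key, (pre, post)) != (pre, post):
            raise ValueError(f"holiday window for {key} differs between occurrences")
    return len(windows)


window = holiday_window(date(2022, 10, 31), preholiday_days=5, postholiday_days=1)
```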
To achieve the best holiday modeling result, provide as much historical and forecast information about the occurrences of each included holiday as possible. For example, if you have time series data from 2018 to 2022 and would like to forecast for 2023, you get the best result by providing the custom holiday information for all of those years, similar to the following:
CREATE OR REPLACE MODEL `mydataset.arima_model`
  OPTIONS (
    model_type = 'ARIMA_PLUS',
    holiday_region = 'US',
    ...)
AS (
  training_data AS (SELECT * FROM `mydataset.timeseries_data`),
  custom_holiday AS (
    SELECT
      'US' AS region,
      'Halloween' AS holiday_name,
      primary_date,
      5 AS preholiday_days,
      1 AS postholiday_days
    FROM
      UNNEST(
        [
          DATE('2018-10-31'),
          DATE('2019-10-31'),
          DATE('2020-10-31'),
          DATE('2021-10-31'),
          DATE('2022-10-31'),
          DATE('2023-10-31')])
        AS primary_date
  )
)
Supported inputs
The CREATE MODEL statement supports the following data types for input label columns, data split columns, and input feature columns.
For input feature columns, see Supported input feature types.
Supported data types for input label columns
BigQuery ML supports different GoogleSQL data types depending on the model type. Supported data types for input_label_cols include:
Supported data types for data split columns
BigQuery ML supports different GoogleSQL data types depending on the data split method. Supported data types for data_split_col include:
CREATE MODEL statements must comply with the following rules:
Only one CREATE statement is allowed.
When you use a CREATE MODEL statement, the size of the model must be 90 MB or less or the query fails. Generally, if all categorical variables are short strings, a total feature cardinality (model dimension) of 5-10 million is supported. The dimensionality is dependent on the cardinality and length of the string variables.
The label column can't contain NULL values. If the label column contains NULL values, then the query fails.
The CREATE MODEL IF NOT EXISTS clause always updates the last modified timestamp of a model.
A CREATE MODEL statement cannot contain EXTERNAL_QUERY. If you want to use EXTERNAL_QUERY, materialize the query result and then use the CREATE MODEL statement with the newly created table.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-08-07 UTC.