
The CREATE MODEL statement for Wide-and-Deep models

This document describes the CREATE MODEL statement for creating wide-and-deep models in BigQuery by using SQL. Alternatively, you can create a model by using the Google Cloud console UI (Preview) instead of constructing the SQL statement yourself.

You can use wide-and-deep regressor models with the ML.PREDICT function to perform regression, and wide-and-deep classifier models to perform classification. You can use both types of wide-and-deep models with ML.PREDICT to perform anomaly detection.
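For example, after you create a wide-and-deep model, you can call ML.PREDICT with a query like the following minimal sketch, where `mydataset.mymodel` and `mydataset.input_table` are placeholder names:

SELECT *
FROM ML.PREDICT(MODEL `mydataset.mymodel`,
    TABLE `mydataset.input_table`);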

For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.

CREATE MODEL syntax
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL} model_name
OPTIONS(model_option_list)
AS query_statement

model_option_list:
MODEL_TYPE = { 'DNN_LINEAR_COMBINED_CLASSIFIER' | 'DNN_LINEAR_COMBINED_REGRESSOR' }
    [, LEARN_RATE = float64_value | struct_array ]
    [, OPTIMIZER = string_value | struct_array ]
    [, L1_REG = { float64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) } ]
    [, L2_REG = { float64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) } ]
    [, ACTIVATION_FN = { { 'RELU' | 'RELU6' | 'CRELU' | 'ELU' | 'SELU' | 'SIGMOID' | 'TANH' } | HPARAM_CANDIDATES([candidates]) } ]
    [, BATCH_SIZE = { int64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) } ]
    [, DROPOUT = { float64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) } ]
    [, HIDDEN_UNITS = { int_array | HPARAM_CANDIDATES([candidates]) } ]
    [, INTEGRATED_GRADIENTS_NUM_STEPS = int64_value ]
    [, TF_VERSION = { '1.15' | '2.8.0' } ]
    [, AUTO_CLASS_WEIGHTS = { TRUE | FALSE } ]
    [, CLASS_WEIGHTS = struct_array ]
    [, ENABLE_GLOBAL_EXPLAIN = { TRUE | FALSE } ]
    [, EARLY_STOP = { TRUE | FALSE } ]
    [, MIN_REL_PROGRESS = float64_value ]
    [, INPUT_LABEL_COLS = string_array ]
    [, MAX_ITERATIONS = int64_value ]
    [, WARM_START = { TRUE | FALSE } ]
    [, DATA_SPLIT_METHOD = { 'AUTO_SPLIT' | 'RANDOM' | 'CUSTOM' | 'SEQ' | 'NO_SPLIT' } ]
    [, DATA_SPLIT_EVAL_FRACTION = float64_value ]
    [, DATA_SPLIT_TEST_FRACTION = float64_value ]
    [, DATA_SPLIT_COL = string_value ]
    [, NUM_TRIALS = int64_value ]
    [, MAX_PARALLEL_TRIALS = int64_value ]
    [, HPARAM_TUNING_ALGORITHM = { 'VIZIER_DEFAULT' | 'RANDOM_SEARCH' | 'GRID_SEARCH' } ]
    [, HPARAM_TUNING_OBJECTIVES = { 'ROC_AUC' | 'R2_SCORE' | ... } ]
    [, KMS_KEY_NAME = string_value ]
CREATE MODEL

Creates and trains a new model in the specified dataset. If the model name exists, CREATE MODEL returns an error.

CREATE MODEL IF NOT EXISTS

Creates and trains a new model only if the model doesn't exist in the specified dataset.

CREATE OR REPLACE MODEL

Creates and trains a model and replaces an existing model with the same name in the specified dataset.

model_name

The name of the model you're creating or replacing. The model name must be unique in the dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table: it can contain up to 1,024 characters consisting of letters (uppercase or lowercase), numbers, and underscores.

model_name is not case-sensitive.

If you don't have a default project configured, then you must prepend the project ID to the model name in the following format, including backticks:

`[PROJECT_ID].[DATASET].[MODEL]`

For example, `myproject.mydataset.mymodel`.

MODEL_TYPE
MODEL_TYPE = { 'DNN_LINEAR_COMBINED_CLASSIFIER' | 'DNN_LINEAR_COMBINED_REGRESSOR' }

Description

Specifies the model type. This option is required.

LEARN_RATE

Syntax

LEARN_RATE = float64_value | struct_array

Description

The initial learn rate for training.

Arguments

To specify an initial learn rate for both the linear and DNN parts of the model, provide a FLOAT64 value. For example:

LEARN_RATE = 0.01

To specify different initial learn rates for the linear and DNN parts of the model, provide an array of STRUCT values that each contain one STRING and one FLOAT64 value. The string identifies the part of the model, and the float specifies the initial learn rate to use for that part of the model. For example:

LEARN_RATE = [STRUCT('dnn', 0.001), STRUCT('linear', 0.01)]

The default value is 0.001 for both the linear and DNN parts of the model.

OPTIMIZER

Syntax

OPTIMIZER = string_value | struct_array

Description

Specifies the optimizer for training the model.

Arguments

To specify an optimizer for both the linear and DNN parts of the model, provide a STRING value. For example:

OPTIMIZER = 'ADAGRAD'

To specify different optimizers for the linear and DNN parts of the model, provide an array of STRUCT values that each contain two STRING values. The first string identifies the part of the model, and the second string specifies the optimizer to use for that part of the model. For example:

OPTIMIZER = [STRUCT('dnn', 'ADAGRAD'), STRUCT('linear', 'SGD')]

Use one of the following values for the optimizer string: 'ADAGRAD', 'ADAM', 'FTRL', 'RMSPROP', or 'SGD'.

The default value of OPTIMIZER is [STRUCT('dnn', 'ADAM'), STRUCT('linear', 'FTRL')].
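For illustration, a minimal CREATE MODEL statement that sets per-part optimizers and learn rates might look like the following; the model, table, and label names are placeholders:

CREATE OR REPLACE MODEL `mydataset.mymodel`
OPTIONS(MODEL_TYPE = 'DNN_LINEAR_COMBINED_REGRESSOR',
        OPTIMIZER = [STRUCT('dnn', 'ADAGRAD'), STRUCT('linear', 'SGD')],
        LEARN_RATE = [STRUCT('dnn', 0.001), STRUCT('linear', 0.01)],
        INPUT_LABEL_COLS = ['mylabel'])
AS SELECT * FROM `mydataset.mytable`;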

L1_REG

Syntax

L1_REG = { float64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) }

Description

The L1 regularization strength of the OPTIMIZER. You can only use this option when OPTIMIZER is set to 'ADAGRAD', 'FTRL', or 'SGD'.

Arguments

If you aren't running hyperparameter tuning, then you can specify a FLOAT64 value. The default value is 0.

If you are running hyperparameter tuning, then you can use one of the following options:

* Use the HPARAM_RANGE keyword and specify two FLOAT64 values that define the range to use for the hyperparameter.
* Use the HPARAM_CANDIDATES keyword and specify an array of FLOAT64 values to provide discrete values to use for the hyperparameter.

When running hyperparameter tuning, the valid range is (0, ∞), the default range is (0, 10.0], and the scale type is LOG.
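For illustration, either of the following specifications tunes L1_REG; the values are arbitrary examples:

L1_REG = HPARAM_RANGE(0, 5.0)
L1_REG = HPARAM_CANDIDATES([0, 1.0, 3.0, 5.0])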

L2_REG

Syntax

L2_REG = { float64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) }

Description

The L2 regularization strength of the OPTIMIZER. You can only use this option when OPTIMIZER is set to 'ADAGRAD', 'FTRL', or 'SGD'.

Arguments

If you aren't running hyperparameter tuning, then you can specify a FLOAT64 value. The default value is 0.

If you are running hyperparameter tuning, then you can use one of the following options:

* Use the HPARAM_RANGE keyword and specify two FLOAT64 values that define the range to use for the hyperparameter.
* Use the HPARAM_CANDIDATES keyword and specify an array of FLOAT64 values to provide discrete values to use for the hyperparameter.

When running hyperparameter tuning, the valid range is (0, ∞), the default range is (0, 10.0], and the scale type is LOG.
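For illustration, either of the following specifications tunes L2_REG; the values are arbitrary examples:

L2_REG = HPARAM_RANGE(0, 5.0)
L2_REG = HPARAM_CANDIDATES([0, 1.0, 3.0, 5.0])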

ACTIVATION_FN

Syntax

ACTIVATION_FN = { { 'RELU' | 'RELU6' | 'CRELU' | 'ELU' | 'SELU' | 'SIGMOID' | 'TANH' } | HPARAM_CANDIDATES([candidates]) }

Description

The activation function of the neural network.

Arguments

This option accepts the following values: 'RELU', 'RELU6', 'CRELU', 'ELU', 'SELU', 'SIGMOID', and 'TANH'. The default value is 'RELU'.

If you are running hyperparameter training, then you can provide more than one value for this option by using HPARAM_CANDIDATES and specifying an array. For example, ACTIVATION_FN = HPARAM_CANDIDATES(['RELU', 'RELU6', 'TANH']).

BATCH_SIZE

Syntax

BATCH_SIZE = { int64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) }

Description

The mini batch size of samples that are fed to the neural network.

Arguments

If you aren't running hyperparameter tuning, specify an INT64 value that is positive and is less than or equal to 8192. The default value is 32 or the number of samples, whichever is smaller.

If you are running hyperparameter tuning, use one of the following options:

* Use the HPARAM_RANGE keyword and specify two INT64 values that define the range to use for the hyperparameter.
* Use the HPARAM_CANDIDATES keyword and specify an array of INT64 values to provide discrete values to use for the hyperparameter.

When running hyperparameter tuning, the valid range is (0, ∞), the default range is [16, 1024], and the scale type is LOG.
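For illustration, either of the following specifications tunes BATCH_SIZE; the values are arbitrary examples:

BATCH_SIZE = HPARAM_RANGE(16, 1024)
BATCH_SIZE = HPARAM_CANDIDATES([32, 64, 256])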

DROPOUT

Syntax

DROPOUT = { float64_value | HPARAM_RANGE(range) | HPARAM_CANDIDATES([candidates]) }

Description

The dropout rate of units in the neural network.

Arguments

If you aren't running hyperparameter tuning, then you can specify a FLOAT64 value that is positive and is less than or equal to 1.0. The default value is 0.

If you are running hyperparameter tuning, then you must use one of the following options:

* Use the HPARAM_RANGE keyword and specify two FLOAT64 values that define the range to use for the hyperparameter.
* Use the HPARAM_CANDIDATES keyword and specify an array of FLOAT64 values to provide discrete values to use for the hyperparameter.

When running hyperparameter tuning, the valid range is [0, 1.0), the default range is [0, 0.8], and the scale type is LINEAR.
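For illustration, either of the following specifications tunes DROPOUT; the values are arbitrary examples:

DROPOUT = HPARAM_RANGE(0, 0.6)
DROPOUT = HPARAM_CANDIDATES([0, 0.1, 0.3])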

HIDDEN_UNITS

Syntax

HIDDEN_UNITS = { int_array | HPARAM_CANDIDATES([candidates]) }

Description

The hidden layers of the neural network.

Arguments

An array of integers that represents the architecture of the hidden layers. If not specified, BigQuery ML applies a single hidden layer that contains no more than 128 units. The number of units is calculated as [min(128, num_samples / (10 * (num_input_units + num_output_units)))]. The upper bound of the rule ensures that the model doesn't overfit.

The number in the middle of the array defines the shape of the latent space. For example, hidden_units=[128, 64, 4, 64, 128] defines a four-dimensional latent space.

The number of layers in hidden_units must be odd, and we recommend that the sequence be symmetrical.

The following example defines a model architecture that uses three hidden layers with 256, 128, and 64 nodes, respectively.

HIDDEN_UNITS = [256, 128, 64]

If you are running hyperparameter tuning, then you must use the HPARAM_CANDIDATES keyword and specify an array in the form ARRAY<STRUCT<ARRAY<INT64>>> to provide discrete values to use for the hyperparameter. Each struct value in the outer array represents a candidate neural architecture. The array of INT64 values in each struct represents a hidden layer.

The following example represents a neural architecture search with three candidates: a single layer of 8 neurons; two layers of 8 and 16 neurons; and three layers of 16, 32, and 64 neurons.

hidden_units=hparam_candidates([struct([8]), struct([8, 16]), struct([16, 32, 64])])

The valid range for the INT64 arrays is [1, ∞).

INTEGRATED_GRADIENTS_NUM_STEPS

Syntax

INTEGRATED_GRADIENTS_NUM_STEPS = int64_value

Description

Specifies the number of steps to sample between the example being explained and its baseline for approximating the integral when using integrated gradients attribution methods.

Arguments

An INT64 value. The default value is 50.

You can only set this option if ENABLE_GLOBAL_EXPLAIN is TRUE.
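For example, an OPTIONS excerpt that enables explanations and raises the sampling resolution might look like the following; 100 is an arbitrary illustrative value:

ENABLE_GLOBAL_EXPLAIN = TRUE,
INTEGRATED_GRADIENTS_NUM_STEPS = 100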

TF_VERSION

Syntax

TF_VERSION = { '1.15' | '2.8.0' }

Description

Specifies the TensorFlow version for model training. The default value is '1.15'.

Set TF_VERSION to '2.8.0' to use TensorFlow 2 with the Keras API.

AUTO_CLASS_WEIGHTS

Syntax

AUTO_CLASS_WEIGHTS = { TRUE | FALSE }

Description

Determines whether to balance class labels by using weights for each class in inverse proportion to the frequency of that class.

Only use this option with classifier models.

By default, the training data used to create the model is unweighted. If the labels in the training data are imbalanced, the model might learn to predict the most popular class of labels more heavily, which you might not want.

To balance every class, set this option to TRUE. Balance is accomplished using the following formula:

total_input_rows / (input_rows_for_class_n * number_of_unique_classes)

Arguments

A BOOL value. The default value is FALSE.
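As a worked example, in a hypothetical training set of 1,000 rows with two classes, where class 'a' has 900 rows and class 'b' has 100 rows, class 'a' receives a weight of 1000 / (900 * 2) ≈ 0.56 and class 'b' receives a weight of 1000 / (100 * 2) = 5.0, which counteracts the imbalance.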

CLASS_WEIGHTS

Syntax

CLASS_WEIGHTS = struct_array

Description

The weights to use for each class label. You can't specify this option if AUTO_CLASS_WEIGHTS is TRUE.

Arguments

An ARRAY of STRUCT values. Each STRUCT contains a STRING value that specifies the class label and a FLOAT64 value that specifies the weight for that class label. A weight must be present for every class label. The weights are not required to add up to 1.

A CLASS_WEIGHTS value might look like the following example:

CLASS_WEIGHTS = [STRUCT('example_label', .2)]
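
For illustration, a hypothetical two-class weighting with placeholder labels and arbitrary weights might look like the following:

CLASS_WEIGHTS = [STRUCT('spam', 0.8), STRUCT('not_spam', 0.2)]
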
ENABLE_GLOBAL_EXPLAIN

Syntax

ENABLE_GLOBAL_EXPLAIN = { TRUE | FALSE }

Description

Determines whether to compute global explanations by using explainable AI to evaluate the importance of global features to the model.

Global explanations are computed when you create the model. This option must be TRUE if you want to use the ML.GLOBAL_EXPLAIN function to retrieve the global explanations after the model is created.

Arguments

A BOOL value. The default value is FALSE.
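After a model is trained with this option set to TRUE, you can retrieve the global explanations with a query like the following sketch, where `mydataset.mymodel` is a placeholder name:

SELECT * FROM ML.GLOBAL_EXPLAIN(MODEL `mydataset.mymodel`);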

EARLY_STOP

Syntax

EARLY_STOP = { TRUE | FALSE }

Description

Determines whether training should stop after the first iteration in which the relative loss improvement is less than the value specified for MIN_REL_PROGRESS.

Arguments

A BOOL value. The default value is TRUE.

MIN_REL_PROGRESS

Syntax

MIN_REL_PROGRESS = float64_value

Description

The minimum relative loss improvement that is necessary to continue training when EARLY_STOP is set to TRUE. For example, a value of 0.01 specifies that each iteration must reduce the loss by 1% for training to continue.

Arguments

A FLOAT64 value. The default value is 0.01.

INPUT_LABEL_COLS

Syntax

INPUT_LABEL_COLS = string_array

Description

The name of the label column in the training data. The label column contains the expected machine learning result for the given record. For example, in a spam detection dataset, the label column value would probably be either spam or not spam. In a rainfall dataset, the label column value might be the amount of rain that fell during a certain period.

Arguments

A one-element ARRAY of string values. Defaults to label.

MAX_ITERATIONS

Syntax

MAX_ITERATIONS = int64_value

Description

The maximum number of training iterations, where one iteration represents a single pass of the entire training data.

Arguments

An INT64 value. The default value is 20.

WARM_START

Syntax

WARM_START = { TRUE | FALSE }

Description

Determines whether to train a model with new training data, new model options, or both. Unless you explicitly override them, the initial options used to train the model are used for the warm start run.

In a warm start run, the iteration numbers are reset to start from zero. Use the training run or iteration information returned by the ML.TRAINING_INFO function to distinguish the warm start run from the original run.

In a warm start, the values of the MODEL_TYPE, LABELS, and HIDDEN_UNITS options, and the training data schema, must remain the same as they were in the previous training job.

Arguments

A BOOL value. The default value is FALSE.
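As a sketch, assuming an existing model named `mydataset.mymodel` trained on `mydataset.mytable` (both placeholder names), a warm start run followed by an inspection of the training runs might look like the following:

CREATE OR REPLACE MODEL `mydataset.mymodel`
OPTIONS(MODEL_TYPE = 'DNN_LINEAR_COMBINED_CLASSIFIER',
        WARM_START = TRUE,
        INPUT_LABEL_COLS = ['mylabel'])
AS SELECT * FROM `mydataset.mytable`;

SELECT * FROM ML.TRAINING_INFO(MODEL `mydataset.mymodel`);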

DATA_SPLIT_METHOD

Syntax

DATA_SPLIT_METHOD = { 'AUTO_SPLIT' | 'RANDOM' | 'CUSTOM' | 'SEQ' | 'NO_SPLIT' }

Description

The method used to split input data into training, evaluation, and, if you are running hyperparameter tuning, test data sets. Training data is used to train the model. Evaluation data is used to avoid overfitting by using early stopping. Test data is used to test the hyperparameter tuning trial and record its metrics in the model.

The percentage sizes of the data sets produced by the various arguments for this option are approximate. Larger input data sets come closer to the percentages described than smaller input data sets do.

You can see the model's data split information in the following ways:

Arguments

This option accepts the following values:

* AUTO_SPLIT: This is the default value. This option splits the data as follows: if the input data has fewer than 500 rows, all rows are used as training data; if it has between 500 and 50,000 rows, 20% of the rows are used as evaluation data; if it has more than 50,000 rows, 10,000 rows are used as evaluation data.
* RANDOM: Splits the data randomly. To customize the evaluation and test fractions, use this value with the DATA_SPLIT_EVAL_FRACTION and DATA_SPLIT_TEST_FRACTION options.
* CUSTOM: Splits the data by using the column specified in the DATA_SPLIT_COL option.
* SEQ: Splits the data sequentially by using the column specified in the DATA_SPLIT_COL option.
* NO_SPLIT: Uses all of the input data as training data.

DATA_SPLIT_EVAL_FRACTION

Syntax

DATA_SPLIT_EVAL_FRACTION = float64_value

Description

The fraction of the data to use as evaluation data. Use when you are specifying RANDOM or SEQ as the value for the DATA_SPLIT_METHOD option.

If you are running hyperparameter tuning and you specify a value for this option, you must also specify a value for DATA_SPLIT_TEST_FRACTION. In this case, the training dataset is 1 - eval_fraction - test_fraction. For example, if you specify 0.20 for DATA_SPLIT_EVAL_FRACTION and 0.08 for DATA_SPLIT_TEST_FRACTION, your training dataset is 72% of the input data.

Arguments

A FLOAT64 value. The default is 0.2. The service maintains the accuracy of the input value to two decimal places.

DATA_SPLIT_TEST_FRACTION

Syntax

DATA_SPLIT_TEST_FRACTION = float64_value

Description

The fraction of the data to use as test data. Use this option when you are running hyperparameter tuning and specifying either RANDOM or SEQ as value for the DATA_SPLIT_METHOD option.

If you specify a value for this option, you must also specify a value for DATA_SPLIT_EVAL_FRACTION. In this case, the training dataset is 1 - eval_fraction - test_fraction. For example, if you specify 0.20 for DATA_SPLIT_EVAL_FRACTION and 0.08 for DATA_SPLIT_TEST_FRACTION, your training dataset is 72% of the input data.

Arguments

A FLOAT64 value. The default is 0. The service maintains the accuracy of the input value to two decimal places.
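Putting the two fraction options together, the following illustrative OPTIONS excerpt for a hyperparameter tuning run produces the 72% training split described above:

DATA_SPLIT_METHOD = 'RANDOM',
DATA_SPLIT_EVAL_FRACTION = 0.20,
DATA_SPLIT_TEST_FRACTION = 0.08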

DATA_SPLIT_COL

Syntax

DATA_SPLIT_COL = string_value

Description

The name of the column to use to sort input data into the training, evaluation, or test set. Use when you are specifying CUSTOM or SEQ as the value for the DATA_SPLIT_METHOD option:

* If DATA_SPLIT_METHOD is set to CUSTOM, then the column must be of type BOOL. Rows with a value of TRUE or NULL are used as evaluation data.
* If DATA_SPLIT_METHOD is set to SEQ, then the last rows, ordered from smallest to largest on the specified column, are used as evaluation data.

The column you specify for DATA_SPLIT_COL can't be used as a feature or label, and is excluded from features automatically.

Arguments

A STRING value.

NUM_TRIALS

Syntax

NUM_TRIALS = int64_value

Description

The maximum number of submodels to train. The tuning stops when NUM_TRIALS submodels are trained, or when the hyperparameter search space is exhausted. You must specify this option in order to use hyperparameter tuning.

Arguments

An INT64 value between 1 and 100, inclusive.

Note: We recommend using at least (number_of_hyperparameters * 10) trials to tune a model.

MAX_PARALLEL_TRIALS

Syntax

MAX_PARALLEL_TRIALS = int64_value

Description

The maximum number of trials to run at the same time. If you specify a value for this option, you must also specify a value for NUM_TRIALS.

Arguments

An INT64 value between 1 and 5, inclusive. The default value is 1.

Note: Although specifying larger MAX_PARALLEL_TRIALS values can accelerate the hyperparameter tuning process, acceleration can undermine the final model quality when you specify VIZIER_DEFAULT as the HPARAM_TUNING_ALGORITHM value. This is because the parallel trials can't benefit from concurrent training results.

HPARAM_TUNING_ALGORITHM

Syntax

HPARAM_TUNING_ALGORITHM = { 'VIZIER_DEFAULT' | 'RANDOM_SEARCH' | 'GRID_SEARCH' }

Description

The algorithm used to tune the hyperparameters. If you specify a value for this option, you must also specify a value for NUM_TRIALS.

Arguments

Specify one of the following values:

* 'VIZIER_DEFAULT': This is the default and recommended value. It uses the default Vertex AI Vizier algorithm, which combines advanced search algorithms such as Bayesian optimization with transfer learning.
* 'RANDOM_SEARCH': Uses random search to explore the search space.
* 'GRID_SEARCH': Uses grid search to explore the search space. You can only use this value when every hyperparameter's search space is discrete.

HPARAM_TUNING_OBJECTIVES

Syntax

For DNN_LINEAR_COMBINED_CLASSIFIER models:

HPARAM_TUNING_OBJECTIVES = { 'PRECISION' | 'RECALL' | 'ACCURACY' | 'F1_SCORE' | 'LOG_LOSS' | 'ROC_AUC' }

For DNN_LINEAR_COMBINED_REGRESSOR models:

HPARAM_TUNING_OBJECTIVES = { 'MEAN_ABSOLUTE_ERROR' | 'MEAN_SQUARED_ERROR' | 'MEAN_SQUARED_LOG_ERROR' | 'MEDIAN_ABSOLUTE_ERROR' | 'R2_SCORE' | 'EXPLAINED_VARIANCE' }

Description

The hyperparameter tuning objective for the model; only one objective is supported. If you specify a value for this option, you must also specify a value for NUM_TRIALS.

Arguments

The possible objectives are a subset of the model evaluation metrics for the model type. If you aren't running hyperparameter tuning, or if you are and you don't specify an objective, then the default objective is used. For DNN_LINEAR_COMBINED_CLASSIFIER models, the default is ROC_AUC. For DNN_LINEAR_COMBINED_REGRESSOR models, the default is R2_SCORE.

KMS_KEY_NAME

Syntax

KMS_KEY_NAME = string_value

Description

The Cloud Key Management Service customer-managed encryption key (CMEK) to use to encrypt the model.

Arguments

A STRING value containing the fully-qualified name of the CMEK. For example,

'projects/my_project/locations/my_location/keyRings/my_ring/cryptoKeys/my_key'

Internal parameter defaults

BigQuery ML uses the following default values when building models:

loss_reduction = losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE

batch_norm = False

query_statement

The AS query_statement clause specifies the GoogleSQL query used to generate the training data. See the GoogleSQL query syntax page for the supported SQL syntax of the query_statement clause.

All columns referenced by the query_statement are used as input features to the model except for the columns included in INPUT_LABEL_COLS and DATA_SPLIT_COL.

Hyperparameter tuning

Wide-and-deep classification and regression models support hyperparameter tuning, which you can use to improve model performance for your data. To use hyperparameter tuning, set the NUM_TRIALS option to the number of trials that you want to run. BigQuery ML then trains the model the number of times that you specify, using different hyperparameter values, and returns the model that performs the best.

Hyperparameter tuning defaults to improving the key performance metric for the given model type. You can use the HPARAM_TUNING_OBJECTIVES option to tune for a different metric if you need to.

For more information about the training objectives and hyperparameters supported for wide-and-deep classification models, see DNN_LINEAR_COMBINED_CLASSIFIER. For more information about the training objectives and hyperparameters supported for wide-and-deep regression models, see DNN_LINEAR_COMBINED_REGRESSOR. To try a tutorial that walks you through hyperparameter tuning, see Improve model performance with hyperparameter tuning.
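For illustration, a hyperparameter tuning statement that combines several of the options described above might look like the following minimal sketch; the model, table, and label names are placeholders and the option values are arbitrary examples:

CREATE OR REPLACE MODEL `mydataset.mytunedmodel`
OPTIONS(MODEL_TYPE = 'DNN_LINEAR_COMBINED_CLASSIFIER',
        INPUT_LABEL_COLS = ['mylabel'],
        NUM_TRIALS = 20,
        MAX_PARALLEL_TRIALS = 2,
        HPARAM_TUNING_ALGORITHM = 'VIZIER_DEFAULT',
        L1_REG = HPARAM_RANGE(0, 5.0),
        BATCH_SIZE = HPARAM_CANDIDATES([32, 64, 256]))
AS SELECT * FROM `mydataset.mytable`;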

Locations

For information about supported locations, see Locations for non-remote models.

Limitations

CREATE MODEL statements must comply with the following rules:

Example

The following example trains a wide-and-deep classifier model against mytable with mylabel as the label column.

CREATE MODEL `project_id.mydataset.mymodel`
OPTIONS(MODEL_TYPE='DNN_LINEAR_COMBINED_CLASSIFIER',
        ACTIVATION_FN = 'RELU',
        BATCH_SIZE = 16,
        DROPOUT = 0.1,
        EARLY_STOP = FALSE,
        HIDDEN_UNITS = [128, 128, 128],
        INPUT_LABEL_COLS = ['mylabel'],
        LEARN_RATE=0.001,
        MAX_ITERATIONS = 50,
        OPTIMIZER = 'ADAGRAD')
AS SELECT * FROM `project_id.mydataset.mytable`;

Supported regions

Training wide-and-deep models is not supported in all BigQuery ML regions. For a complete list of supported regions and multi-regions, see the BigQuery ML locations page.

