

The CREATE MODEL statement for remote models over LLMs

This document describes the CREATE MODEL statement for creating remote models in BigQuery over models in Vertex AI by using SQL. Alternatively, you can create a model through the Google Cloud console user interface (Preview) instead of constructing the SQL statement yourself.

After you create the remote model, you can use one of the functions described in this document, such as ML.GENERATE_TEXT or ML.GENERATE_EMBEDDING, to perform generative AI tasks with that model.

CREATE MODEL syntax
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL}
`project_id.dataset.model_name`
REMOTE WITH CONNECTION {`project_id.region.connection_id` | DEFAULT}
OPTIONS(
  ENDPOINT = 'vertex_ai_llm_endpoint'
  [, PROMPT_COL = 'prompt_col']
  [, INPUT_LABEL_COLS = input_label_cols]
  [, MAX_ITERATIONS = max_iterations]
  [, LEARNING_RATE_MULTIPLIER = learning_rate_multiplier]
  [, DATA_SPLIT_METHOD = 'data_split_method']
  [, DATA_SPLIT_EVAL_FRACTION = data_split_eval_fraction]
  [, DATA_SPLIT_COL = 'data_split_col']
  [, EVALUATION_TASK = 'evaluation_task'])
[AS SELECT prompt_column, label_column FROM `project_id.dataset.table_name`]
CREATE MODEL

Creates and trains a new model in the specified dataset. If a model with the same name already exists in the dataset, CREATE MODEL returns an error.

CREATE MODEL IF NOT EXISTS

Creates and trains a new model only if the model doesn't exist in the specified dataset.

CREATE OR REPLACE MODEL

Creates and trains a model and replaces an existing model with the same name in the specified dataset.

model_name

The name of the model you're creating or replacing. The model name must be unique in the dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table.

model_name is not case-sensitive.

If you don't have a default project configured, then you must prepend the project ID to the model name in the following format, including backticks:

`[PROJECT_ID].[DATASET].[MODEL]`

For example, `myproject.mydataset.mymodel`.

REMOTE WITH CONNECTION

Syntax

`[PROJECT_ID].[LOCATION].[CONNECTION_ID]`

BigQuery uses a Cloud resource connection to interact with the Vertex AI endpoint.

The connection is identified by its project ID, location, and connection ID, as shown in the syntax above.

If you are creating a remote model over a Vertex AI model that uses supervised tuning, you need to grant the

Vertex AI Service Agent role

to the connection's service account in the project where you create the model. Otherwise, you need to grant the

Vertex AI User role

to the connection's service account in the project where you create the model.

If you are using the remote model to analyze unstructured data from an object table, you must also grant the Vertex AI Service Agent role to the service account of the connection associated with the object table. You can find the object table's connection in the Google Cloud console, on the Details pane for the object table.

Example

`myproject.us.my_connection`
ENDPOINT

Syntax

ENDPOINT = 'vertex_ai_llm_endpoint'

Description

The Vertex AI endpoint for the remote model to use. You can specify the name of the Vertex AI model, for example gemini-2.0-flash, or you can specify the Vertex AI model's endpoint URL, for example https://europe-west6-aiplatform.googleapis.com/v1/projects/myproject/locations/europe-west6/publishers/google/models/gemini-2.0-flash. If you specify the model name, BigQuery ML automatically identifies and uses the full endpoint of the Vertex AI model. The resource portion of the URL is saved as the remote endpoint value in the model metadata. For example, projects/myproject/locations/europe-west6/publishers/google/models/gemini-2.0-flash.
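To illustrate the two forms, the following sketch creates the same kind of remote model first by model name and then by full endpoint URL; the project, dataset, and connection names are placeholders:

```sql
-- Shorthand form: BigQuery ML resolves the model name to the full endpoint.
CREATE OR REPLACE MODEL `mydataset.gemini_by_name`
  REMOTE WITH CONNECTION `myproject.us.my_connection`
  OPTIONS (ENDPOINT = 'gemini-2.0-flash');

-- Explicit form: the full Vertex AI endpoint URL pins a specific region.
CREATE OR REPLACE MODEL `mydataset.gemini_by_url`
  REMOTE WITH CONNECTION `myproject.europe-west6.my_connection`
  OPTIONS (
    ENDPOINT = 'https://europe-west6-aiplatform.googleapis.com/v1/projects/myproject/locations/europe-west6/publishers/google/models/gemini-2.0-flash');
```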

Arguments

A STRING value that contains the model name of the target Vertex AI LLM. The following LLMs are supported:

Gemini multimodal models

All of the generally available and preview Gemini models are supported.

Note: To provide feedback or request support for the models in preview, send an email to bqml-feedback@google.com.

To always use the latest version of a Gemini model, specify one of the Gemini auto-updated aliases. For more information on Gemini model versioning, see Model versions and lifecycle.

For supported Gemini models, you can specify the global endpoint, as shown in the following example:

  https://aiplatform.googleapis.com/v1/projects/test-project/locations/global/publishers/google/models/gemini-2.0-flash-001

Using the global endpoint for your requests can improve overall availability while reducing resource exhausted (429) errors, which occur when you exceed your quota for a regional endpoint. If you want to use Gemini 2.0+ in a region where it isn't available, you can avoid migrating your data to a different region by using the global endpoint instead. You can only use the global endpoint with the ML.GENERATE_TEXT function.

Note: Don't use the global endpoint if you have requirements for the data processing location, because when you use the global endpoint, you can't control or know the region where your processing requests are handled.
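As a sketch, a remote model that targets the global endpoint might look like the following, with placeholder project, dataset, and connection names:

```sql
CREATE OR REPLACE MODEL `mydataset.gemini_global`
  REMOTE WITH CONNECTION `myproject.us.my_connection`
  OPTIONS (
    ENDPOINT = 'https://aiplatform.googleapis.com/v1/projects/myproject/locations/global/publishers/google/models/gemini-2.0-flash-001');
```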

After you create a remote model based on a Gemini model, you can do the following:

Note: Using Gemini 2.5 models with any of these functions incurs charges for the thinking process. With some functions, you can set a budget for the thinking process for Gemini 2.5 Flash and Gemini 2.5 Flash-Lite models. You can't set a budget for Gemini 2.5 Pro models. See the documentation for a given function for details.

Claude models

The following Anthropic Claude models are supported:

You must enable Claude models in Vertex AI before you can use them. For more information, see Enable a partner model.

Although Claude models are multimodal, you can only use text input with Claude models in BigQuery ML.

After you create a remote model based on a Claude model, you can use the model with the ML.GENERATE_TEXT function to generate text based on a prompt you provide in a query or from a column in a standard table.
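A minimal sketch of such a call, assuming a remote model named `mydataset.claude_model` (a placeholder name) and an inline prompt:

```sql
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.claude_model`,
  (SELECT 'Summarize this support ticket in one sentence.' AS prompt),
  STRUCT(256 AS max_output_tokens, TRUE AS flatten_json_output));
```

To run the same query over a table, replace the subquery with a SELECT that produces a prompt column from that table.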

Mistral AI models

The following Mistral AI models are supported:

Don't use a version suffix with any Mistral AI model.

You must enable Mistral AI models in Vertex AI before you can use them. For more information, see Enable a partner model.

After you create a remote model based on a Mistral AI model, you can use the model with the ML.GENERATE_TEXT function to generate text based on a prompt you provide in a query or from a column in a standard table.

Llama models

To create a Llama model in BigQuery ML, you must specify it as an OpenAI API endpoint in the format openapi/<publisher_name>/<model_name>.

The following Llama models are supported:

Important: For Llama 4.0 and greater models, you must create the dataset and connection for the remote model in the same region as the Llama model endpoint.

You must enable Llama models in Vertex AI before you can use them. For more information, see Enable a partner model.

After you create a remote model based on a Llama model, you can use the model with the ML.GENERATE_TEXT function to generate text based on a prompt you provide in a query or from a column in a standard table.

multimodalembedding embedding models

The multimodalembedding embedding model is supported. You must specify the 001 version of the model, multimodalembedding@001.

After you create a remote model based on a multimodalembedding embedding model, you can use the model with the ML.GENERATE_EMBEDDING function to generate embeddings from text data in a BigQuery table or from visual content in a BigQuery object table.

text embedding models

The following embedding models are supported:

You must specify a supported model version.

After you create a remote model based on an embedding model, you can use the model with the ML.GENERATE_EMBEDDING function to generate embeddings from text data in a BigQuery table.
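As a sketch, assuming a remote model named `mydataset.embedding_model` (a placeholder) and input provided in a column named content, which is the column name ML.GENERATE_EMBEDDING reads by default:

```sql
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.embedding_model`,
  (SELECT 'example text to embed' AS content),
  STRUCT(TRUE AS flatten_json_output));
```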

For information that can help you choose between the supported models, see Model information.

Retired models

The following Vertex AI models are retired as of April 9, 2025:

The following Vertex AI models are retired as of May 24, 2025:

For more information on retired Vertex AI models, see Retired models.

PROMPT_COL

Syntax

PROMPT_COL = 'prompt_col'

Description

The name of the prompt column in the training data table to use when performing supervised tuning. If you don't specify a value for this option, you must have a column named or aliased as prompt in your input data. You can only use this option when performing supervised tuning with a supported model. If you specify this option, you must also specify the AS SELECT clause.

Arguments

A STRING value. The default value is prompt.

INPUT_LABEL_COLS

Syntax

INPUT_LABEL_COLS = input_label_cols

Description

The name of the label column in the training data table to use when performing supervised tuning. If you don't specify a value for this option, you must have a column named or aliased as label in your input data. You can only use this option when performing supervised tuning with a supported model. If you specify this option, you must also specify the AS SELECT clause.

Arguments

A one-element ARRAY<STRING> value. The default value is an empty array.

MAX_ITERATIONS

Syntax

MAX_ITERATIONS = max_iterations

Description

The number of steps to run when performing supervised tuning. You can only use this option when performing supervised tuning with a supported model. If you specify this option, you must also specify the AS SELECT clause.

When you use a Gemini model, BigQuery ML automatically converts the MAX_ITERATIONS value to epochs, which is what Gemini models use for training. The default value for MAX_ITERATIONS is the number of rows in the input data, which is equivalent to one epoch. To use multiple epochs, specify a multiple of the number of rows in your training data. For example, if you have 100 rows of input data and you want to use two epochs, specify 200 for the argument value. If you provide a value that isn't a multiple of the number of rows in the input data, BigQuery ML rounds up to the nearest epoch. For example, if you have 100 rows of input data and you specify 101 for the MAX_ITERATIONS value, training is performed with two epochs.

For more information about the parameters that are used to tune Gemini models, see Create a tuning job.

For more guidance on choosing the number of epochs for Gemini models, see Recommended configurations.
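The rounding rule described above can be sketched in Python; this only mimics the documented conversion and is not BigQuery ML code:

```python
import math

def iterations_to_epochs(max_iterations: int, num_rows: int) -> int:
    """Convert a MAX_ITERATIONS value to whole epochs, rounding up to
    the nearest epoch as the documentation describes."""
    return math.ceil(max_iterations / num_rows)

# 100 rows of input data, MAX_ITERATIONS = 200: exactly two epochs.
print(iterations_to_epochs(200, 100))  # 2
# 100 rows of input data, MAX_ITERATIONS = 101: rounds up to two epochs.
print(iterations_to_epochs(101, 100))  # 2
```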

Arguments

An INT64 value of 1 or greater. Typically, 100 steps take about an hour to complete. The default value is 300.

LEARNING_RATE_MULTIPLIER

Syntax

LEARNING_RATE_MULTIPLIER = learning_rate_multiplier

Description

A multiplier to apply to the recommended learning rate when performing supervised tuning. You can only use this option with a remote model that targets a Vertex AI gemini-1.5-pro-002 or gemini-1.5-flash-002 model. If you specify this option, you must also specify the AS SELECT clause.

Arguments

A positive FLOAT64 value. The default value is 1.0.

DATA_SPLIT_METHOD

Syntax

DATA_SPLIT_METHOD = { 'AUTO_SPLIT' | 'RANDOM' | 'CUSTOM' | 'SEQ' | 'NO_SPLIT' }

Description

The method used to split input data into training and evaluation sets when performing supervised tuning. You can only use this option when performing supervised tuning with a supported model. If you specify this option, you must also specify the AS SELECT clause.

Training data is used to train the model. Evaluation data is used to avoid overfitting by using early stopping.

The percentage sizes of the data sets produced by the various arguments for this option are approximate. Larger input data sets come closer to the percentages described than smaller input data sets do.

You can see the model's data split information in the following ways:

Arguments

This option accepts the following values:

DATA_SPLIT_EVAL_FRACTION

Syntax

DATA_SPLIT_EVAL_FRACTION = data_split_eval_fraction

Description

The fraction of the data to use as evaluation data when performing supervised tuning. Use when you specify RANDOM or SEQ as the value for the DATA_SPLIT_METHOD option. You can only use this option when performing supervised tuning with a supported model. If you specify this option, you must also specify the AS SELECT clause.

Arguments

A FLOAT64 value in the range [0, 1.0]. The default is 0.2. The service maintains the accuracy of the input value to two decimal places.

DATA_SPLIT_COL

Syntax

DATA_SPLIT_COL = 'data_split_col'

Description

The name of the column to use to sort input data into the training or evaluation set when performing supervised tuning. Use when you are specifying CUSTOM or SEQ as the value for DATA_SPLIT_METHOD. You can only use this option when performing supervised tuning with a supported model. If you specify this option, you must also specify the AS SELECT clause.

If you specify SEQ as the value for DATA_SPLIT_METHOD, then the data is first sorted smallest to largest based on the specified column. The last n rows are used as evaluation data, where n is the number of rows that corresponds to the DATA_SPLIT_EVAL_FRACTION value. The remaining rows are used as training data.

If you specify CUSTOM as the value for DATA_SPLIT_METHOD, then you must provide the name of a column of type BOOL. Rows with a value of TRUE or NULL are used as evaluation data, and rows with a value of FALSE are used as training data.

The column you specify for DATA_SPLIT_COL can't be used as a feature or label, and the column is excluded from features automatically.
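The SEQ and CUSTOM behaviors described above can be sketched in Python; this only mimics the documented split semantics and is not BigQuery ML code:

```python
def seq_split(rows, split_col, eval_fraction):
    """Mimic the documented SEQ split: sort ascending on split_col and
    take the last eval_fraction of rows as the evaluation set."""
    ordered = sorted(rows, key=lambda r: r[split_col])
    n_eval = round(len(ordered) * eval_fraction)
    cut = len(ordered) - n_eval
    return ordered[:cut], ordered[cut:]

def custom_split(rows, split_col):
    """Mimic the documented CUSTOM split: rows with TRUE or NULL go to
    the evaluation set, and rows with FALSE go to the training set."""
    train = [r for r in rows if r[split_col] is False]
    evaluation = [r for r in rows if r[split_col] in (True, None)]
    return train, evaluation

rows = [{"ts": i, "is_eval": i % 5 == 0} for i in range(10)]
train, evaluation = seq_split(rows, "ts", 0.2)
print(len(train), len(evaluation))  # 8 2
```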

Arguments

A STRING value.

EVALUATION_TASK

Syntax

EVALUATION_TASK = 'evaluation_task'

Description

When performing supervised tuning, the type of task that you want to tune the model to perform. You can only use this option when performing supervised tuning with a supported model. If you specify this option, you must also specify the AS SELECT clause.

Arguments

A STRING value. The valid options are the following:

The default value is UNSPECIFIED.

AS SELECT

Syntax

AS SELECT prompt_column, label_column FROM
  `project_id.dataset.table_name`

Description

Provides the training data to use when performing supervised tuning. You can only use this clause when performing supervised tuning with a supported model.

Arguments

Supervised tuning

If you create a remote model that references any of the following models, you can optionally configure supervised tuning at the same time:

To configure supervised tuning, specify the AS SELECT clause, and optionally some of the other CREATE MODEL arguments that affect supervised tuning. Supervised tuning lets you train the model on your own data to make it better suited for your use case. However, not all models have their performance improved by tuning. To learn more about whether tuning would make sense for your use case, see Use cases for using supervised fine-tuning.

After you create a tuned model, use the EVALUATE function to evaluate whether the tuned model performs well for your use case. To learn more, try the Use tuning and evaluation to improve LLM performance tutorial.
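As a hedged sketch, evaluating a tuned model against a held-out table might look like the following; the table name is a placeholder, and you should check the ML.EVALUATE reference for the exact signature and supported task types:

```sql
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.tuned_model`,
  TABLE `mydataset.emotion_classification_eval`,
  STRUCT('text_generation' AS task_type));
```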

Costs

When using supervised tuning with remote models over Vertex AI LLMs, costs are calculated based on the following:

Locations

For information about supported locations, see Locations for remote models.

Examples

The following examples create BigQuery ML remote models.

Create a model without tuning

The following example creates a BigQuery ML remote model over a Vertex AI model:

CREATE OR REPLACE MODEL `mydataset.flash_model`
  REMOTE WITH CONNECTION `myproject.us.test_connection`
  OPTIONS(ENDPOINT = 'gemini-2.0-flash-001');
Create a tuned model

The following example creates a BigQuery ML remote model over a tuned version of a Vertex AI model:

CREATE OR REPLACE MODEL `mydataset.tuned_model`
  REMOTE WITH CONNECTION `myproject.us.test_connection`
  OPTIONS (
    endpoint = 'gemini-2.0-flash-001',
    max_iterations = 500,
    prompt_col = 'prompt',
    input_label_cols = ['label'])
AS
SELECT
  CONCAT(
    'Please do sentiment analysis on the following text and only output a number from 0 to 5 where 0 means sadness, 1 means joy, 2 means love, 3 means anger, 4 means fear, and 5 means surprise. Text: ',
    text_column) AS prompt,
  sentiment_column AS label
FROM `mydataset.emotion_classification_train`;
Create a partner model that uses the default connection

The following example creates a BigQuery ML remote model over a Mistral AI model:

CREATE OR REPLACE MODEL `mydataset.mistral_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS(ENDPOINT = 'mistral-large-2411');
What's next
