Source: http://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-embedding

The ML.GENERATE_EMBEDDING function | BigQuery


This document describes the ML.GENERATE_EMBEDDING function, which lets you create embeddings that describe an entity—for example, a piece of text or an image.

You can create embeddings for the following types of data:

- Text
- Visual content, such as images and videos
- Structured data, by using a PCA, autoencoder, or matrix factorization model

Embeddings

Embeddings are high-dimensional numerical vectors that represent a given entity. Machine learning (ML) models use embeddings to encode semantics about entities to make it easier to reason about and compare them. If two entities are semantically similar, then their respective embeddings are located near each other in the embedding vector space.
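You can measure how close two embeddings are directly in GoogleSQL with the ML.DISTANCE function. The following is a minimal sketch using hand-written toy vectors; real embeddings would come from ML.GENERATE_EMBEDDING:

```sql
-- Toy vectors stand in for real embeddings; a smaller cosine distance
-- means the two entities are more semantically similar.
SELECT
  ML.DISTANCE([1.0, 0.5, 0.2], [0.9, 0.6, 0.1], 'COSINE') AS near_pair,
  ML.DISTANCE([1.0, 0.5, 0.2], [-1.0, 0.3, 0.8], 'COSINE') AS far_pair;
```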

Embeddings help you perform the following tasks:

- Semantic search
- Recommendation
- Classification
- Clustering
- Outlier detection

Function processing

Depending on the task, the ML.GENERATE_EMBEDDING function works in one of the following ways:

- For text and visual content, it uses a BigQuery ML remote model that represents a Vertex AI embedding model.
- For dimensionality reduction and feature representation, it uses a BigQuery ML PCA, autoencoder, or matrix factorization model.

Syntax

ML.GENERATE_EMBEDDING syntax differs depending on the BigQuery ML model you choose. If you use a remote model, it also differs depending on the Vertex AI model that your remote model targets. Choose the option appropriate for your use case.

multimodalembedding
# Syntax for standard tables
ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) },
  STRUCT(
    [FLATTEN_JSON_OUTPUT AS flatten_json_output]
    [, OUTPUT_DIMENSIONALITY AS output_dimensionality])
)
# Syntax for object tables
ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) },
  STRUCT(
    [FLATTEN_JSON_OUTPUT AS flatten_json_output]
    [, START_SECOND AS start_second]
    [, END_SECOND AS end_second]
    [, INTERVAL_SECONDS AS interval_seconds]
    [, OUTPUT_DIMENSIONALITY AS output_dimensionality])
)
Arguments

ML.GENERATE_EMBEDDING takes the following arguments:

Details

The model and input table must be in the same region.

text-embedding
ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) },
  STRUCT(
    [FLATTEN_JSON_OUTPUT AS flatten_json_output]
    [, TASK_TYPE AS task_type]
    [, OUTPUT_DIMENSIONALITY AS output_dimensionality])
)
Arguments

ML.GENERATE_EMBEDDING takes the following arguments:

Details

The model and input table must be in the same region.

text-multilingual-embedding
ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) },
  STRUCT(
    [FLATTEN_JSON_OUTPUT AS flatten_json_output]
    [, TASK_TYPE AS task_type]
    [, OUTPUT_DIMENSIONALITY AS output_dimensionality])
)
Arguments

ML.GENERATE_EMBEDDING takes the following arguments:

Details

The model and input table must be in the same region.

PCA
ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) }
)
Arguments

ML.GENERATE_EMBEDDING takes the following arguments:

Details

The model and input table must be in the same region.

Autoencoder
ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) },
  STRUCT([TRIAL_ID AS trial_id])
)
Arguments

ML.GENERATE_EMBEDDING takes the following arguments:

Details

The model and input table must be in the same region.

Matrix factorization
ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  STRUCT([TRIAL_ID AS trial_id])
)
Arguments

ML.GENERATE_EMBEDDING takes the following arguments:

Output

multimodalembedding

ML.GENERATE_EMBEDDING returns the input table and the following columns:

text-embedding

ML.GENERATE_EMBEDDING returns the input table and the following columns:

text-multilingual-embedding

ML.GENERATE_EMBEDDING returns the input table and the following columns:

PCA

ML.GENERATE_EMBEDDING returns the input table and the following column:

Autoencoder

ML.GENERATE_EMBEDDING returns the input table and the following column:

Matrix factorization

ML.GENERATE_EMBEDDING returns the following columns:

Supported visual content

You can use the ML.GENERATE_EMBEDDING function to generate embeddings for videos and images that meet the requirements described in API limits.

There is no limit on the length of the video files that you can use with this function. However, the function processes only the first two minutes of a video; for longer videos, ML.GENERATE_EMBEDDING returns embeddings for the first two minutes only.

Known issues

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.

To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
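As an alternative to those packaged solutions, the retry loop can be written directly in BigQuery scripting: keep the successful rows and re-run the function over the failures until none remain. The sketch below is a minimal illustration, not the official script; the `mydataset.items` and `mydataset.embeddings` tables are assumptions, and it relies on the status column being an empty string for successful rows.

```sql
-- First pass: generate embeddings for all rows.
-- Table names and the empty-string status convention are assumptions.
CREATE OR REPLACE TABLE `mydataset.embeddings` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.text_embedding`,
  TABLE `mydataset.items`,
  STRUCT(TRUE AS flatten_json_output));

-- Re-run the function over failed rows until every row succeeds.
WHILE EXISTS (
  SELECT 1 FROM `mydataset.embeddings`
  WHERE ml_generate_embedding_status != ''
) DO
  CREATE OR REPLACE TABLE `mydataset.embeddings` AS
  -- Keep rows that already succeeded.
  SELECT * FROM `mydataset.embeddings`
  WHERE ml_generate_embedding_status = ''
  UNION ALL
  -- Retry only the rows that failed, stripping the old output columns.
  SELECT * FROM ML.GENERATE_EMBEDDING(
    MODEL `mydataset.text_embedding`,
    (SELECT * EXCEPT (ml_generate_embedding_result,
                      ml_generate_embedding_statistics,
                      ml_generate_embedding_status)
     FROM `mydataset.embeddings`
     WHERE ml_generate_embedding_status != ''),
    STRUCT(TRUE AS flatten_json_output));
END WHILE;
```

Because each iteration only re-sends the failed rows, the loop converges as quota becomes available instead of re-billing the whole table.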

Examples

multimodalembedding

This example shows how to generate embeddings from visual content by using a remote model that references a multimodalembedding model.

Create the remote model:

CREATE OR REPLACE MODEL `mydataset.multimodalembedding`
  REMOTE WITH CONNECTION `us.test_connection`
  OPTIONS(ENDPOINT = 'multimodalembedding@001');

Use an ObjectRefRuntime value

Generate embeddings from visual content in an ObjectRef column in a standard table:

SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.multimodalembedding`,
  (
    SELECT OBJ.GET_ACCESS_URL(art_image, 'r') AS content
    FROM `mydataset.art`
  )
);

Use an object table

Generate embeddings from visual content in an object table:

SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.multimodalembedding`,
  TABLE `mydataset.my_object_table`);
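For video files in an object table, the start_second, end_second, and interval_seconds options from the syntax section control which segment of each video is embedded and how it is split. A hedged sketch, assuming `mydataset.my_video_table` is an object table over video files:

```sql
-- Embed the first minute of each video in 10-second intervals.
-- The object table name is an assumption for illustration.
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.multimodalembedding`,
  TABLE `mydataset.my_video_table`,
  STRUCT(
    TRUE AS flatten_json_output,
    0 AS start_second,
    60 AS end_second,
    10 AS interval_seconds)
);
```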
text-embedding

This example shows how to generate an embedding of a single piece of sample text by using a remote model that references a text-embedding model.

Create the remote model:

CREATE OR REPLACE MODEL `mydataset.text_embedding`
  REMOTE WITH CONNECTION `us.test_connection`
  OPTIONS(ENDPOINT = 'text-embedding-005');

Generate the embedding:

SELECT *
FROM
  ML.GENERATE_EMBEDDING(
    MODEL `mydataset.text_embedding`,
    (SELECT "Example text to embed" AS content),
    STRUCT(TRUE AS flatten_json_output)
);
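With flatten_json_output set to TRUE, the embedding is returned as an ARRAY<FLOAT64> in the ml_generate_embedding_result column, which you can then feed into similarity queries. The following is a sketch of a semantic-search follow-up; the `mydataset.document_embeddings` table and its `content` column are assumptions for illustration:

```sql
-- Rank stored documents by cosine distance to a query embedding.
-- The document_embeddings table is assumed to hold prior
-- ML.GENERATE_EMBEDDING output for each document.
WITH query_embedding AS (
  SELECT ml_generate_embedding_result AS qvec
  FROM ML.GENERATE_EMBEDDING(
    MODEL `mydataset.text_embedding`,
    (SELECT "Example text to embed" AS content),
    STRUCT(TRUE AS flatten_json_output))
)
SELECT
  d.content,
  ML.DISTANCE(d.ml_generate_embedding_result, q.qvec, 'COSINE') AS distance
FROM `mydataset.document_embeddings` AS d
CROSS JOIN query_embedding AS q
ORDER BY distance
LIMIT 5;
```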
text-multilingual-embedding

This example shows how to generate embeddings from a table and specify a task type by using a remote model that references a text-multilingual-embedding model.

Create the remote model:

CREATE OR REPLACE MODEL `mydataset.text_multi`
  REMOTE WITH CONNECTION `us.test_connection`
  OPTIONS(ENDPOINT = 'text-multilingual-embedding-002');

Generate the embeddings:

SELECT *
FROM
  ML.GENERATE_EMBEDDING(
    MODEL `mydataset.text_multi`,
    TABLE `mydataset.customer_feedback`,
    STRUCT(TRUE AS flatten_json_output, 'SEMANTIC_SIMILARITY' AS task_type)
);
PCA

This example shows how to generate embeddings that represent the principal components of a PCA model.

Create the PCA model:

CREATE OR REPLACE MODEL `mydataset.pca_nyc_trees`
  OPTIONS (
    MODEL_TYPE = 'PCA',
    PCA_EXPLAINED_VARIANCE_RATIO = 0.9)
AS (
  SELECT
    tree_id,
    block_id,
    tree_dbh,
    stump_diam,
    curb_loc,
    status,
    health,
    spc_latin
  FROM
    `bigquery-public-data.new_york_trees.tree_census_2015`
);

Generate embeddings that represent principal components:

SELECT *
FROM
  ML.GENERATE_EMBEDDING(
    MODEL `mydataset.pca_nyc_trees`,
    (
      SELECT
        tree_id,
        block_id,
        tree_dbh,
        stump_diam,
        curb_loc,
        status,
        health,
        spc_latin
      FROM
        `bigquery-public-data.new_york_trees.tree_census_2015`
    ));
Autoencoder

This example shows how to generate embeddings that represent the latent space dimensions of an autoencoder model.

Create the autoencoder model:

CREATE OR REPLACE MODEL `mydataset.my_autoencoder_model`
  OPTIONS (
    model_type = 'autoencoder',
    activation_fn = 'relu',
    batch_size = 8,
    dropout = 0.2,
    hidden_units = [32, 16, 4, 16, 32],
    learn_rate = 0.001,
    l1_reg_activation = 0.0001,
    max_iterations = 10,
    optimizer = 'adam')
AS
SELECT * EXCEPT (
    Time,
    Class)
FROM
  `bigquery-public-data.ml_datasets.ulb_fraud_detection`;

Generate embeddings that represent latent space dimensions:

SELECT
  *
FROM
  ML.GENERATE_EMBEDDING(
    MODEL `mydataset.my_autoencoder_model`,
    TABLE `bigquery-public-data.ml_datasets.ulb_fraud_detection`);
Matrix factorization

This example shows how to generate embeddings that represent the underlying weights that the matrix factorization model uses during prediction.

Create the matrix factorization model:

CREATE OR REPLACE MODEL
  `mydataset.my_mf_model`
OPTIONS (
  model_type='matrix_factorization',
  user_col='user_id',
  item_col='item_id',
  l2_reg=9.83,
  num_factors=34)
AS SELECT
  user_id,
  item_id,
  AVG(rating) AS rating
FROM
  movielens.movielens_1m
GROUP BY user_id, item_id;

Generate embeddings that represent model weights and intercepts:

SELECT
  *
FROM
  ML.GENERATE_EMBEDDING(MODEL `mydataset.my_mf_model`);
Locations

The ML.GENERATE_EMBEDDING function must run in the same region or multi-region as the model that the function references. For more information on supported regions for embedding models, see Google model endpoint locations. Embedding models are also available in the US multi-region.

Quotas

Quotas apply when you use the ML.GENERATE_EMBEDDING function with remote models. For more information, see Vertex AI and Cloud AI service functions quotas and limits.

For the multimodalembedding model, the default requests per minute (RPM) is 600 for non-EU regions and 120 for EU regions. However, you can request a quota increase to raise your throughput.

To increase quota, first request more quota for the Vertex AI multimodalembedding model by using the process described in Manage your quota using the console. When the model quota has been increased, send an email to bqml-feedback@google.com and request a quota increase for the ML.GENERATE_EMBEDDING function. Include information about the adjusted multimodalembedding quota.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-14 UTC.


