Showing content from https://developers.google.com/bigquery/docs/generate-multimodal-embeddings below:

Generate and search multimodal embeddings | BigQuery

Generate and search multimodal embeddings

This tutorial shows how to generate multimodal embeddings for images and text using BigQuery and Vertex AI, and then use these embeddings to perform a text-to-image semantic search.

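This tutorial covers the following tasks: creating a BigQuery object table over image data in Cloud Storage, exploring the image data in a Colab Enterprise notebook, creating a remote model over a hosted Vertex AI multimodal embedding model, generating image embeddings with the ML.GENERATE_EMBEDDING function, correcting any embedding generation errors, optionally creating a vector index, generating a text embedding for a search string, performing a text-to-image semantic search with the VECTOR_SEARCH function, and visualizing the results in the notebook.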

This tutorial uses the public domain art images from The Metropolitan Museum of Art that are available in the public Cloud Storage gcs-public-data--met bucket.

Required roles

To run this tutorial, you need the following Identity and Access Management (IAM) roles:

These predefined roles contain the permissions required to perform the tasks in this document. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

You might also be able to get these permissions with custom roles or other predefined roles.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.

For more information about Vertex AI pricing, see the Vertex AI pricing page.

Before you begin
  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  2. Verify that billing is enabled for your Google Cloud project.

  3. Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.

    Enable the APIs

Create a dataset

Create a BigQuery dataset to store your ML model.

Console
  1. In the Google Cloud console, go to the BigQuery page.

    Go to the BigQuery page

  2. In the Explorer pane, click your project name.

  3. Click View actions > Create dataset.

  4. On the Create dataset page, enter bqml_tutorial for the dataset ID, set the data location to US, and then create the dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

    bq --location=US mk -d \
     --description "BigQuery ML tutorial dataset." \
     bqml_tutorial

    Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

  2. Confirm that the dataset was created:

    bq ls

API

Call the datasets.insert method with a defined dataset resource.

{
  "datasetReference": {
     "datasetId": "bqml_tutorial"
  }
}

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
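
As a minimal sketch of the same step in Python, the following code creates the bqml_tutorial dataset. It assumes that the google-cloud-bigquery client library is installed and that Application Default Credentials are configured; it uses that client library rather than the BigQuery DataFrames API itself. The function names build_dataset_id and create_tutorial_dataset are hypothetical helpers introduced for illustration.

```python
def build_dataset_id(project_id: str, dataset_name: str = "bqml_tutorial") -> str:
    """Return a fully qualified dataset ID such as 'my-project.bqml_tutorial'."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    if "." in dataset_name:
        raise ValueError("dataset_name must not contain a project prefix")
    return f"{project_id}.{dataset_name}"


def create_tutorial_dataset(project_id: str, location: str = "US"):
    """Create the tutorial dataset (hypothetical helper, not from the tutorial)."""
    # Imported lazily so that build_dataset_id works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(build_dataset_id(project_id))
    dataset.location = location
    dataset.description = "BigQuery ML tutorial dataset."
    # exists_ok=True makes the call safe to repeat if the dataset already exists.
    return client.create_dataset(dataset, exists_ok=True)
```

Calling create_tutorial_dataset with your project ID should leave you with the same dataset that the bq command above creates.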

Create the object table

Create an object table over the art images in the public Cloud Storage gcs-public-data--met bucket. The object table makes it possible to analyze the images without moving them from Cloud Storage.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE EXTERNAL TABLE `bqml_tutorial.met_images`
    WITH CONNECTION DEFAULT
    OPTIONS
      ( object_metadata = 'SIMPLE',
        uris = ['gs://gcs-public-data--met/*']
      );

Explore the image data

Create a Colab Enterprise notebook in BigQuery to explore the image data.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. Create a notebook by using the BigQuery editor.

  3. Connect the notebook to the default runtime.

  4. Set up the notebook:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Set up credentials
      
      from google.colab import auth
      auth.authenticate_user()
      print('Authenticated')
      
      PROJECT_ID='PROJECT_ID'
      from google.cloud import bigquery
      client = bigquery.Client(PROJECT_ID)
      

      Replace PROJECT_ID with the name of the project that you are using for this tutorial.

    3. Run the code cell.

  5. Enable table display:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Enable data table display
      %load_ext google.colab.data_table
      
    3. Run the code cell.

  6. Create a function to display the images:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Util function to display images
      import io
      from PIL import Image
      import matplotlib.pyplot as plt
      import tensorflow as tf

      def printImages(results):
          """Show each result row as an image next to the value in its second column."""
          image_results_list = list(results)
          amt_of_images = len(image_results_list)

          # squeeze=False keeps axes two-dimensional even when there is only one row.
          fig, axes = plt.subplots(
              nrows=amt_of_images, ncols=2, figsize=(20, 20), squeeze=False)
          fig.tight_layout()
          fig.subplots_adjust(hspace=0.5)
          for i in range(amt_of_images):
              gcs_uri = image_results_list[i][0]
              text = image_results_list[i][1]
              # Read the image bytes from Cloud Storage and close the file afterwards.
              with tf.io.gfile.GFile(gcs_uri, 'rb') as f:
                  img = Image.open(io.BytesIO(f.read()))
              axes[i, 0].axis('off')
              axes[i, 0].imshow(img)
              axes[i, 1].axis('off')
              axes[i, 1].text(0, 0, str(text), fontsize=10)
          plt.show()
      
    3. Run the code cell.

  7. Display the images:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Display Met images
      
      inspect_obj_table_query = """
      SELECT uri, content_type
      FROM bqml_tutorial.met_images
      WHERE content_type = 'image/jpeg'
      ORDER BY uri
      LIMIT 10;
      """
      printImages(client.query(inspect_obj_table_query))
      
    3. Run the code cell.

      The cell displays the first 10 JPEG images, ordered by URI, each next to its content type.

  8. Save the notebook as met-image-analysis.

Create the remote model

Create a remote model that represents a hosted Vertex AI multimodal embedding model:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE MODEL `bqml_tutorial.multimodal_embedding_model`
      REMOTE WITH CONNECTION DEFAULT
      OPTIONS (ENDPOINT = 'multimodalembedding@001');

    The query takes several seconds to complete, after which the multimodal_embedding_model model appears in the bqml_tutorial dataset in the Explorer pane. Because the query uses a CREATE MODEL statement to create a model, there are no query results.

Generate image embeddings

Generate embeddings from the images in the object table by using the ML.GENERATE_EMBEDDING function, and then write them to a table for use in a later step. Embedding generation is an expensive operation, so the query uses a subquery with a LIMIT clause to generate embeddings for only 10,000 images instead of the full dataset of 601,294 images. This also keeps the number of images under the 25,000 limit for the ML.GENERATE_EMBEDDING function. This query takes approximately 40 minutes to run.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE TABLE `bqml_tutorial.met_image_embeddings`
    AS
    SELECT *
    FROM
      ML.GENERATE_EMBEDDING(
        MODEL `bqml_tutorial.multimodal_embedding_model`,
        (SELECT * FROM `bqml_tutorial.met_images` WHERE content_type = 'image/jpeg' LIMIT 10000))
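
The row limits described in this section can be made explicit in a small helper. The following is a minimal Python sketch, not part of the original tutorial: the function name build_image_embedding_query and the constant MAX_IMAGES_PER_CALL are hypothetical, and the 25,000 value is simply the limit quoted in this section. The helper only builds the SQL text.

```python
# Limit quoted in this tutorial for ML.GENERATE_EMBEDDING (assumed to be current).
MAX_IMAGES_PER_CALL = 25_000


def build_image_embedding_query(limit: int = 10_000) -> str:
    """Return the tutorial's CREATE TABLE statement with a validated LIMIT."""
    if not isinstance(limit, int) or not 0 < limit <= MAX_IMAGES_PER_CALL:
        raise ValueError(
            f"limit must be an integer between 1 and {MAX_IMAGES_PER_CALL}"
        )
    return f"""
CREATE OR REPLACE TABLE `bqml_tutorial.met_image_embeddings` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `bqml_tutorial.multimodal_embedding_model`,
  (SELECT * FROM `bqml_tutorial.met_images`
   WHERE content_type = 'image/jpeg'
   LIMIT {limit}))
"""
```

In the notebook, client.query(build_image_embedding_query(1000)).result() would run a smaller batch. A lower limit shortens the run, at the cost of fewer images being available to search.
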
Correct any embedding generation errors

Check for and correct any embedding generation errors. Embedding generation can fail because of Generative AI on Vertex AI quotas or service unavailability.

The ML.GENERATE_EMBEDDING function returns error details in the ml_generate_embedding_status column. This column is empty if embedding generation was successful, or contains an error message if embedding generation failed.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query to see if there were any embedding generation failures:

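    SELECT ml_generate_embedding_status,
      COUNT(uri) AS num_rows
    FROM bqml_tutorial.met_image_embeddings
    GROUP BY 1;

  3. If rows with errors are returned, drop the rows where embedding generation failed. The following statement matches the RESOURCE_EXHAUSTED message; if the previous query returned other error messages, adjust the filter to match them: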

    DELETE FROM `bqml_tutorial.met_image_embeddings`
    WHERE ml_generate_embedding_status = 'A retryable error occurred: RESOURCE_EXHAUSTED error from remote service/endpoint.';
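
The status check in this section can also be summarized in the notebook. The following Python sketch is an illustration rather than part of the original tutorial: the function name failed_status_counts is hypothetical, and it assumes, as described in this section, that an empty status means success. It accepts (status, num_rows) pairs, which is the shape of the rows that the status query returns.

```python
from typing import Dict, Iterable, Optional, Tuple


def failed_status_counts(
    rows: Iterable[Tuple[Optional[str], int]],
) -> Dict[str, int]:
    """Return {error message: row count} for statuses that report a failure."""
    failures: Dict[str, int] = {}
    for status, num_rows in rows:
        # An empty (or NULL) status means the embedding was generated.
        if status:
            failures[status] = failures.get(status, 0) + num_rows
    return failures
```

To feed it query results, convert each result row explicitly, for example [(row[0], row[1]) for row in client.query(status_query)], where status_query is a hypothetical variable that holds the status query text. An empty dictionary means that no rows need to be deleted.
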
Create a vector index

You can optionally use the CREATE VECTOR INDEX statement to create the met_images_index vector index on the ml_generate_embedding_result column of the met_image_embeddings table. A vector index lets you perform a vector search more quickly, with the trade-off of reducing recall and so returning more approximate results.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE
      VECTOR INDEX `met_images_index`
    ON
      bqml_tutorial.met_image_embeddings(ml_generate_embedding_result)
      OPTIONS (
        index_type = 'IVF',
        distance_type = 'COSINE');

  3. The vector index is created asynchronously. To check if the vector index has been created, query the INFORMATION_SCHEMA.VECTOR_INDEXES view and confirm that the coverage_percentage value is greater than 0, and the last_refresh_time value isn't NULL:

    SELECT table_name, index_name, index_status,
      coverage_percentage, last_refresh_time, disable_reason
    FROM bqml_tutorial.INFORMATION_SCHEMA.VECTOR_INDEXES
    WHERE index_name = 'met_images_index';
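
The readiness condition described in this section can be written as a small predicate. This is a minimal Python sketch under stated assumptions: the function name index_is_ready is hypothetical, and it takes the coverage_percentage and last_refresh_time values from one row of the query above, with SQL NULL represented as None.

```python
from datetime import datetime, timezone
from typing import Optional


def index_is_ready(
    coverage_percentage: Optional[int],
    last_refresh_time: Optional[datetime],
) -> bool:
    """Apply the tutorial's check: coverage above 0 and a non-NULL refresh time."""
    has_coverage = (coverage_percentage or 0) > 0
    has_refreshed = last_refresh_time is not None
    return has_coverage and has_refreshed


# Example values only; real values come from INFORMATION_SCHEMA.VECTOR_INDEXES.
example_ready = index_is_ready(100, datetime(2025, 1, 1, tzinfo=timezone.utc))
example_pending = index_is_ready(0, None)
```

Because the index is built asynchronously, a notebook could call a check like this in a loop with a delay between attempts until it returns True.
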
Generate an embedding for the search text

To search for images that correspond to a text search string, you must first create a text embedding for that string. Use the same remote model that you used to create the image embeddings, and then write the text embedding to a table for use in a later step. The search string is "pictures of white or cream colored dress from victorian era".

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE TABLE `bqml_tutorial.search_embedding`
    AS
    SELECT * FROM ML.GENERATE_EMBEDDING(
      MODEL `bqml_tutorial.multimodal_embedding_model`,
      (
        SELECT 'pictures of white or cream colored dress from victorian era' AS content
      )
    );

Perform a text-to-image semantic search

Use the VECTOR_SEARCH function to perform a semantic search for images that best correspond to the search string represented by the text embedding.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query to perform a semantic search and write the results to a table:

    CREATE OR REPLACE TABLE `bqml_tutorial.vector_search_results` AS
    SELECT base.uri AS gcs_uri, distance
    FROM
      VECTOR_SEARCH(
        TABLE `bqml_tutorial.met_image_embeddings`,
        'ml_generate_embedding_result',
        TABLE `bqml_tutorial.search_embedding`,
        'ml_generate_embedding_result',
        top_k => 3);
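
To make the ranking step concrete, the following Python sketch performs a brute-force nearest-neighbor search over a handful of vectors. It is a conceptual illustration only, not how BigQuery implements VECTOR_SEARCH. It assumes cosine distance, the type chosen for the optional index in this tutorial; the distance that VECTOR_SEARCH actually uses depends on its arguments and on any index. The function names and the example URIs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k_by_cosine(
    base: Dict[str, Sequence[float]],
    query: Sequence[float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (uri, distance) pairs closest to the query vector."""
    scored = [(uri, cosine_distance(vec, query)) for uri, vec in base.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Toy two-dimensional "embeddings"; real embeddings have many more dimensions.
example_base = {
    "gs://example/a.jpg": [1.0, 0.0],
    "gs://example/b.jpg": [0.0, 1.0],
    "gs://example/c.jpg": [1.0, 1.0],
}
example_result = top_k_by_cosine(example_base, [1.0, 0.0], top_k=2)
```

A scan like this compares the query against every stored vector, which is the exact but slow approach; a vector index trades some recall for speed by avoiding the full scan.
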
Visualize the semantic search results

Visualize the semantic search results by using a notebook.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. Open the met-image-analysis notebook that you created earlier.

  3. Visualize the vector search results:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      query = """
        SELECT * FROM `bqml_tutorial.vector_search_results`
        ORDER BY distance;
      """
      
      printImages(client.query(query))
      
    3. Run the code cell.

      The cell displays the three closest images, each next to its distance from the search embedding.

Clean up
    Caution: Deleting a project has the following effects:

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-07 UTC.
