Embed text with pretrained TensorFlow models

This tutorial shows you how to generate NNLM, SWIVEL, and BERT text embeddings in BigQuery by using pretrained TensorFlow models. A text embedding is a dense vector representation of a piece of text such that if two pieces of text are semantically similar, then their respective embeddings are close together in the embedding vector space.
The NNLM, SWIVEL, and BERT models

The NNLM, SWIVEL, and BERT models vary in size, accuracy, scalability, and cost. Use the following table to help you determine which model to use:

| Model  | Model size | Embedding dimension | Use case                                               | Description                                              |
|--------|------------|---------------------|--------------------------------------------------------|----------------------------------------------------------|
| NNLM   | <150MB     | 50                  | Short phrases, news, tweets, reviews                   | Neural Network Language Model                            |
| SWIVEL | <150MB     | 20                  | Short phrases, news, tweets, reviews                   | Submatrix-wise Vector Embedding Learner                  |
| BERT   | ~200MB     | 768                 | Short phrases, news, tweets, reviews, short paragraphs | Bidirectional Encoder Representations from Transformers  |

In this tutorial, the NNLM and SWIVEL models are imported TensorFlow models, and the BERT model is a remote model on Vertex AI.
Required permissions

- To create the dataset, you need the bigquery.datasets.create Identity and Access Management (IAM) permission.
- To create the bucket, you need the storage.buckets.create IAM permission.
- To upload the model to Cloud Storage, you need the storage.objects.create and storage.objects.get IAM permissions.
- To create the connection resource, you need the following IAM permissions:
  - bigquery.connections.create
  - bigquery.connections.get
- To load the model into BigQuery ML, you need the following IAM permissions:
  - bigquery.jobs.create
  - bigquery.models.create
  - bigquery.models.getData
  - bigquery.models.updateData
- To run inference, you need the following IAM permissions:
  - bigquery.tables.getData on the object table
  - bigquery.models.getData on the model
  - bigquery.jobs.create
Costs

In this document, you use the following billable components of Google Cloud:

- BigQuery
- BigQuery ML
- Cloud Storage
- Vertex AI

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

For more information, see the following resources:

- BigQuery pricing
- Cloud Storage pricing
- Vertex AI pricing
Before you begin

1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

   Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

2. Verify that billing is enabled for your Google Cloud project.

3. Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
Create a dataset

To create a dataset named tf_models_tutorial to store the models that you create, select one of the following options:

SQL

Use the CREATE SCHEMA statement:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE SCHEMA `PROJECT_ID.tf_models_tutorial`;
Replace PROJECT_ID with your project ID.
Click Run.
For more information about how to run queries, see Run an interactive query.
bq

In the Google Cloud console, activate Cloud Shell.
To create the dataset, run the bq mk command:
bq mk --dataset --location=us PROJECT_ID:tf_models_tutorial
Replace PROJECT_ID with your project ID.
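If you prefer to script this step, you can create the same dataset with the BigQuery client library for Python. The following is a minimal sketch, assuming the google-cloud-bigquery package is installed and application default credentials are configured for your project:

from google.cloud import bigquery

# Create a client scoped to your project.
client = bigquery.Client(project="PROJECT_ID")

# Create the tf_models_tutorial dataset in the US multi-region,
# matching the bq mk command above.
dataset = bigquery.Dataset(f"{client.project}.tf_models_tutorial")
dataset.location = "US"
client.create_dataset(dataset, exists_ok=True)
print(f"Created dataset {dataset.dataset_id}")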
Generate and upload a model to Cloud Storage

For more detailed instructions on generating text embeddings using pretrained TensorFlow models, see the Colab notebook. Otherwise, select one of the following models:
NNLM

Install the bigquery-ml-utils library using pip:
pip install bigquery-ml-utils
Generate an NNLM model. The following Python code loads an NNLM model from TensorFlow Hub and prepares it for BigQuery:
from bigquery_ml_utils import model_generator
import tensorflow_text
# Establish an instance of TextEmbeddingModelGenerator.
text_embedding_model_generator = model_generator.TextEmbeddingModelGenerator()
# Generate an NNLM model.
text_embedding_model_generator.generate_text_embedding_model('nnlm', OUTPUT_MODEL_PATH)
Replace OUTPUT_MODEL_PATH with a path to a local folder where you can temporarily store the model.
Optional: Print the generated model's signature:
import tensorflow as tf
reload_embedding_model = tf.saved_model.load(OUTPUT_MODEL_PATH)
print(reload_embedding_model.signatures["serving_default"])
To copy the generated model from your local folder to a Cloud Storage bucket, use the Google Cloud CLI:
gcloud storage cp OUTPUT_MODEL_PATH gs://BUCKET_PATH/nnlm_model --recursive
Replace BUCKET_PATH with the name of the Cloud Storage bucket to which you are copying the model.
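As an optional sanity check, you can call the generated model's serving signature locally before relying on it in BigQuery. The following is a minimal sketch, assuming the signature's input tensor is named content (the print statement above shows the exact signature for your model); the same check works for the SWIVEL and BERT models generated below:

import tensorflow as tf
import tensorflow_text  # Registers the text ops the SavedModel needs.

# Reload the SavedModel that bigquery-ml-utils generated.
model = tf.saved_model.load(OUTPUT_MODEL_PATH)
serving_fn = model.signatures["serving_default"]

# Assumption: the serving signature accepts a string tensor named "content",
# matching the column name that ML.PREDICT() expects later in this tutorial.
result = serving_fn(content=tf.constant(["This movie was surprisingly good."]))
print({name: tensor.shape for name, tensor in result.items()})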
SWIVEL

Install the bigquery-ml-utils library using pip:
pip install bigquery-ml-utils
Generate a SWIVEL model. The following Python code loads a SWIVEL model from TensorFlow Hub and prepares it for BigQuery:
from bigquery_ml_utils import model_generator
import tensorflow_text
# Establish an instance of TextEmbeddingModelGenerator.
text_embedding_model_generator = model_generator.TextEmbeddingModelGenerator()
# Generate a SWIVEL model.
text_embedding_model_generator.generate_text_embedding_model('swivel', OUTPUT_MODEL_PATH)
Replace OUTPUT_MODEL_PATH with a path to a local folder where you can temporarily store the model.
Optional: Print the generated model's signature:
import tensorflow as tf
reload_embedding_model = tf.saved_model.load(OUTPUT_MODEL_PATH)
print(reload_embedding_model.signatures["serving_default"])
To copy the generated model from your local folder to a Cloud Storage bucket, use the Google Cloud CLI:
gcloud storage cp OUTPUT_MODEL_PATH gs://BUCKET_PATH/swivel_model --recursive
Replace BUCKET_PATH with the name of the Cloud Storage bucket to which you are copying the model.
BERT

Install the bigquery-ml-utils library using pip:
pip install bigquery-ml-utils
Generate a BERT model. The following Python code loads a BERT model from TensorFlow Hub and prepares it for BigQuery:
from bigquery_ml_utils import model_generator
import tensorflow_text
# Establish an instance of TextEmbeddingModelGenerator.
text_embedding_model_generator = model_generator.TextEmbeddingModelGenerator()
# Generate a BERT model.
text_embedding_model_generator.generate_text_embedding_model('bert', OUTPUT_MODEL_PATH)
Replace OUTPUT_MODEL_PATH with a path to a local folder where you can temporarily store the model.
Optional: Print the generated model's signature:
import tensorflow as tf
reload_embedding_model = tf.saved_model.load(OUTPUT_MODEL_PATH)
print(reload_embedding_model.signatures["serving_default"])
To copy the generated model from your local folder to a Cloud Storage bucket, use the Google Cloud CLI:
gcloud storage cp OUTPUT_MODEL_PATH gs://BUCKET_PATH/bert_model --recursive
Replace BUCKET_PATH with the name of the Cloud Storage bucket to which you are copying the model.
Load the model into BigQuery

Select one of the following models:

NNLM

Use the CREATE MODEL statement:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE OR REPLACE MODEL tf_models_tutorial.nnlm_model
OPTIONS (
model_type = 'TENSORFLOW',
model_path = 'gs://BUCKET_NAME/nnlm_model/*');
Replace BUCKET_NAME with the name of the bucket that you previously created.
Click Run.
For more information about how to run queries, see Run an interactive query.
SWIVEL

Use the CREATE MODEL statement:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE OR REPLACE MODEL tf_models_tutorial.swivel_model
OPTIONS (
model_type = 'TENSORFLOW',
model_path = 'gs://BUCKET_NAME/swivel_model/*');
Replace BUCKET_NAME with the name of the bucket that you previously created.
Click Run.
For more information about how to run queries, see Run an interactive query.
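Either of these CREATE MODEL statements can also be run programmatically. The following is a minimal sketch using the google-cloud-bigquery client, assuming the bucket name from the previous section:

from google.cloud import bigquery

client = bigquery.Client(project="PROJECT_ID")

# Load the NNLM model from Cloud Storage into BigQuery ML; swap in
# swivel_model/* and a different model name for the SWIVEL variant.
query = """
CREATE OR REPLACE MODEL tf_models_tutorial.nnlm_model
OPTIONS (
  model_type = 'TENSORFLOW',
  model_path = 'gs://BUCKET_NAME/nnlm_model/*');
"""
client.query(query).result()  # Waits for the load job to finish.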
BERT

To load the BERT model into BigQuery, import the BERT model to Vertex AI, deploy the model to a Vertex AI endpoint, create a connection, and then create a remote model in BigQuery.
To import the BERT model to Vertex AI, follow these steps:

1. In the Google Cloud console, go to the Vertex AI Model registry page.
2. Click Import, and then do the following:
   - For Name, enter BERT.
   - Click Continue, and then do the following:
     - For the model framework version, select 2.8.
     - For the model artifact location, enter gs://BUCKET_PATH/bert_model.
   - Click Import. After the import is complete, your model appears on the Model registry page.
To deploy the BERT model to a Vertex AI endpoint and connect it to BigQuery, follow these steps:

1. In the Google Cloud console, go to the Vertex AI Model registry page.
2. Click the name of your model.
3. Click Deploy & test.
4. Click Deploy to endpoint.
5. For Endpoint name, enter bert_model_endpoint.
6. Click Continue.
7. Select your compute resources.
8. Click Deploy.
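The import and deploy steps can also be scripted with the Vertex AI SDK for Python. The following is a minimal sketch, assuming the TensorFlow 2.8 CPU prebuilt serving container and an n1-standard-4 machine type; choose a container and compute resources that fit your workload:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="us-central1")

# Upload the SavedModel from Cloud Storage to the Vertex AI Model Registry.
# Assumption: the TensorFlow 2.8 CPU prebuilt container is appropriate here.
model = aiplatform.Model.upload(
    display_name="BERT",
    artifact_uri="gs://BUCKET_PATH/bert_model",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest"
    ),
)

# Deploy the model to an endpoint; this can take several minutes.
endpoint = model.deploy(
    deployed_model_display_name="bert_model_endpoint",
    machine_type="n1-standard-4",
)
print(endpoint.resource_name)  # Contains the ENDPOINT_ID used below.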
Create a BigQuery Cloud resource connection and grant access to the connection's service account.
To create a remote model based on the Vertex AI endpoint, use the CREATE MODEL statement:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE OR REPLACE MODEL tf_models_tutorial.bert_model
  INPUT (content STRING)
  OUTPUT (embedding ARRAY<FLOAT64>)
  REMOTE WITH CONNECTION `PROJECT_ID.CONNECTION_LOCATION.CONNECTION_ID`
  OPTIONS (
    ENDPOINT = "https://ENDPOINT_LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/ENDPOINT_LOCATION/endpoints/ENDPOINT_ID");
Replace the following:

- PROJECT_ID: the project ID
- CONNECTION_LOCATION: the location of your BigQuery connection
- CONNECTION_ID: the ID of your BigQuery connection

  When you view the connection details in the Google Cloud console, this is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.

- ENDPOINT_LOCATION: the location of your Vertex AI endpoint. For example: "us-central1".
- ENDPOINT_ID: the ID of your model endpoint

Click Run.
For more information about how to run queries, see Run an interactive query.
Generate text embeddings

In this section, you use the ML.PREDICT() inference function to generate text embeddings of the review column from the public dataset bigquery-public-data.imdb.reviews. The query limits the table to 500 rows to reduce the amount of data processed.

NNLM
SELECT *
FROM ML.PREDICT(
  MODEL `tf_models_tutorial.nnlm_model`,
  (SELECT review AS content
   FROM `bigquery-public-data.imdb.reviews`
   LIMIT 500));
The result is similar to the following:
+-----------------------+----------------------------------------+
| embedding             | content                                |
+-----------------------+----------------------------------------+
| 0.08599445223808289   | Isabelle Huppert must be one of the... |
| -0.04862852394580841  |                                        |
| -0.017750458791851997 |                                        |
| 0.8658871650695801    |                                        |
| ...                   |                                        |
+-----------------------+----------------------------------------+

SWIVEL
SELECT *
FROM ML.PREDICT(
  MODEL `tf_models_tutorial.swivel_model`,
  (SELECT review AS content
   FROM `bigquery-public-data.imdb.reviews`
   LIMIT 500));
The result is similar to the following:
+----------------------+----------------------------------------+
| embedding            | content                                |
+----------------------+----------------------------------------+
| 2.5952553749084473   | Isabelle Huppert must be one of the... |
| -4.015787601470947   |                                        |
| 3.6275434494018555   |                                        |
| -6.045154333114624   |                                        |
| ...                  |                                        |
+----------------------+----------------------------------------+

BERT
SELECT *
FROM ML.PREDICT(
  MODEL `tf_models_tutorial.bert_model`,
  (SELECT review AS content
   FROM `bigquery-public-data.imdb.reviews`
   LIMIT 500));
The result is similar to the following:
+--------------+---------------------+----------------------------------------+
| embedding    | remote_model_status | content                                |
+--------------+---------------------+----------------------------------------+
| -0.694072425 | null                | Isabelle Huppert must be one of the... |
| 0.439208865  |                     |                                        |
| 0.99988997   |                     |                                        |
| -0.993487895 |                     |                                        |
| ...          |                     |                                        |
+--------------+---------------------+----------------------------------------+
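To work with the embeddings outside of SQL, you can run the same query from Python and, for example, compare two reviews with cosine similarity. The following is a minimal sketch, assuming the NNLM model created earlier and the google-cloud-bigquery, pandas, and numpy packages:

import numpy as np
from google.cloud import bigquery

client = bigquery.Client(project="PROJECT_ID")

# Run ML.PREDICT() and pull the embeddings into a DataFrame.
sql = """
SELECT *
FROM ML.PREDICT(
  MODEL `tf_models_tutorial.nnlm_model`,
  (SELECT review AS content
   FROM `bigquery-public-data.imdb.reviews`
   LIMIT 500));
"""
df = client.query(sql).to_dataframe()

# Cosine similarity: semantically similar reviews should score closer to 1.
def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(df["embedding"][0], df["embedding"][1]))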
Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the project that contains the resources.

Caution: Deleting a project deletes all of its resources, and its custom project ID is lost. If you plan to use project-dependent URLs in the future, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project. If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.