
Showing content from http://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/batch-prediction-genai-embeddings below:

Get batch text embeddings predictions | Generative AI on Vertex AI


This guide shows you how to get batch predictions for text embeddings.

Prepare your inputs: Learn how to format your input data using either JSONL files in Cloud Storage or a BigQuery table.
Request a batch response: Submit a batch prediction job to the model.
Retrieve batch output: Access the results of your completed batch job.

Batch predictions are a good option for large volumes of non-latency-sensitive embeddings requests. Key features of batch predictions include:

Large volume: Process a large number of requests in a single batch job instead of one at a time.
Asynchronous processing: Similar to batch prediction for tabular data in Vertex AI, you specify an output location for your results, and the job populates it asynchronously.

Text embeddings models that support batch predictions

All stable versions of text embedding models support batch predictions. Stable versions are versions that are no longer in preview and are fully supported for production environments. To see the full list of supported embedding models, see Embedding model and versions.

Choose an input source

Before you prepare your inputs, decide whether to use JSONL files in Cloud Storage or a BigQuery table. The following table provides a comparison to help you choose the best option for your use case.

Input source | Description | Use case
JSONL file in Cloud Storage | A text file where each line is a separate JSON object that contains a prompt. | Use this option when your source data is in files or if you prefer a file-based data pipeline.
BigQuery table | A structured table in BigQuery with a column that contains the prompts. | Use this option when your prompts are stored in BigQuery or are part of a larger structured dataset.

Prepare your inputs

The input for batch requests is a list of prompts stored in either a BigQuery table or a JSON Lines (JSONL) file in Cloud Storage. Each batch request can include up to 30,000 prompts.
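Because each request is capped at 30,000 prompts, a larger prompt list has to be split across several jobs. The following is a minimal sketch of that split; the helper name `chunk_prompts` and the constant `MAX_PROMPTS_PER_JOB` are hypothetical and not part of any Google SDK.

```python
from typing import Iterator, List

MAX_PROMPTS_PER_JOB = 30_000  # per-request limit stated in this guide


def chunk_prompts(prompts: List[str], size: int = MAX_PROMPTS_PER_JOB) -> Iterator[List[str]]:
    """Yield consecutive slices of `prompts`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(prompts), size):
        yield prompts[start:start + size]


# Illustrative data: 70,000 generated prompts split into per-job batches.
prompts = [f"prompt {i}" for i in range(70_000)]
print([len(batch) for batch in chunk_prompts(prompts)])  # prints [30000, 30000, 10000]
```

Each yielded slice can then be written to its own input file or table and submitted as a separate job.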

JSONL format

Input example

Each line in the input file must be a valid JSON object with a content field that contains the prompt.

{"content":"Give a short description of a machine learning model:"}
{"content":"Best recipe for banana bread:"}
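One way to produce a file in this shape is to serialize each prompt with a JSON library rather than by string concatenation, so that quotes and line breaks inside a prompt are escaped. This is a sketch under that assumption; the function name `to_jsonl` is hypothetical.

```python
import json
from typing import Iterable


def to_jsonl(prompts: Iterable[str]) -> str:
    """Serialize prompts as JSON Lines: one {"content": ...} object per line."""
    lines = []
    for prompt in prompts:
        # json.dumps escapes quotes and newlines, so each record stays on one line.
        lines.append(json.dumps({"content": prompt}, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)


text = to_jsonl([
    "Give a short description of a machine learning model:",
    "Best recipe for banana bread:",
])
print(text, end="")
```

The resulting text can be saved locally and uploaded to Cloud Storage with whichever tool you already use.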

Output example

The output is written to a JSONL file where each line contains the instance, the corresponding prediction, and a status.

{"instance":{"content":"Give..."},"predictions": [{"embeddings":{"statistics":{"token_count":8,"truncated":false},"values":[0.2,....]}}],"status":""}
{"instance":{"content":"Best..."},"predictions": [{"embeddings":{"statistics":{"token_count":3,"truncated":false},"values":[0.1,....]}}],"status":""}
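Reading these records back could look like the sketch below. It assumes the field layout shown in the output example and treats a non-empty status as a failure, which is an assumption rather than documented behavior. The function name `parse_output_line` and the three-element vector in the sample are made up for illustration.

```python
import json
from typing import Dict, List, Tuple


def parse_output_line(line: str) -> Tuple[str, List[float], Dict]:
    """Return (content, embedding values, statistics) for one output record."""
    record = json.loads(line)
    if record.get("status"):
        # Assumption: a non-empty status signals a problem with this instance.
        raise ValueError(f"instance failed: {record['status']}")
    embedding = record["predictions"][0]["embeddings"]
    return record["instance"]["content"], embedding["values"], embedding["statistics"]


# Illustrative record: real embedding vectors are much longer than three values.
sample = (
    '{"instance":{"content":"Best recipe for banana bread:"},'
    '"predictions":[{"embeddings":{"statistics":{"token_count":3,"truncated":false},'
    '"values":[0.1,0.2,0.3]}}],"status":""}'
)
content, values, stats = parse_output_line(sample)
print(content, len(values), stats["token_count"], stats["truncated"])
```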

BigQuery example

This section shows examples of how to format BigQuery input and output.

BigQuery input example

This example shows a single column BigQuery table.

content
"Give a short description of a machine learning model:"
"Best recipe for banana bread:"

BigQuery output example

content | predictions | status
"Give a short description of a machine learning model:" | '[{"embeddings":{"statistics":{"token_count":8,"truncated":false},"Values":[0.1,....]}}]' |
"Best recipe for banana bread:" | '[{"embeddings":{"statistics":{"token_count":3,"truncated":false},"Values":[0.2,....]}}]' |

Request a batch response

Depending on the number of input items you submit, a batch generation task can take some time to complete.

REST

To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.

Before using any of the request data, make the following replacements:

PROJECT_ID: The ID of your Google Cloud project.
BP_JOB_NAME: The job name.
INPUT_URI: The input source URI. This is either a BigQuery table URI or a JSONL file URI in Cloud Storage.
OUTPUT_URI: The output target URI. This is either a BigQuery table URI or a Cloud Storage location.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs

Request JSON body:

{
  "name": "BP_JOB_NAME",
  "displayName": "BP_JOB_NAME",
  "model": "publishers/google/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  }
}
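The body above covers BigQuery only. The sketch below builds the same body programmatically and switches on the URI scheme. The BigQuery fields mirror the example; the Cloud Storage fields (`jsonl`, `gcsSource.uris`, `gcsDestination.outputUriPrefix`) are not shown in this guide and are an assumption based on the Vertex AI REST reference, so verify them before relying on this. The function name `build_request_body` and the output table name are hypothetical.

```python
import json
from typing import Dict


def build_request_body(job_name: str, model: str, input_uri: str, output_uri: str) -> Dict:
    """Build a batchPredictionJobs request body from bq:// or gs:// URIs."""
    if input_uri.startswith("bq://"):
        input_config = {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        }
    elif input_uri.startswith("gs://"):
        # Assumption: Cloud Storage input field names, not shown in this guide.
        input_config = {
            "instancesFormat": "jsonl",
            "gcsSource": {"uris": [input_uri]},
        }
    else:
        raise ValueError(f"unsupported input URI: {input_uri}")

    if output_uri.startswith("bq://"):
        output_config = {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        }
    elif output_uri.startswith("gs://"):
        # Assumption: Cloud Storage output field names, not shown in this guide.
        output_config = {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri},
        }
    else:
        raise ValueError(f"unsupported output URI: {output_uri}")

    return {
        "name": job_name,
        "displayName": job_name,
        "model": model,
        "inputConfig": input_config,
        "outputConfig": output_config,
    }


body = build_request_body(
    "BP_JOB_NAME",
    "publishers/google/models/textembedding-gecko",
    "bq://project_name.dataset_name.text_input",
    "bq://project_name.llm_dataset.embedding_out",  # hypothetical table name
)
print(json.dumps(body, indent=2))
```

Writing the printed JSON to request.json gives a file in the shape that the commands in this section expect.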

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789",
  "displayName": "BP_sample_publisher_BQ_20230712_134650",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://project_name.dataset_name.text_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://project_name.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "labels": {
    "owner": "sample_owner",
    "product": "llm"
  },
  "modelVersionId": "1",
  "modelMonitoringStatus": {}
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID"
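The polling loop itself can be sketched independently of the HTTP client. In the example below, BATCH_JOB_ID is assumed to be the final segment of the `name` field in the create response, and `get_job` stands in for whatever function issues the GET request above and returns the parsed JSON. Only JOB_STATE_PENDING and JOB_STATE_SUCCEEDED appear in this guide; the other state names are assumptions based on the Vertex AI JobState enum. The helper names `batch_job_id` and `wait_for_job` are hypothetical.

```python
import time
from typing import Callable, Dict

# Assumption: terminal states other than JOB_STATE_SUCCEEDED are not listed in this guide.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
}


def batch_job_id(resource_name: str) -> str:
    """Return the final path segment of a job resource name."""
    return resource_name.rsplit("/", 1)[-1]


def wait_for_job(
    get_job: Callable[[], Dict],
    interval_s: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call `get_job` until the job reports a terminal state, then return it."""
    for _ in range(max_polls):
        job = get_job()
        if job.get("state") in TERMINAL_STATES:
            return job
        sleep(interval_s)
    raise TimeoutError("job did not reach a terminal state")


# Demonstration with a fake fetcher that replays a fixed sequence of states.
name = "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789"
print(batch_job_id(name))

states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final = wait_for_job(lambda: {"state": next(states)}, interval_s=0)
print(final["state"])
```

Passing the fetcher as a parameter keeps the loop testable without network access. A caller should still check whether the returned state is JOB_STATE_SUCCEEDED before reading the output.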

Note: You can run only one batch response job at a time. Custom service accounts, live progress, CMEK, and VPC-SC reports aren't supported at this time.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

Retrieve batch output

When a batch prediction task is complete, the output is stored in the Cloud Storage bucket or BigQuery table that you specified in your request.

What's next

Learn how to get text embeddings.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-15 UTC.
