This page shows you how to make a batch inference request to your trained AutoML classification or regression model using the Google Cloud console or the Vertex AI API.
A batch inference request is an asynchronous request (as opposed to online inference, which is a synchronous request). Request batch inferences directly from the model resource without deploying the model to an endpoint. For tabular data, use batch inferences when you don't require an immediate response and want to process accumulated data by using a single request.
To make a batch inference request, specify an input source and an output format where Vertex AI stores inference results.
Note: To minimize processing time when you use the Google Cloud console to create batch inferences, select input and output locations that are in the same region as your model. If you use the API to create batch inferences, send requests to a service endpoint (such as https://us-central1-aiplatform.googleapis.com) that is in the same region as, or geographically close to, your input and output locations.

Before you begin
Before you make a batch inference request, you must first train a model.
Input data

The input data for batch inference requests is the data that your model uses to make inferences. For classification or regression models, you can provide input data in one of the following formats:

- BigQuery tables
- CSV objects in Cloud Storage
We recommend that you use the same format for your input data as you used for training the model. For example, if you trained your model using data in BigQuery, it is best to use a BigQuery table as the input for your batch inference. Because Vertex AI treats all CSV input fields as strings, mixing training and input data formats may cause errors.
Your data source must contain tabular data that includes all of the columns, in any order, that were used to train the model. You can include columns that were not in the training data, or that were in the training data but excluded from use for training. These extra columns are included in the output but don't affect the inference results.
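For illustration only, a hypothetical CSV input for a model trained on the columns age, income, and city might look like the following; the extra id column is passed through to the output without affecting the inference results:

"id","age","income","city"
"1","34","72000","Seattle"
"2","51","48000","Denver"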
Input data requirements

BigQuery table

If you choose a BigQuery table as the input, you must ensure the following: if the table is in a different project than your model, you must grant the BigQuery Data Editor role to the Vertex AI service account in that project.

CSV object in Cloud Storage

If you choose a CSV object in Cloud Storage as the input, you must ensure the following: if the Cloud Storage bucket is in a different project than your model, you must grant the Storage Object Creator role to the Vertex AI service account in that project. (For an example grant command, see the sketch after this section.)

The output format of your batch inference request doesn't need to be the same as the input format. For example, if you used a BigQuery table as the input, you can output the results to a CSV object in Cloud Storage.
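A minimal sketch of granting such a role across projects with gcloud, assuming the standard Vertex AI service agent address; the project ID and project number are hypothetical placeholders:

gcloud projects add-iam-policy-binding other-project-id \
  --member="serviceAccount:service-123456789012@gcp-sa-aiplatform.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
# For the Cloud Storage case, use --role="roles/storage.objectCreator" instead.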
Make a batch inference request to your model

To make batch inference requests, you can use the Google Cloud console or the Vertex AI API. The input data source can be CSV objects stored in a Cloud Storage bucket or BigQuery tables. Depending on the amount of data that you submit as input, a batch inference task can take some time to complete.
Google Cloud console

Use the Google Cloud console to request a batch inference. If your output destination is BigQuery, specify the destination dataset in the following format:

bq://projectid.datasetid

To enable feature attributions, select Enable feature attributions for this model. This option is available if your output destination is BigQuery or JSONL on Cloud Storage. Feature attributions are not supported for CSV on Cloud Storage.
API: BigQuery

REST

You use the batchPredictionJobs.create method to request a batch inference.
Before using any of the request data, make the following replacements:
- LOCATION_ID: the region where the model is stored and the batch inference job is executed, for example us-central1.
- PROJECT_ID: your project ID.
- BATCH_JOB_NAME: a display name for the batch job.
- MODEL_ID: the ID of the model to use for making inferences.
- INPUT_URI: a reference to the BigQuery source table, in the form bq://bqprojectId.bqDatasetId.bqTableId.
- OUTPUT_URI: a reference to the BigQuery destination dataset where the inferences are written, in the form bq://bqprojectId.bqDatasetId.
- MACHINE_TYPE: the machine resources to use for this batch inference job.
- STARTING_REPLICA_COUNT: the starting number of machine replicas for this batch inference job.
- MAX_REPLICA_COUNT: the maximum number of machine replicas for this batch inference job.
- GENERATE_EXPLANATION: the default value is false. Set to true to enable feature attributions.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs
Request JSON body:
{ "displayName": "BATCH_JOB_NAME", "model": "MODEL_ID", "inputConfig": { "instancesFormat": "bigquery", "bigquerySource": { "inputUri": "INPUT_URI" } }, "outputConfig": { "predictionsFormat": "bigquery", "bigqueryDestination": { "outputUri": "OUTPUT_URI" } }, "dedicatedResources": { "machineSpec": { "machineType": "MACHINE_TYPE", "acceleratorCount": "0" }, "startingReplicaCount": STARTING_REPLICA_COUNT, "maxReplicaCount": MAX_REPLICA_COUNT }, "generateExplanation": GENERATE_EXPLANATION }
To send your request, choose one of these options:
curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs/67890", "displayName": "batch_job_1 202005291958", "model": "projects/12345/locations/us-central1/models/5678", "state": "JOB_STATE_PENDING", "inputConfig": { "instancesFormat": "bigquery", "bigquerySource": { "inputUri": "INPUT_URI" } }, "outputConfig": { "predictionsFormat": "bigquery", "bigqueryDestination": { "outputUri": bq://12345 } }, "dedicatedResources": { "machineSpec": { "machineType": "n1-standard-32", "acceleratorCount": "0" }, "startingReplicaCount": 2, "maxReplicaCount": 6 }, "manualBatchTuningParameters": { "batchSize": 4 }, "generateExplanation": false, "outputInfo": { "bigqueryOutputDataset": "bq://12345.reg_model_2020_10_02_06_04 } "state": "JOB_STATE_PENDING", "createTime": "2020-09-30T02:58:44.341643Z", "updateTime": "2020-09-30T02:58:44.341643Z", }Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
In the following sample, replace INSTANCES_FORMAT and PREDICTIONS_FORMAT with `bigquery`. To learn how to replace the other placeholders, see the `REST & CMD LINE` tab of this section.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

In the following sample, set the `instances_format` and `predictions_format` parameters to `"bigquery"`. To learn how to set the other parameters, see the `REST & CMD LINE` tab of this section.
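A minimal sketch using the Vertex AI SDK for Python, assuming a BigQuery input table and output dataset; the project, region, model ID, and URIs are hypothetical placeholders:

from google.cloud import aiplatform

# Initialize the SDK with a (hypothetical) project and region.
aiplatform.init(project="my-project", location="us-central1")

# Load the trained AutoML tabular model by its (hypothetical) model ID.
model = aiplatform.Model("1234567890")

# Create the batch inference job. With the default sync=True, this call
# blocks until the job completes.
batch_prediction_job = model.batch_predict(
    job_display_name="my_batch_job",
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://my-project.my_dataset.my_table",
    bigquery_destination_prefix="bq://my-project.my_output_dataset",
    machine_type="n1-standard-4",
    starting_replica_count=1,
    max_replica_count=2,
    generate_explanation=True,  # set True only if you want feature attributions
)

print(batch_prediction_job.display_name)
print(batch_prediction_job.state)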
API: Cloud Storage

REST

You use the batchPredictionJobs.create method to request a batch inference.
Before using any of the request data, make the following replacements:

- LOCATION_ID: the region where the model is stored and the batch inference job is executed, for example us-central1.
- PROJECT_ID: your project ID.
- BATCH_JOB_NAME: a display name for the batch job.
- MODEL_ID: the ID of the model to use for making inferences.
- URI: the path to the Cloud Storage input object, in the form gs://bucketName/pathToFileName.
- OUTPUT_URI_PREFIX: the path to the Cloud Storage directory where the output is written, in the form gs://bucketName/pathToOutputDirectory.
- MACHINE_TYPE: the machine resources to use for this batch inference job.
- STARTING_REPLICA_COUNT: the starting number of machine replicas for this batch inference job.
- MAX_REPLICA_COUNT: the maximum number of machine replicas for this batch inference job.
- GENERATE_EXPLANATION: the default value is false. Set to true to enable feature attributions. This option is available only if your output destination is JSONL. Feature attributions are not supported for CSV on Cloud Storage.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs
Request JSON body:
{ "displayName": "BATCH_JOB_NAME", "model": "MODEL_ID", "inputConfig": { "instancesFormat": "csv", "gcsSource": { "uris": [ URI1,... ] }, }, "outputConfig": { "predictionsFormat": "csv", "gcsDestination": { "outputUriPrefix": "OUTPUT_URI_PREFIX" } }, "dedicatedResources": { "machineSpec": { "machineType": "MACHINE_TYPE", "acceleratorCount": "0" }, "startingReplicaCount": STARTING_REPLICA_COUNT, "maxReplicaCount": MAX_REPLICA_COUNT }, "generateExplanation": GENERATE_EXPLANATION }
To send your request, choose one of these options:
curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT__ID/locations/LOCATION_ID/batchPredictionJobs/67890", "displayName": "batch_job_1 202005291958", "model": "projects/12345/locations/us-central1/models/5678", "state": "JOB_STATE_PENDING", "inputConfig": { "instancesFormat": "csv", "gcsSource": { "uris": [ "gs://bp_bucket/reg_mode_test" ] } }, "outputConfig": { "predictionsFormat": "csv", "gcsDestination": { "outputUriPrefix": "OUTPUT_URI_PREFIX" } }, "dedicatedResources": { "machineSpec": { "machineType": "n1-standard-32", "acceleratorCount": "0" }, "startingReplicaCount": 2, "maxReplicaCount": 6 }, "manualBatchTuningParameters": { "batchSize": 4 } "outputInfo": { "gcsOutputDataset": "OUTPUT_URI_PREFIX/prediction-batch_job_1 202005291958-2020-09-30T02:58:44.341643Z" } "state": "JOB_STATE_PENDING", "createTime": "2020-09-30T02:58:44.341643Z", "updateTime": "2020-09-30T02:58:44.341643Z", }Retrieve batch inference results
Retrieve batch inference results

Vertex AI sends the output of batch inferences to the destination that you specified, which can be either BigQuery or Cloud Storage.
BigQuery

Output dataset

If you are using BigQuery, the output of batch inference is stored in an output dataset. If you provided a dataset to Vertex AI, the dataset name (BQ_DATASET_NAME) is the name you provided earlier. If you didn't provide an output dataset, Vertex AI created one for you; its name (BQ_DATASET_NAME) follows the format:

prediction_MODEL_NAME_TIMESTAMP
The output dataset contains one or more of the following output tables:

- Predictions table: contains a row for every row in your input data where an inference was requested (that is, where TARGET_COLUMN_NAME = null).
- Errors table: contains a row for each non-critical error encountered during batch inference. Each non-critical error corresponds with a row in the input data that Vertex AI could not return an inference for.
Predictions table
The name of the table (BQ_PREDICTIONS_TABLE_NAME) is formed by appending the timestamp of when the batch inference job started to `predictions_`:

predictions_TIMESTAMP
To retrieve inferences, go to the BigQuery page.
The format of the query depends on your model type.

Classification:

SELECT predicted_TARGET_COLUMN_NAME.classes AS classes,
  predicted_TARGET_COLUMN_NAME.scores AS scores
FROM BQ_DATASET_NAME.BQ_PREDICTIONS_TABLE_NAME
`classes` is the list of potential classes, and `scores` are the corresponding confidence scores.
Regression:

SELECT predicted_TARGET_COLUMN_NAME.value,
  predicted_TARGET_COLUMN_NAME.lower_bound,
  predicted_TARGET_COLUMN_NAME.upper_bound
FROM BQ_DATASET_NAME.BQ_PREDICTIONS_TABLE_NAME
If you enabled feature attributions, you can find them in the predictions table as well. To access attributions for a feature BQ_FEATURE_NAME, run the following query:
SELECT explanation.attributions[OFFSET(0)].featureAttributions.BQ_FEATURE_NAME
FROM BQ_DATASET_NAME.BQ_PREDICTIONS_TABLE_NAME
Errors table
The name of the table (BQ_ERRORS_TABLE_NAME) is formed by appending the timestamp of when the batch inference job started to `errors_`:

errors_TIMESTAMP
To retrieve the errors table:
In the console, go to the BigQuery page.
SELECT * FROM BQ_DATASET_NAME.BQ_ERRORS_TABLE_NAME
Cloud Storage

If you specified Cloud Storage as your output destination, the results of your batch inference request are returned as CSV objects in a new folder in the bucket that you specified. The name of the folder is the name of your model, prepended with "prediction-" and appended with the timestamp of when the batch inference job started. You can find the Cloud Storage folder name in the Batch predictions tab for your model.
The Cloud Storage folder contains two kinds of objects:

The inference objects are named `predictions_1.csv`, `predictions_2.csv`, and so on. They contain a header row with the column names and a row for every inference returned. In the inference objects, Vertex AI returns your inference data and creates one or more new columns for the inference results, based on your model type:

- Classification: For each potential value of your target column, a column named TARGET_COLUMN_NAME_VALUE_score is added to the results. This column contains the score, or confidence estimate, for that value.
- Regression: The predicted value for that row is returned in a column named predicted_TARGET_COLUMN_NAME. The prediction interval is not returned for CSV output.

The error objects are named `errors_1.csv`, `errors_2.csv`, and so on. They contain a header row and a row for every row in your input data for which Vertex AI could not return an inference (for example, if a non-nullable feature was null).
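For illustration only, for a hypothetical classification model whose target column will_default has the values yes and no, the first lines of a predictions_1.csv object might look like:

"id","age","income","will_default_yes_score","will_default_no_score"
"1","34","72000","0.087","0.913"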
Note: If the results are large, the output is split into multiple objects.
Feature attributions are not available for batch inference results returned in Cloud Storage.
Interpret inference results

Classification

Classification models return a confidence score.
The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.
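As one hedged illustration of applying such a threshold, the sketch below queries a BigQuery predictions table and keeps only rows whose top class meets a chosen score; the dataset, table, and target column names are hypothetical placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical dataset/table names; see the predictions table naming above.
query = """
SELECT predicted_will_default.classes AS classes,
  predicted_will_default.scores AS scores
FROM my_output_dataset.predictions_2020_09_30T02_58_44_341Z
"""

THRESHOLD = 0.8  # chosen acceptance threshold

for row in client.query(query).result():
    # Pair each class with its score and keep the highest-scoring class.
    top_class, top_score = max(zip(row.classes, row.scores), key=lambda cs: cs[1])
    if top_score >= THRESHOLD:
        print(top_class, top_score)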
Regression

Regression models return an inference value. For BigQuery destinations, they also return an inference interval. The inference interval provides a range of values that the model is 95% confident contains the actual result.
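For example (with hypothetical numbers), a predicted value of 3.6 with lower_bound 3.2 and upper_bound 4.0 means the model is 95% confident that the actual value lies between 3.2 and 4.0.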
Interpret explanation results

If your batch inference results are stored in BigQuery and you chose to enable feature attributions, you can find the feature attribution values in the predictions table.
To calculate local feature importance, the baseline inference score is calculated first. Baseline values are computed from the training data, using the median value for numeric features and the mode for categorical features. The inference generated from the baseline values is the baseline inference score. Baseline values are calculated once for a model and do not change.
For a specific inference, the local feature importance for each feature tells you how much that feature added to or subtracted from the result as compared with the baseline inference score. The sum of all of the feature importance values equals the difference between the baseline inference score and the inference result.
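For example (with hypothetical numbers), if the baseline inference score is 0.6 and the model returns 0.8 for a given row, the local feature importance values for that row sum to 0.2.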
For classification models, the score is always between 0.0 and 1.0, inclusive. Therefore, local feature importance values for classification models are always between -1.0 and 1.0 (inclusive).
For examples of feature attribution queries and to learn more, see
Feature Attributions for Classification and Regression.
What's next
Last updated 2025-08-07 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[],[]]
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4