Showing content from http://cloud.google.com/batch/docs/automate-task-retries below:

Automate task retries | Batch


This page describes how to automatically retry tasks after all or some failures.

A Batch job fails when at least one of its tasks fails, which can happen for various reasons. By default, each task in a job only runs once; if a task fails, it is not retried. However, some issues that cause a task to fail can be easily resolved just by retrying the task. In these cases, configuring the job to automatically retry tasks can substantially help reduce troubleshooting friction and the overall run time of your jobs.

Automatic retries are well-suited to loosely coupled (independent) tasks and can help with a variety of issues. For example, automatic task retries can resolve time-sensitive issues such as the preemption of Spot VMs, VM maintenance events, and transient network errors.

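You can configure automatic task retries for each task when you create a job. Specifically, for each task, you can use one of the following configuration options, each covered in its own section on this page: retry the task after all failures by setting the maxRetryCount field, or retry the task after only some failures by combining the maxRetryCount field with lifecycle policies (lifecyclePolicies[] field).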

Before you begin
  1. If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
  2. To get the permissions that you need to create a job, ask your administrator to grant you the following IAM roles:

    For more information about granting roles, see Manage access to projects, folders, and organizations.

    You might also be able to get the required permissions through custom roles or other predefined roles.

Retry tasks for all failures

You can define the maximum number of automatic retries (maxRetryCount field) for a job's failed tasks using the gcloud CLI or Batch API.

gcloud
  1. Create a JSON file that specifies the job's configuration details and the maxRetryCount field.

    For example, to create a basic script job that specifies the maximum retries for failed tasks, create a JSON file with the following contents:

    {
      "taskGroups": [
        {
          "taskSpec": {
            "runnables": [
              {
                "script": {
                  "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
                }
              }
            ],
            "maxRetryCount": MAX_RETRY_COUNT
          },
          "taskCount": 3
        }
      ],
      "logsPolicy": {
        "destination": "CLOUD_LOGGING"
      }
    }
    

    Replace MAX_RETRY_COUNT with the maximum number of retries for each task. For a job to retry failed tasks, this value must be an integer between 1 and 10. If the maxRetryCount field is not specified, it defaults to 0, which means that no tasks are retried.

  2. To create and run the job, use the gcloud batch jobs submit command:

    gcloud batch jobs submit JOB_NAME \
      --location LOCATION \
      --config JSON_CONFIGURATION_FILE
    

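    Replace JOB_NAME with the name of the job, LOCATION with the location of the job, and JSON_CONFIGURATION_FILE with the path to the JSON file that contains the job's configuration details.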
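As a rough illustration of the configuration step above, the following Python sketch builds the same JSON structure and rejects a retry count outside the 1 to 10 range stated on this page. The function names build_retry_job_config and write_job_config are hypothetical helpers, not part of the gcloud CLI or any Batch client library.

```python
import json


def build_retry_job_config(max_retry_count: int, task_count: int = 3) -> dict:
    """Return a job configuration that retries every failed task.

    Hypothetical helper: it only mirrors the JSON shown on this page.
    """
    # Retries are only enabled for an integer value from 1 to 10.
    is_int = isinstance(max_retry_count, int) and not isinstance(max_retry_count, bool)
    if not is_int or not 1 <= max_retry_count <= 10:
        raise ValueError("max_retry_count must be an integer between 1 and 10")
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {"script": {"text": "echo Hello world from task ${BATCH_TASK_INDEX}"}}
                    ],
                    "maxRetryCount": max_retry_count,
                },
                "taskCount": task_count,
            }
        ],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


def write_job_config(path: str, max_retry_count: int) -> None:
    """Write the configuration to a file that can be passed to --config."""
    with open(path, "w", encoding="utf-8") as config_file:
        json.dump(build_retry_job_config(max_retry_count), config_file, indent=2)
```

Validating the value locally is only a convenience; the service remains the authority on which values it accepts.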

API

Make a POST request to the jobs.create method that specifies the maxRetryCount field.

For example, to create a basic script job that specifies the maximum retries for failed tasks, make the following request:

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
            }
          }
        ],
        "maxRetryCount": MAX_RETRY_COUNT
      },
      "taskCount": 3
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}

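Replace PROJECT_ID with the project ID of your project, LOCATION with the location of the job, JOB_NAME with the name of the job, and MAX_RETRY_COUNT with the maximum number of retries for each task, which must be an integer between 1 and 10 for the job to retry failed tasks.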
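The request above could be assembled in Python roughly as follows. This is a sketch that only constructs the request object and does not send it; build_create_job_request is a hypothetical helper, and it assumes you already hold an OAuth 2.0 access token (for example, one printed by gcloud auth print-access-token) to pass as a bearer token.

```python
import json
import urllib.request


def build_create_job_request(project_id, location, job_name, job_config, access_token):
    """Build (but do not send) the POST request for the jobs.create method."""
    url = (
        f"https://batch.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/jobs?job_id={job_name}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(job_config).encode("utf-8"),
        headers={
            # Assumption: the caller supplies a valid access token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; that step is left out here because it needs real credentials and creates a real job.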

Retry tasks for some failures

You can define how you want a job to handle different task failures by using lifecycle policies (lifecyclePolicies[] field).

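A lifecycle policy consists of an action (action field), an action condition (actionCondition field), and one or more exit codes (exitCodes[] field). The specified action is taken whenever the action condition occurs, that is, whenever a task fails with one of the specified exit codes. You can specify one of the following actions: RETRY_TASK, which retries tasks that fail with the specified exit codes, or FAIL_TASK, which fails those tasks without retrying them.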

Notably, any tasks that fail with unspecified exit codes take the opposite action, so some exit codes are retried and the others are failed. Consequently, for the lifecycle policy to work as expected, you also need to define the maximum number of automatic retries (maxRetryCount field) to allow the job to automatically retry failed tasks at least once.

Each exit code represents a specific failure that is defined either by your application or Batch. The exit codes from 50001 to 59999 are reserved and defined by Batch. For more information about the reserved exit codes, see Troubleshooting.
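The decision rule described above can be sketched in Python as follows. This is an illustrative model of the documented behavior, not Batch's implementation; should_retry and is_reserved_exit_code are hypothetical names, and retries_used is an assumed counter of retries already consumed by the task.

```python
def is_reserved_exit_code(exit_code: int) -> bool:
    """Exit codes from 50001 to 59999 are reserved and defined by Batch."""
    return 50001 <= exit_code <= 59999


def should_retry(action, policy_exit_codes, task_exit_code, retries_used, max_retry_count):
    """Model whether a failed task is retried under one lifecycle policy."""
    # No retries remain, or retries were never enabled (maxRetryCount of 0).
    if retries_used >= max_retry_count:
        return False
    listed = task_exit_code in policy_exit_codes
    if action == "RETRY_TASK":
        # Listed exit codes are retried; unlisted ones take the opposite action.
        return listed
    if action == "FAIL_TASK":
        # Listed exit codes are failed; unlisted ones take the opposite action.
        return not listed
    raise ValueError(f"unknown action: {action}")
```

For example, with a RETRY_TASK policy for exit code 50001 and a maxRetryCount of 3, a task that fails with 50001 is retried until its retries run out, while a task that fails with exit code 1 is not retried at all.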

You can configure a job to retry or fail tasks after specific failures by using the gcloud CLI or Batch API.

gcloud
  1. Create a JSON file that specifies the job's configuration details, the maxRetryCount field, and the lifecyclePolicies[] subfields.

    To create a basic script job that retries failed tasks only for some exit codes, create a JSON file with the following contents:

    {
      "taskGroups": [
        {
          "taskSpec": {
            "runnables": [
              {
                "script": {
                  "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
                }
              }
            ],
            "maxRetryCount": MAX_RETRY_COUNT,
            "lifecyclePolicies": [
              {
                "action": "ACTION",
                "actionCondition": {
                  "exitCodes": [EXIT_CODES]
                }
              }
            ]
          }
        }
      ],
      "logsPolicy": {
        "destination": "CLOUD_LOGGING"
      }
    }
    

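    Replace MAX_RETRY_COUNT with the maximum number of retries for each task, an integer between 1 and 10; ACTION with the action to take, either RETRY_TASK or FAIL_TASK; and EXIT_CODES with a comma-separated list of one or more exit codes that trigger the action.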

    For example, the following job only retries tasks that fail due to the preemption of Spot VMs.

    {
      "taskGroups": [
        {
          "taskSpec": {
            "runnables": [
              {
                "script": {
                  "text": "sleep 30"
                }
              }
            ],
            "maxRetryCount": 3,
            "lifecyclePolicies": [
              {
                "action": "RETRY_TASK",
                "actionCondition": {
                  "exitCodes": [50001]
                }
              }
            ]
          }
        }
      ],
      "allocationPolicy": {
        "instances": [
          {
            "policy": {
              "machineType": "e2-standard-4",
              "provisioningModel": "SPOT"
            }
          }
        ]
      }
    }
    
  2. To create and run the job, use the gcloud batch jobs submit command:

    gcloud batch jobs submit JOB_NAME \
      --location LOCATION \
      --config JSON_CONFIGURATION_FILE
    

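    Replace JOB_NAME with the name of the job, LOCATION with the location of the job, and JSON_CONFIGURATION_FILE with the path to the JSON file that contains the job's configuration details.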

API

Make a POST request to the jobs.create method that specifies the maxRetryCount field and lifecyclePolicies[] subfields.

To create a basic script job that retries failed tasks only for some exit codes, make the following request:

POST https://batch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/jobs?job_id=JOB_NAME

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world from task ${BATCH_TASK_INDEX}"
            }
          }
        ],
        "maxRetryCount": MAX_RETRY_COUNT,
        "lifecyclePolicies": [
          {
            "action": "ACTION",
            "actionCondition": {
              "exitCodes": [EXIT_CODES]
            }
          }
        ]
      }
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}

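Replace PROJECT_ID with the project ID of your project, LOCATION with the location of the job, and JOB_NAME with the name of the job. Replace MAX_RETRY_COUNT with the maximum number of retries for each task, an integer between 1 and 10; ACTION with the action to take, either RETRY_TASK or FAIL_TASK; and EXIT_CODES with a comma-separated list of one or more exit codes that trigger the action.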

For example, the following job only retries tasks that fail due to the preemption of Spot VMs.

POST https://batch.googleapis.com/v1/projects/example-project/locations/us-central1/jobs?job_id=example-job

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "sleep 30"
            }
          }
        ],
        "maxRetryCount": 3,
        "lifecyclePolicies": [
          {
            "action": "RETRY_TASK",
            "actionCondition": {
              "exitCodes": [50001]
            }
          }
        ]
      }
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "policy": {
          "machineType": "e2-standard-4",
          "provisioningModel": "SPOT"
        }
      }
    ]
  }
}

Modify task behavior based on the number of retries

Optionally, after you've enabled automatic retries for a task as described in the previous sections on this page, you can update your runnables to use the BATCH_TASK_RETRY_ATTEMPT predefined environment variable. The BATCH_TASK_RETRY_ATTEMPT variable describes the number of times that this task has already been attempted. Use the BATCH_TASK_RETRY_ATTEMPT variable in your runnables if you want a task to behave differently based on the number of retries. For example, when a task is being retried, you might want to confirm which commands were already successfully executed in the previous attempt. For more information, see Predefined environment variables.
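As one possible way to act on this variable, the Python sketch below reads BATCH_TASK_RETRY_ATTEMPT and skips steps that a previous attempt already finished. It assumes the runnable invokes a Python script and that the task records its completed steps somewhere durable; the names retry_attempt and plan_steps, and the idea of a completed-steps list, are hypothetical and not provided by Batch.

```python
import os


def retry_attempt(environ=None) -> int:
    """Return how many times this task has already been attempted."""
    env = os.environ if environ is None else environ
    # Assumption: fall back to 0 when the variable is unset, e.g. in local runs.
    return int(env.get("BATCH_TASK_RETRY_ATTEMPT", "0"))


def plan_steps(all_steps, completed_steps, attempt):
    """Choose which steps to run for the current attempt."""
    if attempt == 0:
        # First attempt: nothing has run yet, so run every step.
        return list(all_steps)
    # Retry: keep only the steps that no earlier attempt completed.
    return [step for step in all_steps if step not in completed_steps]
```

A script might call plan_steps(steps, load_completed(), retry_attempt()), where load_completed is whatever mechanism your task uses to persist progress between attempts.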

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-07 UTC.
