Troubleshoot CrashLoopBackOff events | Google Kubernetes Engine (GKE)

This page helps you resolve issues with Pods experiencing CrashLoopBackOff events in Google Kubernetes Engine (GKE).

This page is for Application developers who want to identify app-level issues, such as configuration errors or code-related bugs, that cause their containers to crash. It is also for Platform admins and operators who need to identify platform-level root causes for container restarts, such as resource exhaustion, node disruptions, or misconfigured liveness probes. For more information about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Understand a CrashLoopBackOff event

When your Pod is stuck in a CrashLoopBackOff state, a container within it is repeatedly starting and then crashing or exiting. Each crash (the CrashLoop) prompts Kubernetes to restart the container according to the Pod's restartPolicy. With each failed restart, the BackOff delay before the next attempt increases exponentially (for example, 10s, 20s, 40s), up to a maximum of five minutes.

Although this event indicates a problem within your container, it's also a valuable diagnostic signal. A CrashLoopBackOff event confirms that many foundational steps of Pod creation, such as assignment to a node and pulling the container image, have already completed. This knowledge lets you focus your investigation on the container's app or configuration, rather than the cluster infrastructure.

The CrashLoopBackOff state occurs because of how Kubernetes, specifically the kubelet, handles container termination based on the Pod's restart policy. The cycle typically follows this pattern:

  1. The container starts.
  2. The container exits.
  3. The kubelet observes the stopped container and restarts it according to the Pod's restartPolicy.
  4. This cycle repeats, with the container restarted after an increasing exponential back-off delay.

The Pod's restartPolicy is the key to this behavior. The default policy, Always, is the most common cause of this loop because it restarts a container if it exits for any reason, even after a successful exit. The OnFailure policy is less likely to cause a loop because it only restarts on non-zero exit codes, and the Never policy avoids a restart entirely.
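
The restartPolicy field is set at the Pod level. The following manifest is a minimal, hypothetical sketch (the Pod name, container name, and image are placeholders) that shows where the field lives and what each value does:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app   # placeholder name
    spec:
      # Always (the default) restarts the container even after a successful exit.
      # OnFailure restarts only on non-zero exit codes; Never disables restarts.
      restartPolicy: OnFailure
      containers:
      - name: app
        image: us-docker.pkg.dev/example-project/example-repo/example-app:1.0   # placeholder image

Note that Pods managed by a Deployment, StatefulSet, or DaemonSet must keep the default Always policy; OnFailure and Never are available only for Pods that you create directly or through a Job.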

Identify symptoms of a CrashLoopBackOff event

A Pod with the CrashLoopBackOff status is the primary indication of a CrashLoopBackOff event.

However, you might experience some less obvious symptoms of a CrashLoopBackOff event:

If a system workload (for example, a logging or metrics agent) has the CrashLoopBackOff status, you might also notice the following symptoms:

If you observe any of these less obvious symptoms, your next step should be to confirm if a CrashLoopBackOff event occurred.

Confirm a CrashLoopBackOff event

To confirm and investigate a CrashLoopBackOff event, gather evidence from Kubernetes events and the container's app logs. These two sources provide different, but complementary, views of the problem: the events show what the kubelet observed, such as restarts and back-off delays, while the app logs show why the process inside the container exited.

To view this information, select one of the following options:

Console

To view Kubernetes events and app logs, do the following:

  1. In the Google Cloud console, go to the Workloads page.

    Go to Workloads

  2. Select the workload that you want to investigate. The Overview or Details tab displays more information about the status of the workload.

  3. From the Managed Pods section, click the name of the problematic Pod.

  4. On the Pod details page, investigate the Pod's recent events (for example, Back-off restarting failed container warnings) and the container logs, where you can find the error message that caused the crash.

kubectl

To view Kubernetes events and app logs, do the following:

  1. View the status of the Pods running in your cluster. By default, the following command lists Pods in the current namespace; add the --all-namespaces flag to list Pods in every namespace:

    kubectl get pods
    

    The output is similar to the following:

    NAME       READY  STATUS             RESTARTS  AGE
    POD_NAME   0/1    CrashLoopBackOff   23        8d
    

    In the output, review the following columns:

    - STATUS: a value of CrashLoopBackOff identifies the affected Pod.
    - RESTARTS: a high or steadily increasing count shows that the container keeps crashing.

  2. After you identify a failing Pod, describe it to see cluster-level events that are related to the Pod's state:

    kubectl describe pod POD_NAME -n NAMESPACE_NAME
    

    Replace the following:

    - POD_NAME: the name of the problematic Pod.
    - NAMESPACE_NAME: the name of the namespace that contains the Pod.

    The output is similar to the following:

    Containers:
    container-name:
    ...
      State:          Waiting
        Reason:       CrashLoopBackOff
      Last State:     Terminated
        Reason:       StartError
        Message:      failed to create containerd task: failed to create shim task: context deadline exceeded: unknown
        Exit Code:    128
        Started:      Thu, 01 Jan 1970 00:00:00 +0000
        Finished:     Fri, 27 Jun 2025 16:20:03 +0000
      Ready:          False
      Restart Count:  3459
    ...
    Conditions:
    Type                        Status
    PodReadyToStartContainers   True
    Initialized                 True
    Ready                       False
    ContainersReady             False
    PodScheduled                True
    ...
    Events:
    Type     Reason   Age                     From     Message
    ----     ------   ----                    ----     -------
    Warning  Failed   12m (x216 over 25h)     kubelet  Error: context deadline exceeded
    Warning  Failed   8m34s (x216 over 25h)   kubelet  Error: context deadline exceeded
    Warning  BackOff  4m24s (x3134 over 25h)  kubelet  Back-off restarting failed container container-name in pod failing-pod(11111111-2222-3333-4444-555555555555)
    

    In the output, review the following fields for signs of a CrashLoopBackOff event:

    - State: a Waiting state with the reason CrashLoopBackOff.
    - Last State: the reason, exit code, and message from the previous termination.
    - Restart Count: a high value that keeps growing.
    - Events: Warning events from the kubelet with the reason BackOff.

  3. To learn more about why the Pod failed, view its app logs:

    kubectl logs POD_NAME -n NAMESPACE_NAME --previous
    

    The --previous flag retrieves logs from the prior, terminated container, which is where you can find the specific stack trace or error message that reveals the cause of the crash. The current container might be too new to have recorded any logs.

    In the output, look for app-specific errors that would cause the process to exit. If you use a custom-made app, the developers who wrote it are best equipped to interpret these error messages. If you use a prebuilt app, these apps often provide their own debugging instructions.

Use the Crashlooping Pods interactive playbook

After you confirm a CrashLoopBackOff event, begin troubleshooting with the interactive playbook:

  1. In the Google Cloud console, go to the GKE Interactive Playbook - Crashlooping Pods page.

    Go to Crashlooping Pods

  2. In the Cluster list, select the cluster that you want to troubleshoot. If you can't find your cluster, enter the name of the cluster in the Filter field.

  3. In the Namespace list, select the namespace that you want to troubleshoot. If you can't find your namespace, enter the namespace in the Filter field.

  4. Work through each section to help you answer the following questions:

    1. Identify App Errors: which containers are restarting?
    2. Investigate Out Of Memory Issues: is there a misconfiguration or an error related to the app?
    3. Investigate Node Disruptions: are disruptions to node resources causing container restarts?
    4. Investigate Liveness Probe Failures: are liveness probes stopping your containers?
    5. Correlate Change Events: what happened around the time the containers started crashing?
  5. Optional: To get notifications about future CrashLoopBackOff events, in the Future Mitigation Tips section, select Create an Alert.

If your problem persists after using the playbook, read the rest of the guide for more information about resolving CrashLoopBackOff events.

Resolve a CrashLoopBackOff event

The following sections help you resolve the most common causes of CrashLoopBackOff events:

Resolve resource exhaustion

A CrashLoopBackOff event is often caused by an Out of Memory (OOM) issue. You can confirm that this is the cause if the kubectl describe output shows the following:

Last State: Terminated
  Reason: OOMKilled

For information about how to diagnose and resolve OOM events, see Troubleshoot OOM events.
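
That guide covers diagnosis in detail; for orientation, the memory behavior is governed by the resources stanza of the container. The following fragment is a hypothetical sketch with placeholder values; size them from the app's observed usage rather than copying them:

    # Fragment of a container definition; all values are placeholders.
    containers:
    - name: app
      image: us-docker.pkg.dev/example-project/example-repo/example-app:1.0   # placeholder image
      resources:
        requests:
          memory: "256Mi"   # what the scheduler reserves for the container
        limits:
          memory: "512Mi"   # the container is OOMKilled if it exceeds this value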

Resolve liveness probe failures

A liveness probe is a periodic health check performed by the kubelet. If the probe fails a specified number of times (the default number is three), the kubelet restarts the container, potentially causing a CrashLoopBackOff event if the probe failures continue.
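
As a minimal, hypothetical sketch (the endpoint, port, and values are placeholders), this is how a probe and its failure threshold are declared on a container:

    # Fragment of a container definition; endpoint and values are placeholders.
    livenessProbe:
      httpGet:
        path: /healthz      # hypothetical health endpoint served by the app
        port: 8080
      periodSeconds: 10     # how often the kubelet runs the probe
      failureThreshold: 3   # the default; after three consecutive failures, the kubelet restarts the container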

Confirm if a liveness probe is the cause

To confirm if liveness probe failures are triggering the CrashLoopBackOff event, query your kubelet logs. These logs often contain explicit messages indicating probe failures and subsequent restarts.

  1. In the Google Cloud console, go to the Logs Explorer page.

    Go to Logs Explorer

  2. In the query pane, filter for any liveness-probe-related restarts by entering the following query:

    resource.type="k8s_node"
    log_id("kubelet")
    jsonPayload.MESSAGE:"failed liveness probe, will be restarted"
    resource.labels.cluster_name="CLUSTER_NAME"
    

    Replace CLUSTER_NAME with the name of your cluster.

  3. Review the output. If a liveness probe failure is the cause of your CrashLoopBackOff events, the query returns log messages similar to the following:

    Container probe failed liveness probe, will be restarted
    

After you confirm that liveness probes are the cause of the CrashLoopBackOff event, proceed to troubleshoot common causes:

Review liveness probe configuration

Misconfigured probes are a frequent cause of CrashLoopBackOff events. Check the endpoint, timing, and threshold settings in the manifest of your probe, as illustrated in the sketch later in this section.

For more information, see Configure Liveness, Readiness and Startup Probes in the Kubernetes documentation.
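
The following sketch shows the fields that a liveness probe exposes in the manifest; the endpoint, port, and numbers are placeholders, and the comments describe the misconfigurations that commonly cause restart loops:

    # Hypothetical probe definition; adapt every value to your app.
    livenessProbe:
      httpGet:
        path: /healthz          # confirm the path and port actually exist and respond quickly
        port: 8080
      initialDelaySeconds: 15   # too short, and the probe runs before the app finishes starting
      periodSeconds: 10         # how often the kubelet runs the probe
      timeoutSeconds: 5         # too short, and the probe fails under CPU or disk I/O contention
      failureThreshold: 3       # consecutive failures tolerated before the kubelet restarts the container
      successThreshold: 1       # must be 1 for liveness probes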

Inspect CPU and disk I/O utilization

Resource contention can cause probe timeouts, which are a major cause of liveness probe failures. To see if resource usage is the cause of the liveness probe failure, try the following solutions:

Address large deployments

In scenarios where a large number of Pods are deployed simultaneously (for example, by a CI/CD tool like ArgoCD), a sudden surge of new Pods can overwhelm cluster resources, leading to control plane resource exhaustion. This lack of resources delays app startup and can cause liveness probes to fail repeatedly before the apps are ready.

To resolve this issue, try the following solutions:

Address transient errors

The app might experience temporary errors or slowdowns during startup or initialization that cause the probe to fail initially. If the app eventually recovers, consider increasing the values defined in the initialDelaySeconds or failureThreshold fields in the manifest of your liveness probe.
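
For example, with the placeholder values in the following hypothetical fragment, the kubelet waits longer before the first check and tolerates more consecutive failures before restarting the container:

    # Hypothetical adjustment; tune the numbers to your app's startup behavior.
    livenessProbe:
      httpGet:
        path: /healthz          # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 60   # give the app more time before the first probe
      failureThreshold: 6       # tolerate more consecutive failures before a restart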

Address probe resource consumption

In rare cases, the liveness probe's execution itself might consume significant resources, which can contribute to resource constraints and, in extreme cases, lead to the container being terminated by an OOM kill. Keep your probe commands lightweight: a lightweight probe runs quickly and reliably, so it reports your app's true health more accurately.
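
As an illustration of the difference, the two hypothetical alternatives below check the same placeholder endpoint. The exec form starts a new process inside the container on every check (and assumes curl exists in the image), while the httpGet form is performed by the kubelet itself and adds no extra process to the container:

    # Heavier alternative: spawns a shell and curl inside the container on every probe run.
    livenessProbe:
      exec:
        command: ["sh", "-c", "curl -sf http://localhost:8080/healthz"]

    # Lighter alternative: the kubelet makes the HTTP request directly.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080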

Resolve app misconfigurations

App misconfigurations cause many CrashLoopBackOff events. To understand why your app is stopping, the first step is to examine its exit code. This code determines your troubleshooting path:

Find the exit code

To find the exit code of your app, do the following:

  1. Describe the Pod:

    kubectl describe pod POD_NAME -n NAMESPACE_NAME
    

    Replace the following:

    - POD_NAME: the name of the problematic Pod.
    - NAMESPACE_NAME: the name of the namespace that contains the Pod.

  2. In the output, review the Exit Code field located under the Last State section for the relevant container. If the exit code is 0, see Troubleshoot successful exits (exit code 0). If the exit code is a number other than 0, see Troubleshoot app crashes (non-zero exit code).

Troubleshoot successful exits (exit code 0)

An exit code of 0 typically means the container's process finished successfully. Although this is the outcome that you want for a task-based Job, it can signal a problem for a long-running controller like a Deployment, StatefulSet, or ReplicaSet.

These controllers work to ensure a Pod is always running, so they treat any exit as a failure to be corrected. The kubelet enforces this behavior by adhering to the Pod's restartPolicy (which defaults to Always), restarting the container even after a successful exit. This action creates a loop, which ultimately triggers the CrashLoopBackOff status.
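
The following hypothetical manifest reproduces the loop: the container's command completes successfully in seconds, but because a Deployment exists to keep a Pod running and its restartPolicy is Always, the kubelet restarts the container indefinitely. The names, image, and command are placeholders.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: one-shot-task   # placeholder name
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: one-shot-task
      template:
        metadata:
          labels:
            app: one-shot-task
        spec:
          containers:
          - name: task
            image: busybox   # placeholder image
            # Prints a message and exits 0; the controller treats the exit as a failure to correct.
            command: ["sh", "-c", "echo 'task complete'"]

If the work is genuinely one-shot, run it as a Job, which expects containers to exit; if the container is meant to stay up, make sure its main process doesn't return after initialization.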

The most common reasons for unexpected successful exits are the following:

Troubleshoot app crashes (non-zero exit code)

When a container exits with a non-zero code, Kubernetes restarts it. If the underlying issue that caused the error is persistent, the app crashes again and the cycle repeats, culminating in a CrashLoopBackOff state.

The non-zero exit code is a clear signal that an error occurred within the app itself, which directs your debugging efforts toward its internal workings and environment. The following issues often cause this termination:
