Troubleshooting system metrics

This page shows you how to resolve system metrics-related issues on your Google Kubernetes Engine (GKE) clusters.

Metrics from your cluster not appearing in Cloud Monitoring

Ensure that you've enabled the Monitoring API and the Logging API on your project. You should also confirm that you're able to view your project in the Cloud Monitoring overview in the Google Cloud console.
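
As a quick check, you can list the enabled APIs with gcloud and enable any that are missing. This is a minimal sketch; replace PROJECT_ID with your project ID:

    # Check whether the Monitoring and Logging APIs are already enabled.
    gcloud services list --enabled --project=PROJECT_ID | grep -E "monitoring.googleapis.com|logging.googleapis.com"

    # Enable both APIs (this is a no-op if they're already enabled).
    gcloud services enable monitoring.googleapis.com logging.googleapis.com \
        --project=PROJECT_ID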

If the issue persists, check the following potential causes:

Identify and fix permissions issues for writing metrics

GKE uses IAM service accounts that are attached to your nodes to run system tasks like logging and monitoring. At a minimum, these node service accounts must have the Kubernetes Engine Default Node Service Account (roles/container.defaultNodeServiceAccount) role on your project. By default, GKE uses the Compute Engine default service account, which is automatically created in your project, as the node service account.
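
If you're not sure which service account a node pool uses, you can read it from the node pool configuration. This is a sketch with placeholder names; replace NODEPOOL_NAME, CLUSTER_NAME, and COMPUTE_LOCATION with your node pool name, cluster name, and cluster location:

    # Print the service account that the node pool's nodes run as.
    # A value of "default" means the Compute Engine default service account.
    gcloud container node-pools describe NODEPOOL_NAME \
        --cluster=CLUSTER_NAME \
        --location=COMPUTE_LOCATION \
        --format="value(config.serviceAccount)"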

If your organization enforces the iam.automaticIamGrantsForDefaultServiceAccounts organization policy constraint, the default Compute Engine service account in your project might not automatically get the required permissions for GKE.

Note: If your organization was created on or after May 3, 2024, this constraint is enforced by default.
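
To check whether this constraint is enforced on your project, you can describe the effective org policy. This is a sketch; replace PROJECT_ID with your project ID:

    # Show the effective org policy for the constraint on this project.
    gcloud org-policies describe iam.automaticIamGrantsForDefaultServiceAccounts \
        --project=PROJECT_ID \
        --effective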

To grant the roles/container.defaultNodeServiceAccount role to the Compute Engine default service account, complete the following steps:

Console
  1. In the Google Cloud console, go to the Welcome page.
  2. In the Project number field, click Copy to clipboard.
  3. Go to the IAM page.
  4. Click Grant access.
  5. In the New principals field, specify the following value:
    PROJECT_NUMBER-compute@developer.gserviceaccount.com
    Replace PROJECT_NUMBER with the project number that you copied.
  6. In the Select a role menu, select the Kubernetes Engine Default Node Service Account role.
  7. Click Save.
gcloud
  1. Find your Google Cloud project number:
    gcloud projects describe PROJECT_ID \
        --format="value(projectNumber)"

    Replace PROJECT_ID with your project ID.

    The output is similar to the following:

    12345678901
    
  2. Grant the roles/container.defaultNodeServiceAccount role to the Compute Engine default service account:
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
        --role="roles/container.defaultNodeServiceAccount"

    Replace PROJECT_NUMBER with the project number from the previous step.
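
To confirm that the binding was created, you can list the roles granted to the node service account on the project. This is a sketch; replace PROJECT_ID and PROJECT_NUMBER as in the previous steps:

    # List the roles that the Compute Engine default service account holds on the project.
    gcloud projects get-iam-policy PROJECT_ID \
        --flatten="bindings[].members" \
        --filter="bindings.members:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
        --format="value(bindings.role)"

The output should include roles/container.defaultNodeServiceAccount.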

Confirm that the metrics agent has sufficient memory

If you've tried the preceding troubleshooting steps and the metrics still aren't appearing, the metrics agent might have insufficient memory.
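
A quick way to see whether the agent Pods are restarting is to check the DaemonSet status. This is a sketch that assumes the DaemonSet carries the same component=gke-metrics-agent label as its Pods:

    # Check the desired versus ready Pod counts for the metrics agent DaemonSet.
    kubectl get daemonset -n kube-system -l component=gke-metrics-agent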

In most cases, the default resource allocation for the GKE metrics agent is sufficient. However, if the DaemonSet's Pods crash repeatedly, you can check the termination reason with the following instructions:

  1. Get the names of the GKE metrics agent Pods:

    kubectl get pods -n kube-system -l component=gke-metrics-agent
    
  2. Find the Pod with the status CrashLoopBackOff.

    The output is similar to the following:

    NAME                    READY STATUS           RESTARTS AGE
    gke-metrics-agent-5857x 0/1   CrashLoopBackOff 6        12m
    
  3. Describe the Pod that has the status CrashLoopBackOff:

    kubectl describe pod POD_NAME -n kube-system
    

    Replace POD_NAME with the name of the Pod from the previous step.

    If the termination reason of the Pod is OOMKilled, the agent needs additional memory.

    The output is similar to the following:

      containerStatuses:
      ...
      lastState:
        terminated:
          ...
          exitCode: 1
          finishedAt: "2021-11-22T23:36:32Z"
          reason: OOMKilled
          startedAt: "2021-11-22T23:35:54Z"
    
  4. Add a node label to the node that runs the failing metrics agent. You can use either a persistent or a temporary node label. We recommend that you start by requesting an additional 20 MB of memory. If the agent keeps crashing, run the command again with a node label that requests a higher amount of additional memory.

    To update a node pool with a persistent label, run the following command:

    gcloud container node-pools update NODEPOOL_NAME \
        --cluster=CLUSTER_NAME \
        --node-labels=ADDITIONAL_MEMORY_NODE_LABEL \
        --location=COMPUTE_LOCATION
    

    Replace the following:

      * NODEPOOL_NAME: the name of the node pool that runs the failing metrics agent.
      * CLUSTER_NAME: the name of your cluster.
      * COMPUTE_LOCATION: the Compute Engine location of the cluster.
      * ADDITIONAL_MEMORY_NODE_LABEL: one of the additional memory node labels; we recommend starting with the label that requests an additional 20 MB.

    Alternatively, you can add a temporary node label that won't persist after an upgrade by using the following command:

    kubectl label node/NODE_NAME \
        ADDITIONAL_MEMORY_NODE_LABEL --overwrite
    

    Replace the following:

      * NODE_NAME: the name of the node that runs the failing metrics agent.
      * ADDITIONAL_MEMORY_NODE_LABEL: the same additional memory node label as in the previous command.
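
    Whichever method you use, you can confirm that the label is present on the node and check the metrics agent Pod that runs there. This is a sketch that uses the same NODE_NAME placeholder:

    # Confirm that the additional memory node label was applied.
    kubectl get node NODE_NAME --show-labels

    # Check the status of the metrics agent Pod on that node.
    kubectl get pods -n kube-system -l component=gke-metrics-agent \
        --field-selector spec.nodeName=NODE_NAME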

