This document describes how to configure Google Kubernetes Engine (GKE) to send metrics to Cloud Monitoring. Metrics in Cloud Monitoring can populate custom dashboards, generate alerts, create service-level objectives, or be fetched by third-party monitoring services using the Cloud Monitoring API.
GKE provides several sources of metrics:
Packages of observability metrics:
Kube state metrics: a curated set of metrics exported from the kube-state-metrics service, used to monitor the state of Kubernetes objects like Pods, Deployments, and more. For the set of included metrics, see Use kube state metrics.
The kube state package is a managed solution. If you need greater flexibility—for example, to collect additional metrics, manage scrape intervals, or scrape other resources—you can disable the package, if it is enabled, and deploy your own instance of the open source kube-state-metrics service. For more information, see the Google Cloud Managed Service for Prometheus exporter documentation for Kube state metrics.
cAdvisor/Kubelet: a curated set of cAdvisor and Kubelet metrics. For the set of included metrics, see Use cAdvisor/Kubelet metrics.
The cAdvisor/Kubelet package is a managed solution. If you need greater flexibility—for example, to collect additional metrics, manage scrape intervals, or scrape other resources—you can disable the package, if it is enabled, and deploy your own instance of the open source cAdvisor/Kubelet metrics services.
NVIDIA Data Center GPU Manager (DCGM) metrics: metrics from DCGM that provide a comprehensive view of GPU health, performance, and utilization.
You can also configure automatic application monitoring for certain workloads.
System metrics

When a cluster is created, GKE by default collects certain metrics emitted by system components.
You have a choice whether or not to send metrics from your GKE cluster to Cloud Monitoring. If you choose to send metrics to Cloud Monitoring, you must send system metrics.
All GKE system metrics are ingested into Cloud Monitoring with the prefix kubernetes.io.
Cloud Monitoring does not charge for the ingestion of GKE system metrics. For more information, see Cloud Monitoring pricing.
Configuring collection of system metrics

To enable system metric collection, pass the SYSTEM value to the --monitoring flag of the gcloud container clusters create or gcloud container clusters update commands.
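For example, the following command enables only system metric collection on an existing cluster. As elsewhere in this document, CLUSTER_NAME and COMPUTE_LOCATION are placeholders for your own values:

```shell
# CLUSTER_NAME and COMPUTE_LOCATION are placeholders for your cluster's
# name and its Compute Engine location.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --monitoring=SYSTEM
```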
To disable system metric collection, use the NONE value for the --monitoring flag. If system metric collection is disabled, basic information like CPU usage, memory usage, and disk usage isn't available for a cluster when viewing observability metrics.
For GKE Autopilot clusters, you cannot disable the collection of system metrics.
Warning: If you disable Cloud Logging or Cloud Monitoring or apply exclusion filters, GKE customer support is offered on a best-effort basis and might require additional effort from your engineering team.

See Observability for GKE for more details about Cloud Monitoring integration with GKE.
To configure the collection of system metrics by using Terraform, see the monitoring_config block in the Terraform registry for google_container_cluster. For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.
System metrics include metrics from essential system components important for Kubernetes. For a list of these metrics, see GKE system metrics.
If you enable Cloud Monitoring for your cluster, then you can't disable system monitoring (--monitoring=SYSTEM).
The following metric sources are available; which ones are enabled by default depends on the cluster mode (Autopilot or Standard) when you create and register a new cluster in a project with GKE Enterprise enabled:

System
API server
Scheduler
Controller Manager
Persistent volume (Storage)
Pods
Deployment
StatefulSet
DaemonSet
HorizontalPodAutoscaler
cAdvisor
Kubelet
NVIDIA Data Center GPU Manager (DCGM) metrics

All registered clusters in a project that has GKE Enterprise enabled can use the packages for control plane metrics, kube state metrics, and cAdvisor/kubelet metrics without any additional charges. Otherwise, these metrics incur Cloud Monitoring charges.
Troubleshooting system metrics

If system metrics are not available in Cloud Monitoring as expected, see Troubleshoot system metrics.
Package: Control plane metrics

You can configure a GKE cluster to send certain metrics emitted by the Kubernetes API server, Scheduler, and Controller Manager to Cloud Monitoring.
For more information, see Collect and view control plane metrics.
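As a sketch, the control plane package can be enabled on an existing cluster by passing its component values to the --monitoring flag; SYSTEM must always be included, and CLUSTER_NAME and COMPUTE_LOCATION are placeholders:

```shell
# Enables system metrics plus all three control plane metric sources.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --monitoring=SYSTEM,API_SERVER,SCHEDULER,CONTROLLER_MANAGER
```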
Package: Kube state metrics

You can configure a GKE cluster to send a curated set of kube state metrics in Prometheus format to Cloud Monitoring. This package of kube state metrics includes metrics for Pods, Deployments, StatefulSets, DaemonSets, HorizontalPodAutoscaler resources, Persistent Volumes, Persistent Volume Claims, and JobSets.
For more information, see Collect and view Kube state metrics.
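Similarly, the kube state package can be enabled by listing its component values with --monitoring; SYSTEM must be included, and CLUSTER_NAME and COMPUTE_LOCATION are placeholders:

```shell
# Enables system metrics plus the kube state metric sources.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --monitoring=SYSTEM,POD,DEPLOYMENT,STATEFULSET,DAEMONSET,HPA,STORAGE
```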
Package: cAdvisor/Kubelet metrics

You can configure a GKE cluster to send a curated set of cAdvisor/Kubelet metrics in Prometheus format to Cloud Monitoring. The curated set of metrics is a subset of the large set of cAdvisor/Kubelet metrics built into every Kubernetes deployment by default. The curated cAdvisor/Kubelet set is designed to provide the most useful metrics, reducing ingestion volume and associated costs.
For more information, see Collect and view cAdvisor/Kubelet metrics.
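The cAdvisor/Kubelet package follows the same pattern; CLUSTER_NAME and COMPUTE_LOCATION are placeholders:

```shell
# Enables system metrics plus the curated cAdvisor and Kubelet sources.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --monitoring=SYSTEM,CADVISOR,KUBELET
```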
Package: NVIDIA Data Center GPU Manager (DCGM) metrics

You can monitor GPU utilization, performance, and health by configuring GKE to send NVIDIA Data Center GPU Manager (DCGM) metrics to Cloud Monitoring.
For more information, see Collect and view NVIDIA Data Center GPU Manager (DCGM) metrics.
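DCGM metrics use the DCGM value of the --monitoring flag; CLUSTER_NAME and COMPUTE_LOCATION are placeholders:

```shell
# Enables system metrics plus DCGM GPU metrics.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --monitoring=SYSTEM,DCGM
```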
Disable metric packages

You can disable the use of metric packages in the cluster. You might want to disable certain packages to reduce costs, or if you use an alternative mechanism for collecting the metrics, like Google Cloud Managed Service for Prometheus and an exporter.
Console

To disable the collection of metrics from the Details tab for the cluster, do the following:
In the Google Cloud console, go to the Kubernetes clusters page:
If you use the search bar to find this page, then select the result whose subheading is Kubernetes Engine.
Click your cluster's name.
In the Features row labeled Cloud Monitoring, click the Edit icon.
In the Components drop-down menu, clear the metric components that you want to disable.
Click OK.
Click Save Changes.
gcloud

Open a terminal window with the Google Cloud SDK and the Google Cloud CLI installed. One way to do this is to use Cloud Shell.
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
Call the gcloud container clusters update command and pass an updated set of values to the --monitoring flag. The set of values supplied to the --monitoring flag overrides any previous setting.
For example, to turn off the collection of all metrics except system metrics, run the following command:
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --enable-managed-prometheus \
    --monitoring=SYSTEM
This command disables the collection of any previously configured metric packages.
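The override behavior can be sketched as follows. Here, effective_monitoring is a hypothetical helper that models what remains configured after an update, not a real gcloud command:

```shell
# Sketch: --monitoring replaces the previous value set rather than merging.
effective_monitoring() {
  # $1: the comma-separated value list passed to --monitoring on update.
  # Whatever was configured before the update is discarded; only $1 remains.
  printf '%s\n' "$1"
}

# A cluster previously configured with SYSTEM,API_SERVER,SCHEDULER that is
# then updated with --monitoring=SYSTEM keeps just SYSTEM:
effective_monitoring "SYSTEM"
```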
To configure the collection of metrics by using Terraform, see the monitoring_config block in the Terraform registry for google_container_cluster. For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.
You can use Cloud Monitoring to identify the control plane or kube state metrics that are writing the largest numbers of samples. These metrics are contributing the most to your costs. After you identify the most expensive metrics, you can modify your scrape configs to filter these metrics appropriately.
The Cloud Monitoring Metrics Management page provides information that can help you control the amount you spend on billable metrics without affecting observability, such as the volume of billable samples and bytes being ingested.
You can also use the Metrics Management page to exclude unneeded metrics, eliminating the cost of ingesting them.
To view the Metrics Management page, do the following:
In the Google Cloud console, go to the Metrics management page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
For more information about the Metrics Management page, see View and manage metric usage.
To identify which control plane or kube state metrics have the largest number of samples being ingested, do the following:
In the Google Cloud console, go to the Metrics management page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
On the Billable samples ingested scorecard, click View charts.
Locate the Namespace Volume Ingestion chart, and then click More chart options.
In the Metric field, verify that the following resource and metric are selected: Metric Ingestion Attribution and Samples written by attribution id.
In the Filters page, do the following:
In the Label field, verify that the value is attribution_dimension.
In the Comparison field, verify that the value is = (equals).
In the Value field, select cluster.
Clear the Group by setting.
Optionally, filter for only certain metrics. For example, control plane API server metrics all include "apiserver" as part of the metric name, and kube state Pod metrics all include "kube_pod" as part of the metric name, so you can filter for metrics containing those strings:
Click Add Filter.
In the Label field, select metric_type.
In the Comparison field, select =~ (equals regex).
In the Value field, enter .*apiserver.* or .*kube_pod.*.
Optionally, group the number of samples ingested by GKE region or project:
Click Group by.
Ensure metric_type is selected.
To group by GKE region, select location.
To group by project, select project_id.
Click OK.
Optionally, group the number of samples ingested by GKE cluster name:
Click Group by.
To group by GKE cluster name, ensure both attribution_dimension and attribution_id are selected.
Click OK.
To see the ingestion volume for each of the metrics, in the toggle labeled Chart Table Both, select Both. The table shows the ingested volume for each metric in the Value column.
Click the Value column header twice to sort the metrics by descending ingestion volume.
These steps show the metrics with the highest rate of samples ingested into Cloud Monitoring. Because the metrics in the observability packages are charged by the number of samples ingested, pay attention to metrics with the greatest rate of samples being ingested.
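The regex filters above select metrics by substring. As a sketch, using hypothetical metric type names for illustration, the filter behaves like a regex match over the metric_type label:

```shell
# Hypothetical metric type names, for illustration only: the filter keeps
# any metric type containing "apiserver" or "kube_pod".
matching="$(printf '%s\n' \
  'kubernetes.io/hypothetical/apiserver_request_count' \
  'kubernetes.io/hypothetical/kube_pod_status_phase' \
  'kubernetes.io/hypothetical/container_cpu_usage' |
  grep -E '.*apiserver.*|.*kube_pod.*')"
echo "$matching"
```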
Other metrics

In addition to the system metrics and metric packages described in this document, Istio metrics are also available for GKE clusters. For pricing information, see Cloud Monitoring pricing.
Available metrics

The following list shows the supported values for the --monitoring flag for the create and update commands, and the metrics collected for each value:

None (NONE): No metrics sent to Cloud Monitoring; no metric collection agent installed in the cluster. This value isn't supported for Autopilot clusters.
System (SYSTEM): Metrics from essential system components required for Kubernetes. For a complete list of the metrics, see Kubernetes metrics.
API server (API_SERVER): Metrics from kube-apiserver. For a complete list of the metrics, see API server metrics.
Scheduler (SCHEDULER): Metrics from kube-scheduler. For a complete list of the metrics, see Scheduler metrics.
Controller Manager (CONTROLLER_MANAGER): Metrics from kube-controller-manager. For a complete list of the metrics, see Controller Manager metrics.
Persistent volume (Storage) (STORAGE): Storage metrics from kube-state-metrics. Includes metrics for Persistent Volumes and Persistent Volume Claims. For a complete list of the metrics, see Storage metrics.
Pod (POD): Pod metrics from kube-state-metrics. For a complete list of the metrics, see Pod metrics.
Deployment (DEPLOYMENT): Deployment metrics from kube-state-metrics. For a complete list of the metrics, see Deployment metrics.
StatefulSet (STATEFULSET): StatefulSet metrics from kube-state-metrics. For a complete list of the metrics, see StatefulSet metrics.
DaemonSet (DAEMONSET): DaemonSet metrics from kube-state-metrics. For a complete list of the metrics, see DaemonSet metrics.
HorizontalPodAutoscaler (HPA): HPA metrics from kube-state-metrics. For a complete list, see HorizontalPodAutoscaler metrics.
cAdvisor (CADVISOR): cAdvisor metrics from the cAdvisor/Kubelet metrics package. For a complete list of the metrics, see cAdvisor metrics.
Kubelet (KUBELET): Kubelet metrics from the cAdvisor/Kubelet metrics package. For a complete list of the metrics, see Kubelet metrics.
NVIDIA Data Center GPU Manager (DCGM) metrics (DCGM): Metrics from NVIDIA Data Center GPU Manager (DCGM).
You can also collect Prometheus-style metrics exposed by any GKE workload by using Google Cloud Managed Service for Prometheus, which lets you monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.
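With Managed Service for Prometheus enabled, workload metrics are typically scraped by applying a PodMonitoring resource. The following is a minimal sketch; the app label, resource name, and port name are assumptions you would replace for your own workload:

```shell
# Hypothetical PodMonitoring resource: scrapes the "metrics" port of Pods
# labeled app=example-app every 30 seconds. Adjust names for your workload.
kubectl apply -f - <<'EOF'
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app-monitoring
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
    interval: 30s
EOF
```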
What's next