This document shows you how to diagnose and mitigate CPU, memory, and storage performance issues on Compute Engine virtual machine (VM) and bare metal instances.
Before you begin
To view performance metrics for your compute instances, use the Cloud Monitoring observability metrics available in the Google Cloud console.
In the Google Cloud console, go to the VM Instances page.
You can view metrics for individual instances or for the five instances that are consuming the largest amount of a resource.
To view metrics for individual instances, do the following:
Click the name of the instance that you want to view performance metrics for. The instance Details page opens.
Click the Observability tab to open the Observability Overview page.
To view metrics for the five instances consuming the largest amount of a resource, click the Observability tab on the VM instances page.
Explore the instance's performance metrics. View the Overview, CPU, Memory, Network and Disk sections to see detailed metrics about each topic. The following are key metrics that indicate instance performance:
On the Overview page:
CPU Utilization. The percent of CPU used by the instance.
Memory Utilization. The percent of memory used by the instance, excluding disk caches. For instances that use a Linux OS, this also excludes kernel memory.
Network Traffic. The average rate of bytes sent and received in one-minute intervals.
New Connections with VMs/External/Google. The estimated number of distinct TCP/UDP flows in one minute, grouped by peer type.
Disk Throughput. The average rate of bytes written to and read from disks.
Disk IOPS. The average rate of I/O read and write operations to disks.
On the Network Summary page:
Sent to VMs/External/Google. The rate of network traffic sent to Google services, instances, and external destinations, based on a sample of packets. The metric is scaled so that the sum matches the total sent network traffic.
Received from VMs/External/Google. The rate of network traffic received from Google services, instances, and external sources, based on a sample of packets. The metric is scaled so that the sum matches the total received network traffic.
Network Packet Totals. The total rate of sent and received packets in one-minute intervals.
Packet Mean Size. The mean size, in bytes, of packets sent and received in one-minute intervals.
Firewall Incoming Packets Denied. The rate of incoming network packets sent to the instance, but not received by the instance, because they were denied by firewall rules.
On the Disks Performance page:
I/O Size Avg. The average size of I/O read and write operations to disks. Small (4 to 16 KiB) random I/Os are usually limited by IOPS and sequential or large (256 KiB to 1 MiB) I/Os are limited by throughput.
Queue Length Avg. The number of queued and running disk I/O operations, also called queue depth, for the top 5 devices. To reach the performance limits of your disks, use a high I/O queue depth. Persistent Disk and Google Cloud Hyperdisk are networked storage and generally have higher latency compared to physical disks or Local SSD disks.
I/O Latency Avg. The average latency of I/O read and write operations aggregated across operations of all disks attached to the instance, measured by the Ops Agent. This value includes operating system and file system processing latency, and is dependent on queue length and I/O size.
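The disk metrics above are derived from the kernel's block-device counters. As a rough, hypothetical cross-check from inside a Linux instance, you can approximate the average read I/O size per device from /proc/diskstats:

```shell
# Hypothetical sketch: approximate average read I/O size per block device
# from /proc/diskstats. Field 4 = reads completed, field 6 = sectors read
# (512 bytes each). Skips loop and ram pseudo-devices.
awk '$4 > 0 && $3 !~ /loop|ram/ {
  printf "%s: avg read size %.1f KiB\n", $3, ($6 * 512) / $4 / 1024
}' /proc/diskstats
```

Comparing this value against the I/O Size Avg metric helps confirm whether a workload is issuing small random I/Os or large sequential ones.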
Instance performance is affected by the hardware that the instance runs on, the workload running on the instance, and the instance's machine type. If the hardware cannot support the workload or network traffic of your instance, your instance's performance might be affected.
CPU and memory performance
CPU and memory performance is affected by the hardware that the instance runs on and by the instance's machine type.
To understand an instance's CPU and memory performance, view performance metrics for CPU Utilization and Memory Utilization. You can additionally use process metrics to view running processes, attribute anomalies in resource consumption to a specific process, or identify your instance's most expensive resource consumers.
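The CPU Utilization metric reflects the kernel's per-CPU time counters. As a hypothetical sketch of the same idea from inside a Linux instance, you can compute a since-boot utilization figure from /proc/stat:

```shell
# Hypothetical sketch: approximate CPU utilization since boot from
# /proc/stat. Fields after "cpu": user nice system idle iowait irq
# softirq steal. A real monitoring agent samples twice and diffs the
# counters to get a utilization rate over an interval.
read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
busy=$((user + nice + system + irq + softirq + steal))
total=$((busy + idle + iowait))
echo "CPU utilization since boot: $((100 * busy / total))%"
```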
Consistently high CPU or memory utilization indicates the need to scale up the VM. If the VM consistently uses more than 90% of its CPU or memory, change the VM's machine type to one with more vCPUs or memory.
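As a hypothetical sketch of checking the 90% threshold from inside a Linux instance, you can compute memory utilization from /proc/meminfo:

```shell
# Hypothetical sketch: compute memory utilization from /proc/meminfo.
# MemAvailable estimates memory usable without swapping, so
# utilization = (MemTotal - MemAvailable) / MemTotal.
total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
used_pct=$(( 100 * (total - avail) / total ))
echo "Memory utilization: ${used_pct}%"
if [ "$used_pct" -gt 90 ]; then
  echo "Consider a machine type with more memory."
fi
```

Note that, like the Memory Utilization metric, this excludes reclaimable disk caches, so it is a better resize signal than raw "free" memory.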
Unusually high or unusually low CPU utilization might indicate your VM is experiencing a CPU soft lockup. For more information, see Troubleshooting vCPU soft lockups.
Network performance
Network performance is affected by the hardware that the instance runs on and by the instance's machine type.
To understand an instance's network performance, view performance metrics for Network Packet Totals, Packet Mean Size, New Connections with VMs/External/Google, Sent to VMs/External/Google, Received From VMs/External/Google, and Firewall Incoming Packets Denied.
Review whether Network Packet Totals, Packet Mean Size, and New Connections with VMs/External/Google are typical for your workload. For example, a web server might experience many connections and small packets, while a database might experience few connections and large packets.
Consistently high outgoing network traffic might indicate the need to change the VM's machine type to a machine type that has a higher egress bandwidth limit.
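As a hypothetical sketch of measuring outgoing traffic from inside a Linux instance, you can read the cumulative per-interface transmit counters from /proc/net/dev:

```shell
# Hypothetical sketch: read cumulative bytes sent per interface from
# /proc/net/dev (ninth stats column after the colon). Sampling twice
# and dividing the difference by the interval gives an egress rate to
# compare against the machine type's egress bandwidth limit.
awk -F':' 'NR > 2 {
  gsub(/ /, "", $1)            # strip padding around the interface name
  split($2, f, " ")            # f[9] is transmitted bytes
  printf "%s: %d bytes sent\n", $1, f[9]
}' /proc/net/dev
```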
If you notice high numbers of incoming packets denied by firewalls, visit the Network Intelligence Firewall Insights page in the Google Cloud console to learn more about the origins of denied packets.
Go to the Firewall Insights page
If you think your own traffic is being incorrectly denied by firewalls, you can create and run connectivity tests.
If your instance sends and receives a high amount of traffic from instances in different zones or regions, consider modifying your workload to keep more data within a zone or region to decrease latency and costs. For more information, see VM-VM data transfer pricing within Google Cloud. If your instance sends a large amount of traffic to other instances within the same zone, consider a compact placement policy to achieve low network latency.
Bare metal instances
Unlike VM instances, bare metal instances don't have the C6 and C1E sleep states disabled. Idle cores can enter these sleep states, which can reduce the network performance of bare metal instances. If you need full network bandwidth performance, disable these sleep states in the operating system.
To disable the sleep states on a bare metal instance without needing to restart the instance, use the following script:
# Adjust the CPU range to match the instance's vCPU count.
for cpu in {0..191}; do
  echo "1" | sudo tee /sys/devices/system/cpu/cpu$cpu/cpuidle/state3/disable
  echo "1" | sudo tee /sys/devices/system/cpu/cpu$cpu/cpuidle/state2/disable
done
Alternatively, you can update the GRUB configuration file to persist the changes across instance restarts.
# Add intel_idle.max_cstate=1 processor.max_cstate=1 to GRUB_CMDLINE_LINUX
sudo vim /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
After the reboot, verify that the C6 and C1E sleep states are disabled:
ls /sys/devices/system/cpu/cpu0/cpuidle/
state0  state1
cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
C1
The Input-Output Memory Management Unit (IOMMU) is a CPU feature that provides address virtualization for PCI devices. IOMMU can negatively impact networking performance if there are many I/O translation lookaside buffer (IOTLB) misses. To reduce IOTLB misses, use large pages in your workloads.
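Each large (huge) page covers more memory per IOTLB entry, so workloads that use huge pages incur fewer IOTLB misses. A quick, hypothetical check of the huge page configuration on a Linux instance:

```shell
# Hypothetical sketch: show the configured huge page counts and size;
# falls back to a message if the kernel doesn't expose them.
grep -E '^(HugePages_|Hugepagesize)' /proc/meminfo \
  || echo "huge page info not exposed"
```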
Storage performance
Storage performance is affected by the hardware that the instance runs on and by the types of disks attached to the instance.
To understand a VM's storage performance, view performance metrics for Throughput, Operations (IOPS), I/O Size, I/O Latency, and Queue Length.
Disk throughput and IOPS indicate whether the VM workload is operating as expected. If throughput or IOPS is lower than the expected maximum listed in the disk type chart, then I/O size, queue length, or I/O latency performance issues might be present.
You can expect I/O sizes between 4 and 16 KiB for workloads that require high IOPS and low latency, and between 256 KiB and 1 MiB for workloads that involve sequential or large writes. I/O sizes outside of these ranges indicate disk performance issues.
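The relationship between these metrics is simple arithmetic: average I/O size equals throughput divided by IOPS. A hypothetical worked example, with the figures chosen for illustration:

```shell
# Hypothetical worked example: average I/O size = throughput / IOPS.
# 120 MiB/s at 30,000 IOPS is about 4 KiB per operation, inside the
# 4-16 KiB range expected for an IOPS-bound workload.
awk 'BEGIN {
  throughput_mib = 120; iops = 30000
  printf "avg I/O size: %.1f KiB\n", throughput_mib * 1024 / iops
}'
```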
Queue length, also known as queue depth, affects the throughput and IOPS that a disk can achieve. When a disk performs well, its queue length is about the same as the queue length recommended to achieve a particular throughput or IOPS level, listed in the Recommended I/O queue depth chart.
I/O latency is dependent on queue length and I/O size. If the queue length or I/O size for a disk is high, the latency will also be high.
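The dependence of latency on queue length follows Little's Law: average latency equals queue depth divided by IOPS. A hypothetical worked example, with the figures chosen for illustration:

```shell
# Hypothetical worked example (Little's Law): average latency is
# queue depth divided by IOPS. Queue depth 32 at 16,000 IOPS implies
# roughly 2 ms of average I/O latency; a deeper queue at the same
# IOPS means proportionally higher latency.
awk 'BEGIN {
  queue_depth = 32; iops = 16000
  printf "avg latency: %.1f ms\n", queue_depth / iops * 1000
}'
```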
If any storage performance metrics indicate disk performance issues, do one or more of the following: review your Persistent Disk or Hyperdisk settings, change the disk type to one that better fits the workload, or add disks to increase the instance's performance limits.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-07 UTC.