GPU machine types | Compute Engine

This document outlines the NVIDIA GPU models available on Compute Engine, which you can use to accelerate machine learning (ML), data processing, and graphics-intensive workloads on your virtual machine (VM) instances. This document also details which GPUs come pre-attached to accelerator-optimized machine series such as A4X, A4, A3, A2, and G2, and which GPUs you can attach to N1 general-purpose instances.

Use this document to compare the performance, memory, and features of different GPU models. For a more detailed overview of the accelerator-optimized machine family, including information on CPU platforms, storage options, and networking capabilities, and to find the specific machine type that matches your workload, see Accelerator-optimized machine family.

For more information about GPUs on Compute Engine, see About GPUs.

To view available regions and zones for GPUs on Compute Engine, see GPU regions and zones availability.

GPU models available

The following GPU models are available with the specified machine types to support your AI, ML, and HPC workloads. If you have graphics-intensive workloads, such as 3D visualization, you can also create virtual workstations that use NVIDIA RTX Virtual Workstations (vWS). NVIDIA RTX Virtual Workstation is available for some GPU models. When you create an instance that uses NVIDIA RTX Virtual Workstation, Compute Engine automatically adds a vWS license. For information about pricing for virtual workstations, see the GPU pricing page.
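For example, to get a virtual workstation you specify one of the vWS accelerator types from the following table when you create an instance. The sketch below is a minimal, hedged example using the gcloud CLI; the instance name, zone, and image are placeholders:

```
# Minimal sketch: create an N1 virtual workstation with an NVIDIA
# T4 vWS GPU attached. Compute Engine adds the vWS license
# automatically. VM_NAME, the zone, and the image are placeholders.
gcloud compute instances create VM_NAME \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4-vws,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```

You still need to install the appropriate GPU driver on the instance before using it as a workstation.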

For the A and G series accelerator-optimized machine types, the specified GPU model automatically attaches to the instance. For the N1 general-purpose machine types, you can attach the GPU models specified.

| Machine type | GPU model | NVIDIA RTX Virtual Workstation (vWS) model |
|---|---|---|
| A4X | NVIDIA GB200 Grace Blackwell Superchips (nvidia-gb200). Each Superchip contains four NVIDIA B200 Blackwell GPUs. | |
| A4 | NVIDIA B200 Blackwell GPUs (nvidia-b200) | |
| A3 Ultra | NVIDIA H200 SXM GPUs (nvidia-h200-141gb) | |
| A3 Mega | NVIDIA H100 SXM GPUs (nvidia-h100-mega-80gb) | |
| A3 High and A3 Edge | NVIDIA H100 SXM GPUs (nvidia-h100-80gb) | |
| A2 Ultra | NVIDIA A100 80GB GPUs (nvidia-a100-80gb) | |
| A2 Standard | NVIDIA A100 40GB GPUs (nvidia-a100-40gb) | |
| G4 (Preview) | NVIDIA RTX PRO 6000 Blackwell Server Edition (nvidia-rtx-pro-6000) | |
| G2 | NVIDIA L4 (nvidia-l4) | NVIDIA L4 Virtual Workstations (vWS) (nvidia-l4-vws) |
| N1 | NVIDIA T4 GPUs (nvidia-tesla-t4) | NVIDIA T4 Virtual Workstations (vWS) (nvidia-tesla-t4-vws) |
| N1 | NVIDIA P4 GPUs (nvidia-tesla-p4) | NVIDIA P4 Virtual Workstations (vWS) (nvidia-tesla-p4-vws) |
| N1 | NVIDIA V100 GPUs (nvidia-tesla-v100) | |
| N1 | NVIDIA P100 GPUs (nvidia-tesla-p100) | NVIDIA P100 Virtual Workstations (vWS) (nvidia-tesla-p100-vws) |
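To check which of these models a particular zone offers, you can list accelerator types with the gcloud CLI. This is a minimal sketch; the zone is an example:

```
# List the GPU (accelerator) types available in a given zone.
gcloud compute accelerator-types list --filter="zone:us-central1-a"

# Show details for a single accelerator type.
gcloud compute accelerator-types describe nvidia-tesla-t4 \
    --zone=us-central1-a
```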

You can also use some GPU machine types on AI Hypercomputer. AI Hypercomputer is a supercomputing system that is optimized to support your artificial intelligence (AI) and machine learning (ML) workloads. This option is recommended for creating a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers.

A4X machine series

A4X accelerator-optimized machine types use NVIDIA GB200 Grace Blackwell Superchips (nvidia-gb200) and are ideal for foundation model training and serving.

A4X is an exascale platform based on NVIDIA GB200 NVL72. Each machine has two sockets with NVIDIA Grace CPUs with Arm Neoverse V2 cores. These CPUs are connected to four NVIDIA B200 Blackwell GPUs with fast chip-to-chip (NVLink-C2C) communication.

Tip: When provisioning A4X instances, you must reserve capacity to create instances and clusters. You can then create instances that use the features and services available from AI Hypercomputer. For more information, see Deployment options overview in the AI Hypercomputer documentation.

Attached NVIDIA GB200 Grace Blackwell Superchips

| Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3e) |
|---|---|---|---|---|---|---|---|
| a4x-highgpu-4g | 140 | 884 | 12,000 | 6 | 2,000 | 4 | 720 |

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A4 machine series

A4 accelerator-optimized machine types have NVIDIA B200 Blackwell GPUs (nvidia-b200) attached and are ideal for foundation model training and serving.

Tip: When provisioning A4 machine types, you must reserve capacity to create instances or clusters, use Spot VMs, use Flex-start VMs, or create a resize request in a MIG. For instructions on how to create A4 instances, see Create an A3 Ultra or A4 instance.

Attached NVIDIA B200 Blackwell GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3e) |
|---|---|---|---|---|---|---|---|
| a4-highgpu-8g | 224 | 3,968 | 12,000 | 10 | 3,600 | 8 | 1,440 |
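Of the provisioning options in the preceding tip, Spot VMs are often the quickest to try. The following is a minimal gcloud sketch, assuming Spot capacity for A4 is available in the chosen zone; the instance name, zone, and image are placeholders, and real deployments typically need additional networking and disk configuration:

```
# Minimal sketch: request an A4 instance as a Spot VM.
# VM_NAME, the zone, and the image are placeholders.
gcloud compute instances create VM_NAME \
    --zone=us-central1-b \
    --machine-type=a4-highgpu-8g \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```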

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A3 machine series

A3 accelerator-optimized machine types have NVIDIA H100 SXM or NVIDIA H200 SXM GPUs attached.

A3 Ultra machine type

A3 Ultra machine types have NVIDIA H200 SXM GPUs (nvidia-h200-141gb) attached and provide the highest network performance in the A3 series. A3 Ultra machine types are ideal for foundation model training and serving.

Tip: When provisioning A3 Ultra machine types, you must reserve capacity to create instances or clusters, use Spot VMs, use Flex-start VMs, or create a resize request in a MIG. For more information about the parameters to set when creating an A3 Ultra instance, see Create an A3 Ultra or A4 instance.

Attached NVIDIA H200 GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3e) |
|---|---|---|---|---|---|---|---|
| a3-ultragpu-8g | 224 | 2,952 | 12,000 | 10 | 3,600 | 8 | 1,128 |
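As an alternative to Spot VMs, you can consume reserved capacity. The following gcloud sketch assumes a reservation for a3-ultragpu-8g already exists in the target zone; the instance name, zone, image, and reservation name are all placeholders:

```
# Minimal sketch: create an A3 Ultra instance against an existing
# reservation. VM_NAME, the zone, the image, and the reservation
# name are placeholders.
gcloud compute instances create VM_NAME \
    --zone=us-east4-a \
    --machine-type=a3-ultragpu-8g \
    --reservation-affinity=specific \
    --reservation=my-a3-ultra-reservation \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```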

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A3 Mega, High, and Edge machine types

To use NVIDIA H100 SXM GPUs, you have the following options:

A3 Mega

Tip: When provisioning a3-megagpu-8g machine types, we recommend using a cluster of these instances and deploying with a scheduler such as Google Kubernetes Engine (GKE) or Slurm. For detailed instructions on either of these options, see the AI Hypercomputer documentation.

Attached NVIDIA H100 GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3) |
|---|---|---|---|---|---|---|---|
| a3-megagpu-8g | 208 | 1,872 | 6,000 | 9 | 1,800 | 8 | 640 |

A3 High

Tip: When provisioning a3-highgpu-1g, a3-highgpu-2g, or a3-highgpu-4g machine types, you must create instances by using Spot VMs or Flex-start VMs.

Attached NVIDIA H100 GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3) |
|---|---|---|---|---|---|---|---|
| a3-highgpu-1g | 26 | 234 | 750 | 1 | 25 | 1 | 80 |
| a3-highgpu-2g | 52 | 468 | 1,500 | 1 | 50 | 2 | 160 |
| a3-highgpu-4g | 104 | 936 | 3,000 | 1 | 100 | 4 | 320 |
| a3-highgpu-8g | 208 | 1,872 | 6,000 | 5 | 1,000 | 8 | 640 |

A3 Edge

Tip: To get started with A3 Edge instances, see Create an A3 VM with GPUDirect-TCPX enabled.

Attached NVIDIA H100 GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3) |
|---|---|---|---|---|---|---|---|
| a3-edgegpu-8g | 208 | 1,872 | 6,000 | 5 | 800 | 8 | 640 |

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A2 machine series

A2 accelerator-optimized machine types have NVIDIA A100 GPUs attached and are ideal for model fine-tuning, large-model inference, and cost-optimized inference.

The A2 machine series is available in two types:

A2 Ultra

Attached NVIDIA A100 80GB GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM2e) |
|---|---|---|---|---|---|---|
| a2-ultragpu-1g | 12 | 170 | 375 | 24 | 1 | 80 |
| a2-ultragpu-2g | 24 | 340 | 750 | 32 | 2 | 160 |
| a2-ultragpu-4g | 48 | 680 | 1,500 | 50 | 4 | 320 |
| a2-ultragpu-8g | 96 | 1,360 | 3,000 | 100 | 8 | 640 |

A2 Standard

Attached NVIDIA A100 40GB GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Local SSD supported | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM2) |
|---|---|---|---|---|---|---|
| a2-highgpu-1g | 12 | 85 | Yes | 24 | 1 | 40 |
| a2-highgpu-2g | 24 | 170 | Yes | 32 | 2 | 80 |
| a2-highgpu-4g | 48 | 340 | Yes | 50 | 4 | 160 |
| a2-highgpu-8g | 96 | 680 | Yes | 100 | 8 | 320 |
| a2-megagpu-16g | 96 | 1,360 | Yes | 100 | 16 | 640 |

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

G4 machine series

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

G4 accelerator-optimized machine types use NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs (nvidia-rtx-pro-6000) and are suitable for NVIDIA Omniverse simulation workloads, graphics-intensive applications, video transcoding, and virtual desktops. G4 machine types also provide a low-cost solution for single-host inference and model tuning compared with A series machine types.

A key feature of the G4 series is support for direct GPU peer-to-peer (P2P) communication on multi-GPU machine types (g4-standard-96, g4-standard-192, g4-standard-384). This allows GPUs within the same instance to exchange data directly over the PCIe bus, without involving the CPU host. For more information about G4 GPU peer-to-peer communication, see G4 GPU peer-to-peer communication.
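As an illustration, after a multi-GPU G4 instance is running and the NVIDIA driver is installed, you can inspect the P2P topology from inside the guest with standard NVIDIA tooling; this is generic tooling, not a G4-specific command:

```
# Print the GPU interconnect topology matrix, which shows how each
# GPU pair is connected (for example, via the PCIe bus).
nvidia-smi topo -m

# Report peer-to-peer read capability between GPU pairs.
nvidia-smi topo -p2p r
```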

Important: For information on how to get started with G4 machine types, contact your Google account team.

Attached NVIDIA RTX PRO 6000 GPUs

| Machine type | vCPU count¹ | Instance memory (GB) | Maximum Titanium SSD supported (GiB)² | Physical NIC count | Maximum network bandwidth (Gbps)³ | GPU count | GPU memory⁴ (GB GDDR7) |
|---|---|---|---|---|---|---|---|
| g4-standard-48 | 48 | 180 | 1,500 | 1 | 50 | 1 | 96 |
| g4-standard-96 | 96 | 360 | 3,000 | 1 | 100 | 2 | 192 |
| g4-standard-192 | 192 | 720 | 6,000 | 1 | 200 | 4 | 384 |
| g4-standard-384 | 384 | 1,440 | 12,000 | 2 | 400 | 8 | 768 |

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² You can add Titanium SSD disks when creating a G4 instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.
³ Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.
⁴ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

G2 machine series

G2 accelerator-optimized machine types have NVIDIA L4 GPUs attached and are ideal for cost-optimized inference, graphics-intensive workloads, and high performance computing (HPC) workloads.

Each G2 machine type also has a default memory and a custom memory range. The custom memory range defines the amount of memory that you can allocate to your instance for each machine type. You can also add Local SSD disks when creating a G2 instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.
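As an illustration, the following gcloud sketch creates a G2 instance with its default memory and one Local SSD disk; the instance name, zone, and image are placeholders, and G2 availability varies by zone:

```
# Minimal sketch: create a G2 instance (g2-standard-8 includes one
# L4 GPU) with one Local SSD disk attached. VM_NAME, the zone, and
# the image are placeholders.
gcloud compute instances create VM_NAME \
    --zone=us-central1-a \
    --machine-type=g2-standard-8 \
    --local-ssd=interface=NVME \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```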

Attached NVIDIA L4 GPUs

| Machine type | vCPU count¹ | Default instance memory (GB) | Custom instance memory range (GB) | Max Local SSD supported (GiB) | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB GDDR6) |
|---|---|---|---|---|---|---|---|
| g2-standard-4 | 4 | 16 | 16 to 32 | 375 | 10 | 1 | 24 |
| g2-standard-8 | 8 | 32 | 32 to 54 | 375 | 16 | 1 | 24 |
| g2-standard-12 | 12 | 48 | 48 to 54 | 375 | 16 | 1 | 24 |
| g2-standard-16 | 16 | 64 | 54 to 64 | 375 | 32 | 1 | 24 |
| g2-standard-24 | 24 | 96 | 96 to 108 | 750 | 32 | 2 | 48 |
| g2-standard-32 | 32 | 128 | 96 to 128 | 375 | 32 | 1 | 24 |
| g2-standard-48 | 48 | 192 | 192 to 216 | 1,500 | 50 | 4 | 96 |
| g2-standard-96 | 96 | 384 | 384 to 432 | 3,000 | 100 | 8 | 192 |

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

N1 machine series

With the exception of the N1 shared-core machine types, you can attach the following GPU models to any N1 machine type.

Unlike the machine types in the accelerator-optimized machine series, N1 machine types don't come with a set number of attached GPUs. Instead, you specify the number of GPUs to attach when creating the instance.

For N1 instances, the maximum number of vCPUs depends on the number of attached GPUs. In general, a higher number of GPUs lets you create instances with a higher number of vCPUs and memory.
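For example, the following gcloud sketch attaches four T4 GPUs to a 64-vCPU N1 instance, which is within the 96-vCPU limit for that GPU count shown in the tables below; the instance name, zone, and image are placeholders:

```
# Minimal sketch: attach four NVIDIA T4 GPUs to an N1 instance.
# With 4 T4 GPUs, up to 96 vCPUs are allowed, so n1-standard-64
# fits. VM_NAME, the zone, and the image are placeholders.
gcloud compute instances create VM_NAME \
    --zone=us-central1-a \
    --machine-type=n1-standard-64 \
    --accelerator=type=nvidia-tesla-t4,count=4 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```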

N1+T4 GPUs

You can attach NVIDIA T4 GPUs to N1 general-purpose instances with the following instance configurations.

| Accelerator type | GPU count | GPU memory¹ (GB GDDR6) | vCPU count | Instance memory (GB) | Local SSD supported |
|---|---|---|---|---|---|
| nvidia-tesla-t4 or nvidia-tesla-t4-vws | 1 | 16 | 1 to 48 | 1 to 312 | Yes |
| nvidia-tesla-t4 or nvidia-tesla-t4-vws | 2 | 32 | 1 to 48 | 1 to 312 | Yes |
| nvidia-tesla-t4 or nvidia-tesla-t4-vws | 4 | 64 | 1 to 96 | 1 to 624 | Yes |

¹ GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

N1+P4 GPUs

You can attach NVIDIA P4 GPUs to N1 general-purpose instances with the following instance configurations.

| Accelerator type | GPU count | GPU memory¹ (GB GDDR5) | vCPU count | Instance memory (GB) | Local SSD supported² |
|---|---|---|---|---|---|
| nvidia-tesla-p4 or nvidia-tesla-p4-vws | 1 | 8 | 1 to 24 | 1 to 156 | Yes |
| nvidia-tesla-p4 or nvidia-tesla-p4-vws | 2 | 16 | 1 to 48 | 1 to 312 | Yes |
| nvidia-tesla-p4 or nvidia-tesla-p4-vws | 4 | 32 | 1 to 96 | 1 to 624 | Yes |

¹ GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
² For instances with attached NVIDIA P4 GPUs, Local SSD disks are only supported in zones us-central1-c and northamerica-northeast1-b.

N1+V100 GPUs

You can attach NVIDIA V100 GPUs to N1 general-purpose instances with the following instance configurations.

| Accelerator type | GPU count | GPU memory¹ (GB HBM2) | vCPU count | Instance memory (GB) | Local SSD supported² |
|---|---|---|---|---|---|
| nvidia-tesla-v100 | 1 | 16 | 1 to 12 | 1 to 78 | Yes |
| nvidia-tesla-v100 | 2 | 32 | 1 to 24 | 1 to 156 | Yes |
| nvidia-tesla-v100 | 4 | 64 | 1 to 48 | 1 to 312 | Yes |
| nvidia-tesla-v100 | 8 | 128 | 1 to 96 | 1 to 624 | Yes |

¹ GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
² For instances with attached NVIDIA V100 GPUs, Local SSD disks aren't supported in us-east1-c.

N1+P100 GPUs

You can attach NVIDIA P100 GPUs to N1 general-purpose instances with the following instance configurations.

For NVIDIA P100 GPUs, the maximum vCPU count and memory available for some configurations depend on the zone in which the GPU resource runs.

| Accelerator type | GPU count | GPU memory¹ (GB HBM2) | Zone | vCPU count | Instance memory (GB) | Local SSD supported |
|---|---|---|---|---|---|---|
| nvidia-tesla-p100 or nvidia-tesla-p100-vws | 1 | 16 | All P100 zones | 1 to 16 | 1 to 104 | Yes |
| nvidia-tesla-p100 or nvidia-tesla-p100-vws | 2 | 32 | All P100 zones | 1 to 32 | 1 to 208 | Yes |
| nvidia-tesla-p100 or nvidia-tesla-p100-vws | 4 | 64 | us-east1-c, europe-west1-d, europe-west1-b | 1 to 64 | 1 to 208 | Yes |
| nvidia-tesla-p100 or nvidia-tesla-p100-vws | 4 | 64 | All other P100 zones | 1 to 96 | 1 to 624 | Yes |

¹ GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

General comparison chart

The following table describes the GPU memory size, feature availability, and ideal workload types of different GPU models that are available on Compute Engine.

| GPU model | GPU memory | Interconnect | NVIDIA RTX Virtual Workstation (vWS) support | Best used for |
|---|---|---|---|---|
| GB200 | 180 GB HBM3e @ 8 TBps | NVLink Full Mesh @ 1,800 GBps | No | Large-scale distributed training and inference of LLMs, Recommenders, HPC |
| B200 | 180 GB HBM3e @ 8 TBps | NVLink Full Mesh @ 1,800 GBps | No | Large-scale distributed training and inference of LLMs, Recommenders, HPC |
| H200 | 141 GB HBM3e @ 4.8 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
| H100 | 80 GB HBM3 @ 3.35 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
| A100 80GB | 80 GB HBM2e @ 1.9 TBps | NVLink Full Mesh @ 600 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
| A100 40GB | 40 GB HBM2 @ 1.6 TBps | NVLink Full Mesh @ 600 GBps | No | ML Training, Inference, HPC |
| RTX PRO 6000 (Preview) | 96 GB GDDR7 with ECC @ 1,597 GBps | N/A | No | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC |
| L4 | 24 GB GDDR6 @ 300 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC |
| T4 | 16 GB GDDR6 @ 320 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding |
| V100 | 16 GB HBM2 @ 900 GBps | NVLink Ring @ 300 GBps | No | ML Training, Inference, HPC |
| P4 | 8 GB GDDR5 @ 192 GBps | N/A | Yes | Remote Visualization Workstations, ML Inference, and Video Transcoding |
| P100 | 16 GB HBM2 @ 732 GBps | N/A | Yes | ML Training, Inference, HPC, Remote Visualization Workstations |

To compare GPU pricing for the different GPU models and regions that are available on Compute Engine, see GPU pricing.

Performance comparison chart

The following table describes the performance specifications of different GPU models that are available on Compute Engine.

Compute performance

| GPU model | FP64 | FP32 | FP16 | INT8 |
|---|---|---|---|---|
| GB200 | 90 TFLOPS | 180 TFLOPS | | |
| B200 | 40 TFLOPS | 80 TFLOPS | | |
| H200 | 34 TFLOPS | 67 TFLOPS | | |
| H100 | 34 TFLOPS | 67 TFLOPS | | |
| A100 80GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
| A100 40GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
| L4 | 0.5 TFLOPS¹ | 30.3 TFLOPS | | |
| T4 | 0.25 TFLOPS¹ | 8.1 TFLOPS | | |
| V100 | 7.8 TFLOPS | 15.7 TFLOPS | | |
| P4 | 0.2 TFLOPS¹ | 5.5 TFLOPS | | 22 TOPS² |
| P100 | 4.7 TFLOPS | 9.3 TFLOPS | 18.7 TFLOPS | |

¹ To allow FP64 code to work correctly, the T4, L4, and P4 GPU architectures include a small number of FP64 hardware units.
² TeraOperations per Second.

Tensor core performance

| GPU model | FP64 | TF32 | Mixed-precision FP16/FP32 | INT8 | INT4 | FP8 |
|---|---|---|---|---|---|---|
| GB200 | 90 TFLOPS | 2,500 TFLOPS² | 5,000 TFLOPS¹, ² | 10,000 TFLOPS² | 20,000 TFLOPS² | 10,000 TFLOPS² |
| B200 | 40 TFLOPS | 1,100 TFLOPS² | 4,500 TFLOPS¹, ² | 9,000 TFLOPS² | | 9,000 TFLOPS² |
| H200 | 67 TFLOPS | 989 TFLOPS² | 1,979 TFLOPS¹, ² | 3,958 TOPS² | | 3,958 TFLOPS² |
| H100 | 67 TFLOPS | 989 TFLOPS² | 1,979 TFLOPS¹, ² | 3,958 TOPS² | | 3,958 TFLOPS² |
| A100 80GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS¹ | 624 TOPS | 1,248 TOPS | |
| A100 40GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS¹ | 624 TOPS | 1,248 TOPS | |
| L4 | | 120 TFLOPS² | 242 TFLOPS¹, ² | 485 TOPS² | | 485 TFLOPS² |
| T4 | | | 65 TFLOPS | 130 TOPS | 260 TOPS | |
| V100 | | | 125 TFLOPS | | | |
| P4 | | | | | | |
| P100 | | | | | | |

¹ For mixed-precision training, NVIDIA GB200, B200, H200, H100, A100, and L4 GPUs also support the bfloat16 data type.
² NVIDIA GB200, B200, H200, H100, and L4 GPUs support structural sparsity, which you can use to double the performance of your models. The documented values apply when you use structural sparsity; if you aren't using structural sparsity, halve the values.

