Google Cloud provides images that contain common operating systems, frameworks, libraries, and drivers. Google Cloud optimizes these pre-configured images to support your artificial intelligence (AI) and machine learning (ML) workloads.
This document provides an overview of the images that you use to deploy, manage, and run workloads in your AI Hypercomputer environment.
Understand the image categories
Images are grouped into the following categories:
Google Cloud provides Docker images that package popular AI and ML frameworks and libraries. These images provide the software needed to simplify the development, training, and deployment of models on your AI-optimized clusters running on AI Hypercomputer.
JAX AI images
The JAX AI Images (JAII, formerly known as JAX Stable Stack Images) for Google Cloud TPUs and GPUs are ready-to-use Docker images that contain the JAX framework, a curated collection of compatible libraries, and settings for the Google Cloud infrastructure. JAX AI TPU images come pre-configured with JAX libraries and TPU libraries. JAX AI GPU images come pre-configured with JAX libraries and relevant CUDA/NVIDIA libraries.
The hardware layer
JAX AI images sit on top of the hardware layer, which consists of the accelerators (TPU or GPU) and their associated VMs. To use a JAX AI image, you need to provision TPU or GPU VMs. You can do this using the TPU API, Compute Engine API, or the GKE API.
The framework layer
The framework layer provides tools and libraries for building ML workloads. JAX AI images provide a pre-configured base for JAX-based ML workloads, including the core JAX library and other essential dependencies, to ensure a consistent and high-performance development experience.
The LibTPU layer in the JAX AI image is specifically built and bundled with the respective JAX version. Using a different JAX version may lead to unexpected behavior or errors.
The CUDA layer in the JAX AI image includes components managed by NVIDIA, such as the NGC CUDA Deep Learning Image, used as the base image of the GPU Training Image. The GPU image also contains Transformer Engine, a custom NVIDIA library for accelerating transformer models on NVIDIA GPUs.
Additional application-specific packages, beyond those provided in the JAX AI image, may be required for your specific machine learning workload.
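Because the bundled libraries are tied to a specific JAX version, it can help to confirm at container startup that the installed version still matches the one the image was built for. The following is a minimal sketch, not part of the JAX AI images themselves. The function name `check_pinned_version` and the constant `EXPECTED_JAX_VERSION` are hypothetical, and the version value is only an example that you would replace with the version of the image you pulled.

```python
from importlib import metadata

# Hypothetical value: replace with the JAX version your image was built for.
EXPECTED_JAX_VERSION = "0.5.1"


def check_pinned_version(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly `expected`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package} is not installed")
        return False
    if installed != expected:
        print(f"{package} is {installed}, but the image was built for {expected}")
        return False
    return True


if __name__ == "__main__":
    # Run inside the container before starting a workload.
    check_pinned_version("jax", EXPECTED_JAX_VERSION)
```

A check like this only reports drift. It does not repair the environment, so the usual remedy is to rebuild from the unmodified base image.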
Libraries in JAX AI Images: Current JAX AI images
The application layer
You implement your specific ML workloads in the application layer, which sits on top of the framework layer. The application layer contains your application-specific code, models, and logic, all built using the tools and libraries provided by the framework layer.
While this image provides a robust and well-tested foundation for JAX-based AI workloads, you may need to add application-specific dependencies. When you do, we recommend that you add them in a way that minimizes interference with the pre-configured base layer, which includes JAX and its core dependencies. Application-level dependencies that override or conflict with the existing dependencies can cause unintended side effects.
Initially, JAX AI images are provided quarterly. The near-term goal is to synchronize the release schedule with every JAX release, so that you can use the latest features and improvements as soon as they are available.
Support
Each JAX AI image release adheres to a limited-time support lifecycle. Within this timeframe, we address specific categories of requests for modifications to existing JAX AI images:
When a security vulnerability or bug is discovered in a library within a JAII, we incorporate the updated library into JAII, pinning all other library versions to maintain overall stability. This results in a new JAII revision.
Minimal change for revisions:
If a bug is found in package "X" within JAX-0.4.30-rev1, we'll update "X" to its next release (for example, v2.0) while trying to keep all other packages unchanged. This results in a new revision, JAX-0.4.30-rev2, which will be released as quickly as possible.
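The revision scheme in this example can be sketched in a few lines of Python. This is an illustration only: it assumes that revision tags always end in `-rev<N>`, as in the example above, and the helper name `next_revision` is hypothetical.

```python
import re

# Assumed tag layout: any base name followed by "-rev" and a revision number.
_REVISION_TAG = re.compile(r"^(?P<base>.+)-rev(?P<rev>\d+)$")


def next_revision(tag: str) -> str:
    """Return the tag of the revision that follows `tag`."""
    match = _REVISION_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a revision tag: {tag!r}")
    # Only the revision counter changes; the JAX version in the base stays pinned.
    return f"{match['base']}-rev{int(match['rev']) + 1}"


if __name__ == "__main__":
    print(next_revision("JAX-0.4.30-rev1"))
```

The base part of the tag, including the JAX version, is left untouched, which mirrors the minimal-change policy: a revision fixes one library without moving the rest of the stack.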
Deep Learning Software Layer (DLSL) Docker images
These images package NVIDIA CUDA, NCCL, an ML framework, and a model. They provide a ready-to-use environment for deep learning workloads. These prebuilt DLSL Docker images work seamlessly with your GKE clusters because we test and verify these images during internal reproducibility and regression testing.
Note: If you want to use DLSL images, you must provision an AI-optimized GKE cluster first.
DLSL Docker images provide the following benefits:
These Docker images are based on the NVIDIA NeMo NGC image. They contain Google's NCCL gIB plugin and bundle all NCCL binaries required to run workloads on each supported accelerator machine. These images also include Google Cloud tools such as gcsfuse and gcloud CLI for deploying workloads to Google Kubernetes Engine.
nemo25.04-gib1.0.6-A4
  Dependencies: NeMo NGC 25.04.01; NCCL gIB plugin 1.0.6
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo25.04-gib1.0.6-A4
nemo25.04-gib1.0.6-A3U
  Dependencies: NeMo NGC 25.04.01; NCCL gIB plugin 1.0.6
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo25.04-gib1.0.6-A3U
nemo25.02-gib1.0.5-A4
  Dependencies: NeMo NGC 25.02; NCCL gIB plugin 1.0.5
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo25.02-gib1.0.5-A4
nemo24.07-gib1.0.2-A3U
  Dependencies: NeMo NGC 24.07; NCCL gIB plugin 1.0.2
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo24.07-gib1.0.2-A3U
nemo24.07-gib1.0.3-A3U
  Dependencies: NeMo NGC 24.07; NCCL gIB plugin 1.0.3
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo24.07-gib1.0.3-A3U
nemo24.12-gib1.0.3-A3U
  Dependencies: NeMo NGC 24.12; NCCL gIB plugin 1.0.3
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo24.12-gib1.0.3-A3U
nemo24.07-tcpx1.0.5-A3Mega
  Dependencies: NeMo NGC 24.07; GPUDirect-TCPX 1.0.5
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo24.07-tcpx1.0.5-A3Mega
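The image names listed above appear to follow one pattern: a NeMo version, a networking plugin with its version, and a machine suffix. The sketch below parses that pattern and rebuilds the full image URL. The tag layout is inferred from the listed names rather than from a documented naming contract, and `DlslTag`, `parse_tag`, and `image_uri` are hypothetical helper names.

```python
import re
from dataclasses import dataclass

# Repository path copied from the image URLs listed above.
REPOSITORY = (
    "us-central1-docker.pkg.dev/deeplearning-images/"
    "reproducibility/pytorch-gpu-nemo-nccl"
)

# Assumed tag layout: nemo<version>-<plugin><version>-<machine suffix>.
_TAG = re.compile(
    r"^nemo(?P<nemo>\d+(?:\.\d+)*)"
    r"-(?P<plugin>gib|tcpx)(?P<plugin_version>\d+(?:\.\d+)*)"
    r"-(?P<machine>[A-Za-z0-9]+)$"
)


@dataclass(frozen=True)
class DlslTag:
    nemo: str            # version as written in the tag, for example "25.04"
    plugin: str          # "gib" or "tcpx"
    plugin_version: str  # for example "1.0.6"
    machine: str         # machine suffix, for example "A4" or "A3U"


def parse_tag(tag: str) -> DlslTag:
    """Split a tag into its components, or raise ValueError if it does not match."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognized DLSL tag: {tag!r}")
    return DlslTag(**match.groupdict())


def image_uri(tag: str) -> str:
    """Return the full image URL for a tag that matches the assumed layout."""
    parse_tag(tag)  # validate before building the URL
    return f"{REPOSITORY}:{tag}"


if __name__ == "__main__":
    print(parse_tag("nemo25.04-gib1.0.6-A4"))
    print(image_uri("nemo25.04-gib1.0.6-A4"))
```

Note that the version in the tag can be shorter than the dependency version: the `nemo25.04` tags list NeMo NGC 25.04.01 as their dependency, so treat the parsed value as a label rather than an exact package version.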
NeMo + PyTorch
This Docker image is based on the NVIDIA NeMo NGC image and includes Google Cloud tools such as gcsfuse and gcloud CLI for deploying workloads to Google Kubernetes Engine.
nemo24.07-A3U
  Dependencies: NeMo NGC 24.07
  Machine series: A3 Ultra
  Release date: December 19, 2024
  End of support: December 19, 2025
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo:nemo24.07-A3U
MaxText + JAX toolbox
This Docker image is based on the NVIDIA JAX toolbox image and includes Google Cloud tools such as gcsfuse and gcloud CLI for deploying workloads to Google Kubernetes Engine.
toolbox-maxtext-2025-01-10-A3U
  Dependencies: JAX toolbox maxtext-2025-01-10
  Machine series: A3 Ultra
  Release date: March 11, 2025
  End of support: March 11, 2026
  Image: us-central1-docker.pkg.dev/deeplearning-images/reproducibility/jax-maxtext-gpu:toolbox-maxtext-2025-01-10-A3U
MaxText + JAX stable stack
This Docker image is based on the JAX stable stack and MaxText. This image also includes dependencies such as dnsutils for running workloads on Google Kubernetes Engine.
jax-maxtext-gpu:jax0.5.1-cuda_dl25.02-rev1-maxtext-20150317
JAX Stable stacks:jax0.5.1-cuda_dl25.02-rev1
54e98c9e62caa426cf5902be068533ddb4fb79f5
us-central1-docker.pkg.dev/deeplearning-images/reproducibility/jax-maxtext-gpu:jax0.5.1-cuda_dl25.02-rev1-maxtext-20150317
Cluster deployment and orchestration
OS images include all the necessary software components to deploy an operating system on a Compute Engine virtual machine instance or GKE node. The operating system manages underlying hardware resources, such as accelerators and networking. This provides the compute resources for your AI workload.
GKE node images
GKE deploys clusters using node images. These node images are available for various operating systems such as Container-Optimized OS, Ubuntu, and Windows Server. The Container-Optimized OS with containerd (cos_containerd) node images that you need to deploy GKE Autopilot clusters include optimizations to support your AI and ML workloads.
For more information about these node images, see Node images.
Slurm OS images
Slurm clusters deploy compute and controller nodes as virtual machine instances on Compute Engine.
To provision AI-optimized Slurm clusters, you must use Cluster Toolkit. During Slurm cluster deployment, the cluster blueprint automatically builds a custom OS image that installs the required system software for cluster and workload management on the Slurm nodes. You can modify the default blueprints before you deploy them to customize some of the software that your images include.
The following section summarizes the software that the cluster blueprint installs on your A4 and A3 Ultra Slurm nodes. Cluster blueprints extend the Ubuntu LTS Accelerator OS images.
A4
The A4 blueprint available on GitHub includes the following software by default:
munge
mariadb
libjwt
lmod
dcgmi
nvidia-utils-570
nvidia-container-toolkit
libnvidia-nscq-570
ibverbs-utils
A3 Ultra
The A3 Ultra blueprint available on GitHub includes the following software by default:
munge
mariadb
libjwt
lmod
dcgmi
libnvidia-cfg1-570-server
libnvidia-nscq-570
nvidia-compute-utils-570-server
nsight-compute
nsight-systems
ibverbs-utils
AI Hypercomputer lets you provision individual instances or groups of instances. If you want to create these instances, you must specify an OS image during instance creation.
Google Cloud offers a suite of OS images for instance creation. Google Cloud also offers a specialized set of accelerator OS images for AI-optimized instances. These OS images include core drivers for GPU and networking functionality, such as NVIDIA drivers, Mellanox drivers, and their dependencies.
For more information about each OS, see the Operating system details page in the Compute Engine documentation.
Accelerator OS images are available for the Rocky Linux and Ubuntu LTS operating systems.
Rocky Linux accelerator
The following Rocky Linux accelerator OS images are available for each machine series:
OS version | Image family | Machine series | Image project
Rocky Linux 9 accelerator | rocky-linux-9-optimized-gcp-nvidia-570 | A4, A3 Ultra | rocky-linux-accelerator-cloud
Rocky Linux 8 accelerator | rocky-linux-8-optimized-gcp-nvidia-570 | A4, A3 Ultra | rocky-linux-accelerator-cloud
Ubuntu LTS accelerator
The following Ubuntu LTS accelerator OS images are available for each machine series:
OS version | Image family | Architecture | Machine series | Image project
Ubuntu 24.04 LTS accelerator | ubuntu-accelerator-2404-arm64-with-nvidia-570 | Arm | A4X | ubuntu-os-accelerator-images
Ubuntu 24.04 LTS accelerator | ubuntu-accelerator-2404-amd64-with-nvidia-570 | x86 | A4, A3 Ultra | ubuntu-os-accelerator-images
Ubuntu 22.04 LTS accelerator | ubuntu-accelerator-2204-arm64-with-nvidia-570 | Arm | A4X | ubuntu-os-accelerator-images
Ubuntu 22.04 LTS accelerator | ubuntu-accelerator-2204-amd64-with-nvidia-570 | x86 | A4, A3 Ultra | ubuntu-os-accelerator-images
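Selecting an Ubuntu LTS accelerator image comes down to matching the machine series to an architecture and an OS version. The sketch below encodes the rows listed above as a lookup and formats the result as the `--image-family` and `--image-project` flags that `gcloud compute instances create` accepts. The helper names are hypothetical, and the lookup covers only the combinations shown in this document.

```python
# Image project shared by all Ubuntu LTS accelerator images listed above.
UBUNTU_PROJECT = "ubuntu-os-accelerator-images"

# Machine series -> architecture token used in the image family names above.
ARCH_BY_SERIES = {"A4X": "arm64", "A4": "amd64", "A3 Ultra": "amd64"}

# Ubuntu LTS version -> release token used in the image family names above.
RELEASE_BY_VERSION = {"24.04": "2404", "22.04": "2204"}


def ubuntu_accelerator_image(machine_series: str, ubuntu_version: str = "24.04"):
    """Return (image family, image project) for a listed machine series."""
    if machine_series not in ARCH_BY_SERIES:
        raise ValueError(f"no accelerator image listed for {machine_series!r}")
    if ubuntu_version not in RELEASE_BY_VERSION:
        raise ValueError(f"no accelerator image listed for Ubuntu {ubuntu_version!r}")
    arch = ARCH_BY_SERIES[machine_series]
    release = RELEASE_BY_VERSION[ubuntu_version]
    return f"ubuntu-accelerator-{release}-{arch}-with-nvidia-570", UBUNTU_PROJECT


def gcloud_image_flags(machine_series: str, ubuntu_version: str = "24.04"):
    """Format the image selection as flags for `gcloud compute instances create`."""
    family, project = ubuntu_accelerator_image(machine_series, ubuntu_version)
    return [f"--image-family={family}", f"--image-project={project}"]


if __name__ == "__main__":
    print(" ".join(gcloud_image_flags("A4X")))
```

Referring to an image family rather than a specific image name means that new instances pick up the latest image in that family, which is usually what you want for driver updates.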
What's next