Cluster Director (formerly known as Hypercompute Cluster) lets you deploy and manage a group of accelerators as a single unit with physically colocated VMs, targeted workload placement, advanced cluster maintenance controls, and topology-aware scheduling. Cluster Director can be accessed directly through Compute Engine APIs, or through Google Kubernetes Engine, which natively integrates with Cluster Director capabilities.
Components
This section describes the core features and services that make up the Cluster Director suite.
Dense colocation of accelerator resources
You can request host machines that are allocated physically close to each other, provisioned as blocks of resources, and interconnected with a dynamic ML network fabric. This arrangement of resources helps to minimize network hops and optimize for the lowest latency.
To learn how to deploy these densely allocated blocks of A3 Ultra or A4 accelerator machines, see Reserve capacity.
Topology-aware scheduling
You can get topology information at the node and cluster levels and use it for job placement. For more information, see View VMs topology.
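As an illustration of how topology information might feed job placement, the following is a minimal Python sketch, not a Cluster Director API. It assumes each node's topology has already been retrieved and is represented as a hypothetical `(block, subblock, host)` tuple; the function name `pick_colocated` and the sample node names are also invented for this example. The sketch looks for the requested number of nodes in the same sub-block first, then in the same block, and only then anywhere in the cluster.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical topology record: (block, subblock, host).
# These field names are assumptions for illustration only.
Topology = Tuple[str, str, str]


def pick_colocated(nodes: Dict[str, Topology], count: int) -> List[str]:
    """Return `count` node names that share the longest possible topology prefix."""
    if count <= 0 or count > len(nodes):
        raise ValueError("count must be between 1 and the number of nodes")

    # depth 2 = same sub-block, depth 1 = same block, depth 0 = whole cluster.
    for depth in (2, 1, 0):
        groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
        for name, topo in nodes.items():
            groups[topo[:depth]].append(name)

        # Keep only groups large enough to hold the whole job.
        fitting = [sorted(members) for members in groups.values()
                   if len(members) >= count]
        if fitting:
            # Best fit: prefer the smallest group that is still large enough,
            # which leaves larger groups free for larger jobs.
            best = min(fitting, key=lambda members: (len(members), members))
            return best[:count]

    # Unreachable: at depth 0 all nodes form one group of size >= count.
    raise AssertionError("no placement found")


if __name__ == "__main__":
    sample = {
        "node-1": ("block-a", "sub-1", "host-1"),
        "node-2": ("block-a", "sub-1", "host-2"),
        "node-3": ("block-a", "sub-2", "host-3"),
        "node-4": ("block-b", "sub-3", "host-4"),
    }
    print(pick_colocated(sample, 2))
    print(pick_colocated(sample, 3))
```

A real scheduler would also weigh availability, quotas, and workload constraints; the sketch only shows the idea of preferring the tightest topology group that can hold the job.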
Advanced maintenance scheduling and controls
You have full control over the maintenance of VM instances within a block of resources, and can synchronize upgrades to ensure your workloads are more resilient to host errors and have minimal disruptions. This approach improves the goodput for your workloads.
To facilitate full control of maintenance events, you can set up alerts and receive notifications when maintenance is scheduled, starts, or completes. To learn more about maintenance of these blocks of resources, see the following:
You can also define how maintenance behaves for your blocks of resources by choosing one of two maintenance scheduling types: grouped or independent. To learn more about maintenance scheduling types, see Maintenance scheduling types.
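To make the difference between the two scheduling types concrete, here is a toy Python model, offered as a sketch under stated assumptions rather than a description of how Cluster Director schedules maintenance. It assumes that grouped maintenance places every VM in a block in the same window, and that independent maintenance may place each VM in a different window (the worst case is modeled here). All function names and the integer "window" identifiers are hypothetical.

```python
from typing import Dict, List, Set


def grouped_schedule(vms: List[str], window: int) -> Dict[str, int]:
    """Assumed model: every VM in the block is maintained in the same window."""
    return {vm: window for vm in vms}


def independent_schedule(vms: List[str], first_window: int) -> Dict[str, int]:
    """Assumed worst case: each VM is maintained in its own distinct window."""
    return {vm: first_window + offset for offset, vm in enumerate(vms)}


def interruption_windows(schedule: Dict[str, int]) -> Set[int]:
    """Windows in which a job that needs every VM at once is interrupted."""
    return set(schedule.values())


if __name__ == "__main__":
    block = ["vm-a", "vm-b", "vm-c"]
    print(len(interruption_windows(grouped_schedule(block, 5))))
    print(len(interruption_windows(independent_schedule(block, 5))))
```

Under these assumptions, a tightly coupled job that needs all VMs in the block is interrupted once with the grouped model and up to once per VM with the independent model, which is one way to reason about the effect of synchronized maintenance on goodput.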
Monitoring and diagnostic tooling
For monitoring and troubleshooting, Cluster Director includes services such as faulty host reporting, which you can use to flag issues with individual host machines. To help reduce the overhead of managing your cluster, services are also available for monitoring network and GPU performance.
Supported machine types
Cluster Director supports the following accelerator-optimized machine types:
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-08-07 UTC.