Regional Persistent Disk and Hyperdisk Balanced High Availability are storage options that let you implement high availability (HA) services in Compute Engine. Regional Persistent Disk and Hyperdisk Balanced High Availability synchronously replicate data between two zones in the same region and ensure HA for disk data in the event of at most one zonal failure.
Regional Persistent Disk and Hyperdisk Balanced High Availability volumes are designed for workloads that require a lower Recovery Point Objective (RPO) and Recovery Time Objective (RTO). To learn more about RPO and RTO, see Basics of disaster recovery planning.
Regional Persistent Disk and Hyperdisk Balanced High Availability volumes are designed to work with regional managed instance groups.
This document provides an overview of how to build HA services with Regional Persistent Disk and Hyperdisk Balanced High Availability volumes.
When you decide to use Regional Persistent Disk or Hyperdisk Balanced High Availability, make sure that you compare the different options for increasing service availability and the cost, performance, and resiliency for different service architectures.
About synchronous disk replication
A Regional Persistent Disk or Hyperdisk Balanced High Availability volume, also referred to as a regional disk or a synchronously replicated disk, has a primary and a secondary zone within its region where it stores disk data.
Compute Engine maintains replicas of your disk in both these zones. When you write data to your disk, Compute Engine synchronously replicates that data to the disk replicas in both zones to ensure HA. The data of each zonal replica is spread across multiple physical machines within the zone to ensure durability. Zonal replicas ensure that the data of the disk remains available and provide protection against temporary outages in one of the disk zones.
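For illustration, the following minimal sketch creates a synchronously replicated disk by specifying two replica zones at creation time. It uses the google-cloud-compute Python client; the project, region, disk name, and the pd-ssd disk type are placeholder assumptions, and Hyperdisk Balanced High Availability volumes use a different disk type.

    from google.cloud import compute_v1

    PROJECT = "my-project"     # placeholder project ID
    REGION = "us-central1"     # region that contains both replica zones

    # A regional disk stores a replica in each of the two zones listed below.
    disk = compute_v1.Disk(
        name="my-regional-disk",
        size_gb=200,
        type_=f"projects/{PROJECT}/regions/{REGION}/diskTypes/pd-ssd",
        replica_zones=[
            f"projects/{PROJECT}/zones/{REGION}-a",
            f"projects/{PROJECT}/zones/{REGION}-b",
        ],
    )

    client = compute_v1.RegionDisksClient()
    operation = client.insert(project=PROJECT, region=REGION, disk_resource=disk)
    operation.result()  # block until the disk and its replicas are created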
Replica state for zonal replicas
The disk replica state for a Regional Persistent Disk or Hyperdisk Balanced High Availability volume shows you the state of a zonal replica relative to the content of the disk. Each zonal replica of your disk is always in exactly one disk replica state, such as synced or catching up.
To learn how to check and track the replica states of your zonal replicas, see Monitor the disk replica states.
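As a quick sketch, assuming the same placeholder names as earlier, you can retrieve the disk with the google-cloud-compute Python client and inspect the fields it returns; the exact fields that report per-replica state can differ, so check the response against the current API reference.

    from google.cloud import compute_v1

    client = compute_v1.RegionDisksClient()
    disk = client.get(project="my-project", region="us-central1",
                      disk="my-regional-disk")

    # Overall disk status (for example, READY) and the zones that hold replicas.
    print(disk.status)
    print(list(disk.replica_zones))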
Replication states for regional disks
Depending on the state of the individual zonal replicas, your Regional Persistent Disk or Hyperdisk Balanced High Availability volume can be in one of the following replication states: fully replicated, catching up, or degraded. A zonal replica can fall out of sync due to a failure or an outage. If the disk replication status is catching up or degraded, then one of the zonal replicas is not updated with all the data. Any outage in the zone of the healthy replica during this time makes the disk unavailable until the healthy replica's zone is restored.
When your Regional Persistent Disk or Hyperdisk Balanced High Availability volume is catching up, Google Cloud starts healing the zonal replica that is catching up. Google recommends that you wait for the affected zonal replica to catch up with the data on the disk, at which point its status changes to Synced. After the zonal replica moves to the Synced state, the regional disk status changes back to the Fully replicated state.
If the regional disk has a status of catching up or degraded for a prolonged period of time and doesn't meet your organization's RPO requirements, we recommend that you take snapshots of the primary replica, either manually or by using a snapshot schedule.
After you create a snapshot, you can create a new Regional Persistent Disk or Hyperdisk Balanced High Availability disk by using that snapshot as the source. This restores the snapshot to the new disk. Your new disk also starts in a fully replicated state with healthy data replication.
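The snippet below sketches that recovery path with the google-cloud-compute Python client, using the same placeholder names as earlier: it snapshots the regional disk and then creates a new regional disk with that snapshot as the source.

    from google.cloud import compute_v1

    PROJECT = "my-project"
    REGION = "us-central1"
    region_disks = compute_v1.RegionDisksClient()

    # 1. Create a standard snapshot of the regional disk.
    region_disks.create_snapshot(
        project=PROJECT,
        region=REGION,
        disk="my-regional-disk",
        snapshot_resource=compute_v1.Snapshot(name="recovery-snapshot"),
    ).result()

    # 2. Restore the snapshot to a new regional disk with two replica zones.
    new_disk = compute_v1.Disk(
        name="my-regional-disk-restored",
        source_snapshot=f"projects/{PROJECT}/global/snapshots/recovery-snapshot",
        replica_zones=[
            f"projects/{PROJECT}/zones/{REGION}-a",
            f"projects/{PROJECT}/zones/{REGION}-b",
        ],
    )
    region_disks.insert(project=PROJECT, region=REGION,
                        disk_resource=new_disk).result()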
To learn how to check the replication state of your Regional Persistent Disk or Hyperdisk Balanced High Availability disk, see Determine the replication state of disks.
Replica recovery checkpoint
A replica recovery checkpoint is a disk attribute that represents the most recent crash-consistent point in time of a fully replicated disk. Compute Engine automatically creates and maintains a single replica recovery checkpoint for each regional disk. When a disk is fully replicated, Compute Engine refreshes its checkpoint approximately every 15 minutes to ensure that the checkpoint remains up to date. When the disk replication status is degraded, Compute Engine lets you create a standard snapshot from the replica recovery checkpoint of that disk. The resulting standard snapshot captures the data from the most recent crash-consistent version of the fully replicated disk.
In rare scenarios, when your disk is degraded, the zonal replica that is synced with the latest disk data can also fail before the out-of-sync replica catches up. You won't be able to force-attach your disk to compute instances in either zone. Your replicated disk becomes unavailable and you must migrate the data to a new disk. In such scenarios, if you don't have any existing standard snapshots available for your disk, you might still be able to recover your disk data from the incomplete replica by using a standard snapshot created from the replica recovery checkpoint.
Important: If a disk becomes unavailable, Google recommends that you always use any existing standard snapshots to create a new Regional Persistent Disk or Hyperdisk Balanced High Availability volume and recover the disk data. Create standard snapshots from a checkpoint only if you don't have any existing standard snapshots available.
Compute Engine automatically creates replica recovery checkpoints for each mounted Regional Persistent Disk or Hyperdisk Balanced High Availability disk. You don't incur any additional charges for the creation of these checkpoints. However, you do incur the applicable storage charges for the snapshots and compute instances that you create when you use these checkpoints to migrate your regional disk to functioning zones.
Learn more about how to recover your regional disk data using a replica recovery checkpoint.
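As a rough sketch only: the snapshot is requested through the snapshot API with the degraded regional disk named as the checkpoint source. The source_disk_for_recovery_checkpoint field and URL format shown here are assumptions for illustration; verify them against the current Compute Engine API reference before relying on them.

    from google.cloud import compute_v1

    PROJECT = "my-project"
    REGION = "us-central1"

    # Assumption: the Snapshot resource accepts the degraded regional disk as a
    # recovery-checkpoint source instead of a regular source disk.
    snapshot = compute_v1.Snapshot(
        name="checkpoint-snapshot",
        source_disk_for_recovery_checkpoint=(
            f"projects/{PROJECT}/regions/{REGION}/disks/my-regional-disk"
        ),
    )

    compute_v1.SnapshotsClient().insert(
        project=PROJECT, snapshot_resource=snapshot
    ).result()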
Regional disk failover
In the event of an outage in a zone, the zone becomes inaccessible and the compute instance in that zone can't perform read or write operations on its disk. To allow your workload to keep performing read and write operations on the regional disk, Compute Engine allows migration of disk data to the other zone where the disk has a replica. This process is called failover.
The failover process involves detaching the zonal replica from the instance in the affected zone and then attaching the zonal replica to a new instance in the secondary zone. Compute Engine synchronously replicates the data on your disk to the secondary zone to ensure a quick failover in case of a single replica failure.
Failover by application-specific regional control plane
The application-specific regional control plane is not a Google Cloud service. When you design HA service architectures, you must build your own application-specific regional control plane. This control plane decides which instance must have the regional disk attached and which instance is the current primary instance.
When a failure is detected in the primary instance or database of the regional disk, the application-specific regional control plane of your HA service architecture can automatically initiate failover to the standby instance in the secondary zone. During the failover, the application-specific regional control plane reattaches the regional disk to the standby instance in the secondary zone. Compute Engine then directs all traffic to that instance based on health check signals.
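A minimal sketch of that decision logic is shown below; the instance names, the health check, and the fail_over_to_standby helper are placeholder assumptions, and a production control plane would also need fencing, retries, and split-brain protection.

    from google.cloud import compute_v1

    PROJECT = "my-project"
    PRIMARY_ZONE = "us-central1-a"

    instances = compute_v1.InstancesClient()

    def primary_is_healthy() -> bool:
        # Placeholder health check: a real control plane would probe the
        # application itself, not only the instance status.
        primary = instances.get(project=PROJECT, zone=PRIMARY_ZONE,
                                instance="db-primary")
        return primary.status == "RUNNING"

    def fail_over_to_standby() -> None:
        # Hypothetical helper: reattach the regional disk to the standby
        # instance in the other zone (see the force-attach sketch later in
        # this document) and promote that instance to primary.
        ...

    if not primary_is_healthy():
        fail_over_to_standby()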
The overall failover latency, excluding failure-detection time, is the sum of the latencies of the individual failover steps, such as reattaching the regional disk to the standby instance and recovering the application on that instance.
For more information, see Understanding the application-specific regional control plane.
The Disaster Recovery Building Blocks page covers the building blocks available on Compute Engine.
Failover by force-attach
One of the benefits of Regional Persistent Disk and Hyperdisk Balanced High Availability is that, in the unlikely event of a zonal outage, you can manually fail over your workload to another zone. When the original zone has an outage, you can't complete the disk detach operation until that zonal replica is restored. In this scenario, you might need to attach the secondary zonal replica to a new compute instance without detaching the primary zonal replica from your primary instance. This process is called force-attach.
When your compute instance in the primary zone becomes unavailable, you can force attach your disk to an instance in the secondary zone. You can perform this task by using the Google Cloud console, the gcloud CLI, or the Compute Engine API.
Compute Engine executes the force-attach operation in less than one minute. The total recovery time objective (RTO) depends not only on the storage failover (the force attachment of the regional disk), but also on other factors, such as the time needed to detect the failure, start or create the standby instance, and recover the application.
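For example, if you use the Compute Engine API through the google-cloud-compute Python client, a force-attach might look like the following sketch; the instance, zone, and device names are placeholder assumptions.

    from google.cloud import compute_v1

    PROJECT = "my-project"
    REGION = "us-central1"
    STANDBY_ZONE = f"{REGION}-b"   # zone of the surviving replica

    # Full URL of the regional disk to attach.
    disk_url = f"projects/{PROJECT}/regions/{REGION}/disks/my-regional-disk"

    instances = compute_v1.InstancesClient()
    instances.attach_disk(
        project=PROJECT,
        zone=STANDBY_ZONE,
        instance="db-standby",
        attached_disk_resource=compute_v1.AttachedDisk(
            source=disk_url,
            device_name="data-disk",
        ),
        # Attach even though the disk is still attached to the unreachable
        # instance in the primary zone.
        force_attach=True,
    ).result()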
For more information about how to fail over your compute instance by using force-attach, see Failover your regional disk using force-attach.
Regional Persistent Disk and Hyperdisk Balanced High Availability favor workload availability, which means there are tradeoffs for data protection in the unlikely event that both disk replicas are unavailable at the same time. For more information, see Manage failures for regional disks.
Limitations
The following sections list the limitations that apply to Regional Persistent Disk and Hyperdisk Balanced High Availability.
General limitations for regional disks