Showing content from https://developer.hashicorp.com/terraform/tutorials/recommended-patterns/pattern-backups below:

Terraform Enterprise backup - recommended pattern | Terraform

Many business verticals require business continuity management (BCM) for production services. A reliable backup of your Terraform Enterprise deployment is crucial to ensuring business continuity. The backup should include data held and processed by Terraform Enterprise's components so that operators can restore it within the organization's Recovery Time Objective (RTO) and to their Recovery Point Objective (RPO).

This guide extends the Backup & Restore documentation, which contains more technical detail about the backup and restore process. This guide discusses the best practices, options, and considerations to back up Terraform Enterprise and increase its resiliency. It also recommends redundant, self-healing configurations using public and private cloud infrastructure, which add resilience to your deployment and reduce the chances of requiring backups.

Unless otherwise stated, most of this guide applies only to single-region, multi-availability zone External Services mode deployments. If you are running a Mounted Disk deployment, refer to the Backup a Mounted Disk Deployment section below for specific details. This guide does not cover Demo mode backups.

For region redundancy, repeat the recommendations in this guide for each region and consider the recommendations in the Multi-Region Considerations section at the end of this page.

For recommended patterns for recovery and restoration of TFE, refer to the Terraform Enterprise Recovery & Restoration Recommended Pattern.

Business continuity (BC) is a corporate capability. It exists when an organization can continue to deliver its products and services at acceptable, predefined levels during and after a disruptive incident.

Note

The ISO 22301 document uses business continuity rather than disaster recovery (DR). As a result, this tutorial will refer to business continuity instead of disaster recovery.

Two factors heavily determine your organization's ability to achieve BC:

  1. Recovery Time Objective (RTO) is the target time set for the resumption of product, service, or activity delivery after an incident. For example, if an organization has an RTO of one hour, they aim to have their services running within one hour of the service disruption.

  2. Recovery Point Objective (RPO) is the maximum tolerable period of data loss caused by an incident. For example, if an organization has an RPO of one hour, they can tolerate losing at most the data written in the hour before the service disruption.

Based on these definitions, you should assess the valid RTO/RPO for your business and approach BC accordingly. These factors will determine your backup frequency and other considerations discussed later in this guide.

In this guide:

Maintain the backup and restore process

When you deploy Terraform Enterprise:

Manage sensitive values

For fully automated deployments, you must manage several common sensitive values. The methods below do not back up these data and you should secure them another way. Do not store any of these sensitive values in version control or allow them to leak into shell histories.

Active/Active deployments must be automated, and have additional sensitive values you must manage.

Process audit logs

Audit log processing helps you identify the root cause during a data recovery incident.

Follow the guidance on Terraform Enterprise logs to aggregate and index logs from the Terraform Enterprise node(s) using a central logging platform such as Splunk, ELK, or a cloud-native solution. Use these logs as a diagnostic tool during an outage, and scan them for ERROR and FATAL messages as part of root cause analysis.

Terraform Enterprise backup API

The backup API facilitates backups and migrations from one operational mode or deployment method (Standalone or Active/Active) to another.

Only use the backup API to migrate between low-volume implementations, especially in non-production environments. Use cloud-native tooling instead for day-to-day backup and recovery on public cloud, and standard approaches for on-premise deployments as detailed below.

The following recommendations will improve your security posture, reduce the effort required to maintain an optimal Terraform Enterprise instance, and speed up deployment time during a restoration.

Note

The Automated Recovery function only backs up installation data and not application data. If you have an automated deployment, you don't need to use the Automated Recovery function.

Reference the tab(s) below for specific recommendations relevant to your installation method.

If you are using the online installation method, configure the boot script to run the Replicated install.sh script explicitly without the airgap argument when the new VM starts up. The VM will download the installation media from the Internet and install the service.

Based on the Replicated configuration, the application will connect to the configured object store and database resources automatically.

If you are using the air-gapped installation method, use one of the following ways to ensure the installation media is available to the install configuration.

We recommend you automatically replace application server nodes when a node or availability zone fails. Replacing the node provides redundancy at the server and availability zone level. Public clouds and VMware have specific services for this.

Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.

Use an Auto Scaling group (ASG) to automatically replace nodes on AWS. Select your deployment for more details.
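As a minimal sketch of this recommendation, the fragment below keeps a single application node running and lets the Auto Scaling group replace it in another zone on failure. The names var.private_subnet_ids, aws_launch_template.tfe, and aws_lb_target_group.tfe are hypothetical and assumed to be defined elsewhere; the sizes and grace period are illustrative values, not HashiCorp-recommended settings.

```hcl
# Hypothetical references: var.private_subnet_ids, aws_launch_template.tfe,
# and aws_lb_target_group.tfe are assumed to exist elsewhere in your code.
resource "aws_autoscaling_group" "tfe" {
  name             = "tfe-asg"
  min_size         = 1
  max_size         = 1
  desired_capacity = 1

  # One subnet per availability zone, so a replacement node can start
  # in a healthy zone when the current zone fails.
  vpc_zone_identifier = var.private_subnet_ids

  # Replace the node when the load balancer health check fails,
  # not only when the EC2 instance itself is impaired.
  health_check_type         = "ELB"
  health_check_grace_period = 900 # seconds; allow time for installation
  target_group_arns         = [aws_lb_target_group.tfe.arn]

  launch_template {
    id      = aws_launch_template.tfe.id
    version = "$Latest"
  }
}
```

Setting all three size arguments to 1 suits a Standalone deployment; an Active/Active deployment would raise them.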

Use a zone-balanced Linux virtual machine scale set (VMSS) to automatically replace nodes on Azure. Select your deployment for more details.
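A zone-balanced scale set could be sketched as follows. Only the zone-related arguments are shown; the remaining required arguments are elided in the same way as the other examples in this guide. The zone numbers are an assumption, so check which zones your region offers.

```hcl
resource "azurerm_linux_virtual_machine_scale_set" "tfe" {
  ## ... (name, resource group, SKU, image, OS disk, and network settings)

  # A single node; Azure recreates it in an available zone on failure.
  instances = 1

  # Assumed zone numbers; availability varies by region.
  zones = ["1", "2", "3"]

  # Distribute instances evenly across the listed zones.
  zone_balance = true
}
```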

Use a regional managed instance group (MIG) to automatically replace nodes on GCP. Select your deployment for more details.
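The following sketch shows one way to express a regional MIG with auto-healing. The names var.region, var.zones, google_compute_instance_template.tfe, and google_compute_health_check.tfe are hypothetical and assumed to be defined elsewhere; the target size and delay are illustrative.

```hcl
# Hypothetical references: var.region, var.zones, the instance template,
# and the health check are assumed to exist elsewhere in your code.
resource "google_compute_region_instance_group_manager" "tfe" {
  name               = "tfe-mig"
  base_instance_name = "tfe"
  region             = var.region
  target_size        = 1

  # Zones in which the group may place the node.
  distribution_policy_zones = var.zones

  version {
    instance_template = google_compute_instance_template.tfe.id
  }

  # Recreate the node when the application health check fails.
  auto_healing_policies {
    health_check      = google_compute_health_check.tfe.id
    initial_delay_sec = 900 # allow time for installation before checking
  }
}
```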

In an External Services mode scenario, the application server is running as a stateless node.

We recommend the following to support the object store's business continuity:

Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.

The most likely problem with the object store is service inaccessibility or corruption through human error rather than loss of durability, due to AWS's claim of eleven 9s of durability.

As a result, S3 Same-Region Replication is not explicitly required for the Terraform Enterprise object store because it does not add sufficient value: corruption on the primary S3 bucket will be replicated to the secondary automatically.
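Because replication copies corruption along with good data, protection against human error has to come from keeping earlier object versions. One common way to do that, shown here as an assumption rather than as a recommendation taken from this guide, is to enable versioning on the bucket. The resource name aws_s3_bucket.tfe is hypothetical.

```hcl
# Assumption: aws_s3_bucket.tfe is the Terraform Enterprise object store
# bucket, defined elsewhere.
resource "aws_s3_bucket_versioning" "tfe" {
  bucket = aws_s3_bucket.tfe.id

  # Keep previous versions so an overwritten or deleted object
  # can be restored.
  versioning_configuration {
    status = "Enabled"
  }
}
```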

We recommend the following to ensure you back up your application data appropriately.

The most likely problem with the object store is service inaccessibility or corruption through human error rather than loss of durability, due to Azure's claim of eleven 9s of durability.

We recommend the following to ensure you back up your application data appropriately.

The most likely problem with the object store is service inaccessibility or corruption through human error rather than loss of durability, due to GCP's claim of eleven 9s of durability.

We recommend the following to ensure you back up your application data appropriately.

For on-premise External Services deployments, the architectural requirements include an S3-compatible storage facility, such as MinIO or Dell ECS. For these deployments:

You should configure the database to be in line with Terraform Enterprise's PostgreSQL requirements.

For high availability in a single public cloud region, we recommend deploying the database in a multi-availability zone configuration to add resilience against recoverable outages. For coverage against non-recoverable issues (such as data corruption), take regular snapshots of the database.
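On AWS, for example, both parts of this recommendation map to arguments on the database resource. The fragment below is a sketch using Amazon RDS; the retention period and backup window are illustrative values that you should derive from your own RPO.

```hcl
resource "aws_db_instance" "tfe" {
  ## ...

  # Standby replica in a second availability zone for recoverable outages.
  multi_az = true

  # Automated daily snapshots for non-recoverable issues such as
  # data corruption. Values below are illustrative.
  backup_retention_period = 7             # days
  backup_window           = "02:00-03:00" # UTC

  # Guard against accidental deletion of the database itself.
  deletion_protection = true
}
```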

Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.

In addition to the general recommendations above, consider the following AWS-specific recommendations:

In addition to the general recommendations above, consider the following Azure-specific recommendations:

In addition to the general recommendations above, consider the following GCP-specific recommendations:

In addition to the general recommendations above, consider the following VMware-specific recommendations:

We understand that customers with private clouds are likely to have an established backup policy for databases already, possibly including a software partnership with a recognized backup vendor. In this case, for External Services mode deployments, we recommend you use these existing practices and tooling.

We make these additional recommendations for database backups:

This section is only relevant if you are running an Active/Active deployment.

Because the Redis instance serves as an active memory cache for Terraform Enterprise, you don't need to maintain backups. However, we recommend you ensure regional availability to protect against zone failure.

Note

Enabling Redis RDB backups may be unnecessary due to the ephemeral nature of the data in the cache at any given time.

Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.

AWS has a significant number of business continuity configuration options for Redis.

If you use Terraform to deploy Terraform Enterprise, refer to the AWS ElastiCache section of the Active/Active deployment guide for an example Redis configuration.

Your aws_elasticache_replication_group.tfe resource should look similar to the one found below. This configuration is for a Redis (cluster mode disabled) cluster of three nodes, one in each availability zone to confer n-2 zone redundancy.

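resource "aws_elasticache_replication_group" "tfe" {
  ## ...

  # One primary plus two replicas.
  num_cache_clusters = 3

  # var.availability_zones is assumed to be a list(string) with one entry
  # per node; pass it directly rather than wrapping it in another list.
  preferred_cache_cluster_azs = var.availability_zones

  # Spread nodes across zones and promote a replica if the primary fails.
  multi_az_enabled           = true
  automatic_failover_enabled = true
}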

Note

You should set the preferred_cache_cluster_azs argument to a list of availability zones equal to the number of cluster nodes. The first availability zone in the list will be the primary zone for the cluster. Duplicates are allowed.

Note

The setup will increase cost, so you should be mindful when setting up your Redis clusters. Setting a minimum of two cache clusters with the above configuration will ensure failover capability.

Azure Cache for Redis has built-in high availability.

If you use Terraform to deploy Terraform Enterprise, refer to the Azure Cache for Redis section of the Active/Active deployment guide.

Your azurerm_redis_cache.tfe resource should look similar to the one found below. This configuration is for a Redis (cluster mode disabled) cluster of three nodes, one in each availability zone to confer n-2 zone redundancy.

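resource "azurerm_redis_cache" "tfe" {
  ## ...

  # capacity is the cache size within the SKU family (P3 here),
  # not a node count.
  capacity = 3
  family   = "P"
  sku_name = "Premium"

  # Assumption: one primary plus two replicas spread over three zones.
  replicas_per_primary = 2
  zones                = ["1", "2", "3"]
}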

Note

The Azure Premium tier is currently available in preview.

Note

The setup will increase cost, so you should be mindful when setting up your Redis clusters. Setting a minimum of two cache clusters with the above configuration will ensure failover capability.

The Standard Tier of the GCP Memorystore for Redis service provides high availability through replication and automatic failover. However, this tier adds only a second node, which gives n-1 zone redundancy. The Standard Tier is currently the highest tier available.

If you use Terraform to deploy Terraform Enterprise, refer to the GCP Memorystore for Redis section of the Active/Active deployment guide for an example configuration.
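A Standard Tier instance with its replica in a second zone might be declared as in the sketch below. The variables var.region, var.primary_zone, and var.secondary_zone are hypothetical, and the memory size is an illustrative value.

```hcl
# Hypothetical variables: var.region, var.primary_zone, var.secondary_zone.
resource "google_redis_instance" "tfe" {
  name           = "tfe-redis"
  memory_size_gb = 1
  region         = var.region

  # STANDARD_HA is the Standard Tier: a primary with one replica
  # and automatic failover.
  tier = "STANDARD_HA"

  # Place the primary and the replica in different zones.
  location_id             = var.primary_zone
  alternative_location_id = var.secondary_zone
}
```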

Active/Active deployment is unavailable for VMware.

Terraform Enterprise's application architecture is currently single-region. The additional configuration should be for business continuity purposes only and not for cross-region, Active/Active capability. Support for the below would be on a best-endeavors basis only. In addition, cross-region functionality on every application tier is not supported in every region. Check support as part of architectural planning.

Generally, we recommend you repeat the recommendations in this guide for each region to achieve region redundancy in a Terraform Enterprise deployment.

Note

Cross-region deployments incur additional hosting costs.

Recommendations common to the most-used cloud vendors include:

Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.

The following additional considerations provide an n-1 region redundancy on AWS. Since both cross-region S3 replication and Aurora read replicas can provide replicas in multiple Secondary regions, it is possible to offer greater than n-1 region redundancy if required.
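As a sketch of cross-region S3 replication, the fragment below copies objects from the primary bucket to a bucket in a secondary region. The bucket, role, and versioning resource names are all hypothetical and assumed to be defined elsewhere, with the secondary bucket created through a provider configured for the other region.

```hcl
# Hypothetical references: aws_s3_bucket.tfe, aws_s3_bucket.tfe_secondary,
# aws_s3_bucket_versioning.tfe, and aws_iam_role.tfe_replication.
resource "aws_s3_bucket_replication_configuration" "tfe" {
  # Replication requires versioning on both source and destination buckets.
  depends_on = [aws_s3_bucket_versioning.tfe]

  bucket = aws_s3_bucket.tfe.id
  role   = aws_iam_role.tfe_replication.arn

  rule {
    id     = "tfe-cross-region"
    status = "Enabled"

    destination {
      bucket        = aws_s3_bucket.tfe_secondary.arn
      storage_class = "STANDARD"
    }
  }
}
```

As with same-region replication, corruption on the primary bucket is replicated too, so this protects against the loss of a region rather than against human error.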

The following additional considerations will provide an n-1 region redundancy on Azure:

The following additional considerations will provide an n-1 region redundancy on GCP:

Since this guide refers to multiple availability zones and maps these zones to separate VMware datacenters, multi-region deployments require connected datacenters in different countries or continents.

Repeat the recommendations in this guide for each region and use the strategic connections between regions to migrate Terraform Enterprise workloads during outages. The key concepts are to ensure:

The backup approach for a Mounted Disk operational mode is simpler than for External Services mode because it involves a single machine and possibly its business continuity instance. Also, a Mounted Disk deployment backup ensures the integrity of the machine and its attached data disk.

We recommend using Mounted Disk mode when provisioning on private cloud if your environment cannot readily support the added complexity of managing an on-premise database and S3-compatible storage. If you later move to the Active/Active deployment mode, you will need to support these external services as well as Redis.

We do not recommend using Mounted Disk deployments on public cloud since External Services mode provides better scalability and Mounted Disk mode does not support Active/Active deployments. For Twelve Factor compliance, use the same operational mode for both production and non-production.

Make sure to quiesce the database on Mounted Disk instances before taking a backup; your backup software may or may not do this automatically.

Mounted Disk mode uses a separate mountable volume (data disk) that can come in many flavors. To protect data integrity, confirm that the mountable volume has the following capabilities (in this order):

Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.

AWS has recommended backup/snapshot options to back up a Mounted Disk deployment.
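One such option is Amazon Data Lifecycle Manager, which snapshots tagged EBS volumes on a schedule. The sketch below assumes the data disk carries a hypothetical Snapshot = "tfe-data" tag and that aws_iam_role.dlm is defined elsewhere. The hourly interval and retention count are illustrative and should follow your RPO. These snapshots are crash-consistent only, so the guidance above about quiescing the database still applies.

```hcl
# Hypothetical references: aws_iam_role.dlm and the Snapshot tag value.
resource "aws_dlm_lifecycle_policy" "tfe_data_disk" {
  description        = "Snapshots of the TFE Mounted Disk data volume"
  execution_role_arn = aws_iam_role.dlm.arn
  state              = "ENABLED"

  policy_details {
    resource_types = ["VOLUME"]

    # Only volumes carrying this tag are snapshotted.
    target_tags = {
      Snapshot = "tfe-data"
    }

    schedule {
      name = "hourly"

      create_rule {
        interval      = 1
        interval_unit = "HOURS"
        times         = ["00:00"] # start time of the first run (UTC)
      }

      # Keep the most recent 24 snapshots.
      retain_rule {
        count = 24
      }

      copy_tags = true
    }
  }
}
```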

Azure has recommended backup/snapshot options to back up a Mounted Disk deployment.

GCP has recommended backup/snapshot options to back up a Mounted Disk deployment.

For on-premise Mounted Disk mode deployments, refer to the Application Server VMware tab above for recommendations for server backup.

In addition:

Note

Do not start more than one Mounted Disk mode instance against the same database simultaneously. If you are using a load balancer and a warm server with the data disk visible in the other datacenter, ensure Terraform Enterprise is not running on it while the primary is.

In this guide, you learned best practices for preparing and backing up Terraform Enterprise's main components.

