The Get Started guide describes how to deploy a Nomad environment with minimal infrastructure configuration, giving you a quick way to develop, test, deploy, and iterate on your application.
When you are ready to move beyond your local machine, these tutorials guide you through deploying a Nomad cluster with access control lists (ACLs) enabled on the three major cloud platforms: AWS, GCP, and Azure. A full cluster gives you the flexibility to use Nomad features such as CSI volumes, service discovery integration, and job constraints.
The code and configuration files for each cloud provider live in their own directory in the example repository. This tutorial covers the contents of the repository at a high level, focusing on how the Nomad cluster is configured. The follow-on tutorials then guide you through deploying and provisioning a Nomad cluster on the cloud platform of your choice.
The cluster design follows best practices outlined in the reference architecture, including a three-server setup for high availability, Consul for automatic clustering and service discovery, and low network latency between the nodes.
Nomad's ACL system is enabled to control data and API access. The default client token receives a minimal set of permissions and no administrative rights. This client token is generated during the cluster setup and provided to the user for their interactions with Nomad instead of the management token.
Finally, the security group setup allows free communication between the nodes of the cluster and limits external ingress to only the necessary UI ports as outlined in the extensibility notes.
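As an illustration, once the cluster is up you export that user token in your shell before running Nomad commands. The token value shown below is a hypothetical placeholder; the real value comes from the cluster setup output.
Example Nomad CLI usage with the generated user token
$ export NOMAD_TOKEN="<user token from the cluster setup output>"
$ nomad job status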
Review repository contents
The root level of the repository contains a directory for each cloud and a shared directory that contains configuration files common to all of the clouds.
The shared/config
directory contains configuration files for starting the Nomad and Consul agents as well as the policy files for configuring ACLs.
nomad-acl-user.hcl
is the Nomad ACL policy file that gives the user token the permissions to read and submit jobs.
nomad.hcl
and nomad_client.hcl
are the Nomad agent startup files for the server and client nodes, respectively. They are used to configure the Nomad agent started by the nomad.service
file via systemd
. The agent files contain capitalized placeholder strings that are replaced with actual values during the provisioning process.
shared/config/nomad.hcl
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"
# Enable the server
server {
enabled = true
bootstrap_expect = SERVER_COUNT
}
consul {
address = "127.0.0.1:8500"
token = "CONSUL_TOKEN"
}
acl {
enabled = true
}
## ...
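For reference, the nomad-acl-user.hcl policy described above is what grants the user token its read and job-submission access. The exact rules are defined in the repository; a minimal sketch of a policy with that scope, assuming the default namespace and an illustrative capability list, might look like this:
shared/config/nomad-acl-user.hcl (illustrative sketch)
# Illustrative sketch only; the repository file defines the exact rules.
namespace "default" {
  policy       = "read"
  capabilities = ["submit-job", "read-job", "read-logs"]
}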
Consul files
consul-acl-nomad-auto-join.hcl
is the Consul ACL policy file that gives the Nomad agent token the necessary permissions to automatically join the Consul cluster during startup.
consul-template.hcl
and consul-template.service
are used to configure and start the Consul Template service.
consul.hcl
and consul_client.hcl
are the Consul agent startup files for the server and client nodes, respectively. They are used to configure the Consul agent started by the consul_aws.service
, consul_gce.service
, or consul_azure.service
files via systemd
, depending on the cloud platform. Like the Nomad agent files, these also contain capitalized placeholder strings that are replaced with actual values during the provisioning process.
shared/config/consul.hcl
data_dir = "/opt/consul/data"
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
advertise_addr = "IP_ADDRESS"
bootstrap_expect = SERVER_COUNT
acl {
enabled = true
default_policy = "deny"
down_policy = "extend-cache"
}
log_level = "INFO"
server = true
ui = true
retry_join = ["RETRY_JOIN"]
## ...
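The consul-acl-nomad-auto-join.hcl policy mentioned above gives the Nomad agent token the Consul permissions it needs to register and discover nodes and services. The repository file is authoritative; a hedged sketch of the kind of rules such a policy typically contains:
shared/config/consul-acl-nomad-auto-join.hcl (illustrative sketch)
# Illustrative sketch only; the repository file defines the exact rules.
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "write"
}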
The shared/scripts
directory contains scripts for installing, configuring, and starting Nomad and Consul on the deployed infrastructure.
setup.sh
downloads and installs Nomad, Consul, Consul Template, and their dependencies.
server.sh
and client.sh
replace the capitalized placeholder strings in the server and client agent startup files with actual values, copy the systemd service files to the correct location and start them, and configure Docker networking.
The data-scripts
directory contains user-data-server.sh
which bootstraps the Consul ACLs, the Nomad ACLs, and then saves the Nomad bootstrap user token temporarily in the Consul KV store. It also contains user-data-client.sh
which runs the shared/scripts/client.sh
script from above and restarts Nomad.
Tip
Terraform adds the nomad_consul_token_secret
value to the configuration during the provisioning process so that it's available for the script to replace at runtime.
shared/data-scripts/user-data-client.sh
#!/bin/bash
set -e
exec > >(sudo tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
sudo bash /ops/shared/scripts/client.sh "${cloud_env}" '${retry_join}' "${nomad_binary}"
NOMAD_HCL_PATH="/etc/nomad.d/nomad.hcl"
CLOUD_ENV="${cloud_env}"
sed -i "s/CONSUL_TOKEN/${nomad_consul_token_secret}/g" $NOMAD_HCL_PATH
# ...
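The server script, user-data-server.sh, is not shown in full here. The following is a rough sketch of the ACL bootstrap flow it performs; the command names are real Consul and Nomad CLI commands, but the file paths, policy name, and KV key are assumptions for illustration.
shared/data-scripts/user-data-server.sh (illustrative sketch, not the actual script)
# Bootstrap Consul and Nomad ACLs (illustrative; requires the agents to be up)
consul acl bootstrap > /tmp/consul_bootstrap.token
nomad acl bootstrap  > /tmp/nomad_bootstrap.token

# Create the restricted user token from the nomad-acl-user.hcl policy
# (run with the bootstrap management token; names and paths are hypothetical)
nomad acl policy apply -description "User policy" user-policy /ops/shared/config/nomad-acl-user.hcl
USER_TOKEN=$(nomad acl token create -name="user-token" -policy="user-policy" -type="client" \
  | awk -F'= +' '/Secret ID/ {print $2}')

# Save the token temporarily in the Consul KV store for later retrieval
consul kv put "nomad_user_token" "$USER_TOKEN"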
Explore the cloud directories
The root level aws
, gcp
, and azure
directories contain several common components that have been configured to work with a specific cloud platform.
variables.hcl.example
is the variables file used for both Packer and Terraform via the -var-file
flag.
Example Packer command using -var-file
$ packer build -var-file=variables.hcl image.pkr.hcl
Example Terraform command using -var-file
$ terraform apply -var-file=variables.hcl
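The keys in variables.hcl map directly onto the Packer and Terraform variables for that cloud. As a rough illustration for AWS, using only variable names that appear later in this tutorial and placeholder values:
aws/variables.hcl.example (illustrative sketch)
# Placeholder values; the repository's example file defines the exact keys.
name         = "nomad"
region       = "us-east-1"
server_count = 3
client_count = 3
key_name     = "KEY_NAME"
allowlist_ip = "YOUR_IP_ADDRESS/32"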
image.pkr.hcl
is the Packer build file used to create the machine image for the cluster nodes. The image build also runs the shared/scripts/setup.sh
script.
main.tf
, outputs.tf
, variables.tf
, and versions.tf
contain the Terraform configurations to provision the cluster.
By default, the cluster consists of 3 server and 3 client nodes and uses the Consul auto-join functionality to automatically add nodes as they start up and become available. The value for retry_join
found in the consul.hcl
and consul_client.hcl
agent template files comes from Terraform during provisioning and differs somewhat between the three cloud platforms.
shared/config/consul_client.hcl
ui = true
log_level = "INFO"
data_dir = "/opt/consul/data"
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
advertise_addr = "IP_ADDRESS"
retry_join = ["RETRY_JOIN"]
In each scenario, Terraform substitutes the retry_join
value into either the user-data-server.sh
or user-data-client.sh
scripts with the templatefile()
function in main.tf
.
Cloud Auto-join for AWS EC2 does not require any project-specific information, so the value is set as a default in the variables file. The values for tag_key
and tag_value
are read by Consul as a key-value pair of "ConsulAutoJoin" = "auto-join"
.
aws/variables.tf
# ...
variable "retry_join" {
description = "Used by Consul to automatically form a cluster."
type = string
default = "provider=aws tag_key=ConsulAutoJoin tag_value=auto-join"
}
# ...
A tag is set in the aws_instance
resource for each server and client that matches the key-value pair in the retry_join
variable.
aws/main.tf
resource "aws_instance" "server" {
# ...
# instance tags
# ConsulAutoJoin is necessary for nodes to automatically join the cluster
tags = merge(
{
"Name" = "${var.name}-server-${count.index}"
},
{
"ConsulAutoJoin" = "auto-join"
},
{
"NomadType" = "server"
}
)
# ...
}
The value is then read by Terraform during provisioning for both the server and client nodes.
aws/main.tf
resource "aws_instance" "server" {
# ...
user_data = templatefile("../shared/data-scripts/user-data-server.sh", {
server_count = var.server_count
region = var.region
cloud_env = "aws"
retry_join = var.retry_join
nomad_binary = var.nomad_binary
nomad_consul_token_id = random_uuid.nomad_id.result
nomad_consul_token_secret = random_uuid.nomad_token.result
})
# ...
}
Cloud Auto-join for GCP requires that the retry_join
value contains the GCP project ID. This variable must be updated with the project ID before running Terraform. zone_pattern
restricts the auto-join to a specific zone for faster discovery.
gcp/variables.hcl.example
# ...
# Terraform variables (all are required)
retry_join = "project_name=GCP_PROJECT_ID zone_pattern=GCP_ZONE provider=gce tag_value=auto-join"
# ...
The google_compute_instance
resources for the server and client nodes contain an auto-join
instance tag that matches the value in the retry_join
variable. This variable is read by Terraform during provisioning for both the server and client nodes.
gcp/main.tf
resource "google_compute_instance" "server" {
# ...
tags = ["auto-join"]
# ...
metadata_startup_script = templatefile("../shared/data-scripts/user-data-server.sh", {
server_count = var.server_count
region = var.region
cloud_env = "gce"
retry_join = var.retry_join
nomad_binary = var.nomad_binary
nomad_consul_token_id = var.nomad_consul_token_id
nomad_consul_token_secret = var.nomad_consul_token_secret
})
}
Cloud Auto-join for Azure requires that the retry_join
value contains the Subscription ID, Tenant ID, Client ID, and Client Secret. This variable must be updated with the Azure project values before running Terraform.
azure/variables.hcl.example
# ...
# Terraform variables (all are required)
retry_join = "provider=azure
tag_name=ConsulAutoJoin
tag_value=auto-join
subscription_id=SUBSCRIPTION_ID
tenant_id=TENANT_ID
client_id=CLIENT_ID
secret_access_key=CLIENT_SECRET"
# ...
The instance key-value tag that auto-join will use is set in the value of retry_join
as "ConsulAutoJoin" = "auto-join"
. The tag is set in the azurerm_network_interface
resources for the servers and clients.
azure/main.tf
resource "azurerm_network_interface" "hashistack-server-ni" {
# ...
tags = {"ConsulAutoJoin" = "auto-join"}
}
The value is then read by Terraform during provisioning for both the server and client nodes. Note that the azurerm_linux_virtual_machine
resource contains the reference to the azurerm_network_interface
resource with the auto-join tag.
azure/main.tf
resource "azurerm_linux_virtual_machine" "server" {
# ...
network_interface_ids = ["${element(azurerm_network_interface.hashistack-server-ni.*.id, count.index)}"]
size = "${var.server_instance_type}"
count = "${var.server_count}"
# ...
custom_data = "${base64encode(templatefile("../shared/data-scripts/user-data-server.sh", {
region = var.location
cloud_env = "azure"
server_count = "${var.server_count}"
retry_join = var.retry_join
nomad_binary = var.nomad_binary
nomad_consul_token_id = var.nomad_consul_token_id
nomad_consul_token_secret = var.nomad_consul_token_secret
}))}"
}
main.tf
also adds the startup scripts from shared/data-scripts
to the server and client nodes during provisioning and substitutes the actual values specified in variables.hcl into those startup scripts.
post-script.sh
gets the temporary Nomad bootstrap user token from the Consul KV store, saves it locally, and then deletes it from the Consul KV store.
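A minimal sketch of that flow, assuming the token was stored under a key such as nomad_user_token (the key name and output file below are hypothetical):
post-script.sh (illustrative sketch, not the actual script)
# Retrieve the Nomad user token saved during bootstrap, keep a local copy,
# then remove it from the Consul KV store. Key and file names are illustrative.
nomad_token=$(consul kv get "nomad_user_token")
echo "$nomad_token" > nomad.token
consul kv delete "nomad_user_token"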
The cluster setup in the following tutorials includes the minimum configuration required for the cluster to operate.
Once setup is complete, the Consul UI will be accessible on port 8500
, the Nomad UI on port 4646
, and SSH to each node on port 22
. Security groups implementing this configuration are in main.tf
for each cloud in the root of their respective folders. They allow access from IP addresses specified by the CIDR range in the allowlist_ip
variable of the variables.hcl
file in the same directory.
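For example, in the AWS scenario the UI rules are grouped into a security group referenced later in this tutorial as consul_nomad_ui_ingress. A trimmed sketch of what those rules look like, keyed to the allowlist_ip CIDR; the resource name is real, but the exact attribute layout here is illustrative:
aws/main.tf (illustrative excerpt)
resource "aws_security_group" "consul_nomad_ui_ingress" {
  name   = "${var.name}-ui-ingress"
  vpc_id = data.aws_vpc.default.id

  # Nomad UI (illustrative; the repository file defines the exact rules)
  ingress {
    from_port   = 4646
    to_port     = 4646
    protocol    = "tcp"
    cidr_blocks = [var.allowlist_ip]
  }

  # Consul UI
  ingress {
    from_port   = 8500
    to_port     = 8500
    protocol    = "tcp"
    cidr_blocks = [var.allowlist_ip]
  }
}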
To test out your applications running in the cluster, you will need to create additional security group rules that allow access to ports used by your application. Each scenario's main.tf
file contains an example showing how to configure the rules.
The AWS scenario contains a security group named clients_ingress
where you can place your application rules.
aws/main.tf
resource "aws_security_group" "clients_ingress" {
name = "${var.name}-clients-ingress"
vpc_id = data.aws_vpc.default.id
# ...
# Add application ingress rules here
# These rules are applied only to the client nodes
# nginx example
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
The aws_instance
resource for the clients includes the clients_ingress security group, which attaches your application rules to the client instances.
aws/main.tf
resource "aws_instance" "client" {
ami = var.ami
instance_type = var.client_instance_type
key_name = var.key_name
vpc_security_group_ids = [
aws_security_group.consul_nomad_ui_ingress.id,
aws_security_group.ssh_ingress.id,
aws_security_group.clients_ingress.id,
aws_security_group.allow_all_internal.id
]
count = var.client_count
# ...
}
The GCP scenario contains a firewall rule named clients_ingress
where you can place your application rules.
gcp/main.tf
resource "google_compute_firewall" "clients_ingress" {
name = "${var.name}-clients-ingress"
network = google_compute_network.hashistack.name
source_ranges = [var.allowlist_ip]
target_tags = ["nomad-clients"]
# Add application ingress rules here
# These rules are applied only to the client nodes
# nginx example; replace with your application port
allow {
protocol = "tcp"
ports = ["80"]
}
}
The application rules are applied to the nodes with network tags. The client nodes have a nomad-clients
tag that matches the one in the target_tags
attribute of the google_compute_firewall
resource.
gcp/main.tf
resource "google_compute_instance" "client" {
count = var.client_count
name = "${var.name}-client-${count.index}"
machine_type = var.client_instance_type
zone = var.zone
tags = ["auto-join", "nomad-clients"]
# ...
}
The Azure scenario contains a security rule named clients_ingress
where you can place your application rules. The application rules are applied by adding each client node's IP address to the destination_address_prefixes
attribute.
azure/main.tf
resource "azurerm_network_security_rule" "clients_ingress" {
name = "${var.name}-clients-ingress"
resource_group_name = "${azurerm_resource_group.hashistack.name}"
network_security_group_name = "${azurerm_network_security_group.hashistack-sg.name}"
priority = 110
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
# Add application ingress rules here
# These rules are applied only to the client nodes
# nginx example; replace with your application port
source_address_prefix = var.allowlist_ip
source_port_range = "*"
destination_port_range = "80"
destination_address_prefixes = azurerm_linux_virtual_machine.client[*].public_ip_address
}
Now that you have reviewed the cluster setup repository and learned how the cluster is configured, continue on to the cluster setup tutorials for each of the major cloud platforms to provision and configure your Nomad cluster.