server Block in Agent Configuration

This page provides reference information for configuring Nomad agent server mode in the server block of a Nomad agent configuration. Server mode lets the agent participate in scheduling decisions, register with service discovery, and handle join failures. Configure bootstrapping, authoritative region, redundancy zone, data directory, Nomad cluster behavior, client heartbeat period, schedulers, garbage collection, Raft and Raft's BoltDB store, OIDC for workload identity, and leader plan rejection, as well as job priority, job source content size, and tracked job versions.

plan_rejection_tracker Parameters

The leader plan rejection tracker can be adjusted to prevent evaluations from getting stuck because they are repeatedly scheduled onto a client with an unexpected issue. Refer to Monitoring Nomad for more details.

If you observe too many false positives (clients marked ineligible even though they have no actual problem), increase node_threshold.

Conversely, if jobs are not being scheduled due to plan rejections for the same node_id but the client is never marked ineligible, try increasing node_window so that more historical rejections are taken into account.
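The following sketch shows where these parameters live in the server block; the specific values are illustrative starting points, not recommendations:

server {
  plan_rejection_tracker {
    enabled        = true
    node_threshold = 100   # rejections within node_window before a node is marked ineligible
    node_window    = "5m"  # sliding window in which rejections are counted
  }
}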

Common Setup

This example shows a common Nomad agent server configuration block. The two IP addresses could also be DNS names, and should point to the other Nomad servers in the cluster.

server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join     = [ "1.1.1.1", "2.2.2.2" ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
Configuring Data Directory

This example shows configuring a custom data directory for the server data.

server {
  data_dir = "/opt/nomad/server"
}
Automatic Bootstrapping

The Nomad servers can automatically bootstrap if Consul is configured. For a more detailed explanation, refer to the automatic Nomad bootstrapping documentation.
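As a sketch, assuming a local Consul agent is running at Consul's default address, the servers need only bootstrap_expect; peers are discovered through Consul rather than through retry_join:

consul {
  # Default address of the local Consul agent.
  address = "127.0.0.1:8500"
}

server {
  enabled          = true
  bootstrap_expect = 3
}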

Restricting Schedulers

This example shows restricting the schedulers that are enabled as well as the maximum number of cores to utilize when participating in scheduling decisions:

server {
  enabled            = true
  enabled_schedulers = ["batch", "service"]
  num_schedulers     = 7
}
Bootstrapping with a Custom Scheduler Config

While bootstrapping a cluster, you can use the default_scheduler_config block to prime the cluster with a SchedulerConfig. The scheduler configuration determines which scheduling algorithm is used (spread scheduling or binpacking) and which job types are eligible for preemption.

Warning: Once the cluster is bootstrapped, you must configure this using the update scheduler configuration API. This option is only consulted during bootstrap.

The structure matches the Update Scheduler Config API endpoint, which you should consult for canonical documentation. However, the attribute names must be adapted to HCL syntax by using snake case rather than camel case.

This example shows configuring spread scheduling and enabling preemption for all job-type schedulers.

server {
  default_scheduler_config {
    scheduler_algorithm             = "spread"
    memory_oversubscription_enabled = true
    reject_job_registration         = false
    pause_eval_broker               = false

    preemption_config {
      batch_scheduler_enabled    = true
      system_scheduler_enabled   = true
      service_scheduler_enabled  = true
      sysbatch_scheduler_enabled = true
    }
  }
}

Client Heartbeats

This is an advanced topic. It is most beneficial to clusters of over 1,000 nodes or with unreliable networks or nodes (e.g., some edge deployments).

Nomad Clients periodically heartbeat to Nomad Servers to confirm they are operating as expected. Clients that do not heartbeat within the expected time are considered down, and their allocations are marked as lost (or disconnected, if disconnect.lost_after is set) and replaced.
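Note that disconnect.lost_after is part of the job specification rather than the agent configuration. A minimal sketch of a group-level disconnect block, with an illustrative group name and duration:

group "example" {
  # Jobspec, not agent config: how long a disconnected allocation may remain
  # unreachable before Nomad marks it lost and reschedules it.
  disconnect {
    lost_after = "1h"
  }
}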

The heartbeat-related parameters let you tune the tradeoff between how quickly failed Clients are detected and how much work heartbeat processing imposes on the leader.

While Nomad Clients can connect to any Server, all heartbeats are forwarded to the leader for processing. Since this heartbeat processing consumes resources, Nomad adjusts the rate at which Clients heartbeat based on cluster size. The goal is to try to keep the resource cost of processing heartbeats constant regardless of cluster size.

The base formula for determining how often a Client must heartbeat is:

<number of Clients> / <max_heartbeats_per_second>
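For example, with 1,000 Clients and the default max_heartbeats_per_second of 50, the base TTL works out to:

1000 / 50 = 20s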

Other factors modify this base TTL: the result is never lower than min_heartbeat_ttl, and each Client is given a random TTL between the base value and twice the base value so that heartbeats are spread out over time.

For example, given the default values for the heartbeat parameters, different-sized clusters use the following heartbeat TTLs. Note that the Server TTL simply adds the heartbeat_grace parameter to the TTL the Clients are given.

Clients   Client TTL     Server TTL     Safe after elections
10        10s - 20s      20s - 30s      yes
100       10s - 20s      20s - 30s      yes
1000      20s - 40s      30s - 50s      yes
5000      100s - 200s    110s - 210s    yes
10000     200s - 400s    210s - 410s    NO

Regardless of cluster size, all Clients are given a Server TTL of failover_heartbeat_ttl after a leader election. This value should always be larger than the maximum Client TTL for your cluster size in order to prevent marking live Clients as down.

For clusters of more than 5,000 Clients, increase failover_heartbeat_ttl using the following formula:

(2 * (<number of Clients> / <max_heartbeats_per_second>)) + (10 * <min_heartbeat_ttl>)
 
# For example with 6000 Clients:
(2 * (6000 / 50)) + (10 * 10) = 340s (5m40s)

This ensures Clients have some additional time to fail over even if they were told to heartbeat after the maximum interval.

The actual value used should take into account how much tolerance your system has for delayed detection of crashed Clients. For example, a failover_heartbeat_ttl of 30 minutes may give even the slowest Clients in the largest clusters ample time to heartbeat after an election. However, if the election was due to a datacenter-wide failure affecting the Clients, it will be 30 minutes before Nomad recognizes that they are down and replaces their work.
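Putting the pieces together, this sketch shows the heartbeat-related server parameters, using the 340s value computed above for a 6,000-Client cluster; the remaining values are Nomad's defaults, written out for clarity:

server {
  enabled = true

  # Defaults, shown explicitly.
  min_heartbeat_ttl         = "10s"
  max_heartbeats_per_second = 50.0
  heartbeat_grace           = "10s"

  # Raised from the default per the formula above for a 6,000-Client cluster.
  failover_heartbeat_ttl = "340s"
}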

