This page explains how to improve DNS lookup latency in a Google Kubernetes Engine (GKE) cluster by using NodeLocal DNSCache.
For GKE Autopilot clusters, NodeLocal DNSCache is enabled by default and cannot be overridden.
Architecture
NodeLocal DNSCache is a GKE add-on that you can run in addition to kube-dns.
GKE implements NodeLocal DNSCache as a DaemonSet that runs a DNS cache on each node in your cluster.
When a Pod makes a DNS request, the request goes to the DNS cache running on the same node as the Pod. If the cache can't resolve the DNS request, the cache forwards the request to one of the following places based on the query destination:
Queries for the cluster DNS domain (cluster.local) are forwarded to kube-dns. The node-local-dns Pods use the kube-dns-upstream Service to access kube-dns Pods. For example, the IP address of the kube-dns Service might be 10.0.0.10:53.
Queries for domains outside the cluster DNS domain are forwarded to the configured upstream DNS servers.
When you enable NodeLocal DNSCache on an existing cluster, GKE recreates all cluster nodes running GKE version 1.15 and later according to the node upgrade process.
After GKE recreates the nodes, GKE automatically adds the label addon.gke.io/node-local-dns-ds-ready=true
to the nodes. You must not add this label to the cluster nodes manually.
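For example, you can confirm which nodes have the label by using a label selector:
kubectl get nodes -l addon.gke.io/node-local-dns-ds-ready=true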
NodeLocal DNSCache provides the benefit of lower DNS lookup latency by answering queries from a cache on the same node. It also has the following requirements and limitations:
NodeLocal DNSCache supports upstreamServers and stubDomains using TCP and UDP on GKE versions 1.18 or later. The DNS server must be reachable using TCP and UDP.
DNS responses are cached, including negative (NXDOMAIN) responses.
NodeLocal DNSCache Pods listen on several ports on each node. If you deploy a hostNetwork Pod or configure hostPorts that use the same ports, NodeLocal DNSCache fails and DNS errors occur. NodeLocal DNSCache Pods don't use hostNetwork mode when using GKE Dataplane V2 and not using Cloud DNS for GKE.
Enable NodeLocal DNSCache
For Autopilot clusters, NodeLocal DNSCache is enabled by default and cannot be overridden.
For Standard clusters, you can enable NodeLocal DNSCache on new or existing clusters using the Google Cloud CLI. You can enable NodeLocal DNSCache in new clusters using the Google Cloud console.
Enable NodeLocal DNSCache in a new cluster
gcloud
To enable NodeLocal DNSCache in a new cluster, use the --addons flag with the argument NodeLocalDNS:
gcloud container clusters create CLUSTER_NAME \
--location=COMPUTE_LOCATION \
--addons=NodeLocalDNS
Replace the following:
CLUSTER_NAME: the name of your new cluster.
COMPUTE_LOCATION: the Compute Engine location for the cluster.
Console
To enable NodeLocal DNSCache in a new cluster, use the following steps:
Go to the Google Kubernetes Engine page in the Google Cloud console.
Next to Standard, click Configure.
Configure your cluster how you want.
From the navigation pane, click Networking.
In the Advanced networking options section, select the Enable NodeLocal DNSCache checkbox.
Click Create.
Enable NodeLocal DNSCache in an existing cluster
To enable NodeLocal DNSCache in an existing cluster, use the --update-addons flag with the argument NodeLocalDNS=ENABLED:
gcloud container clusters update CLUSTER_NAME \
--location=COMPUTE_LOCATION \
--update-addons=NodeLocalDNS=ENABLED
Replace the following:
CLUSTER_NAME: the name of your cluster.
COMPUTE_LOCATION: the Compute Engine location for the cluster.
This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the table of manual changes that recreate the nodes using a node upgrade strategy while respecting maintenance policies. To learn more about node updates, see Planning for node update disruptions.
Important: GKE respects maintenance policies when recreating the nodes for this change using the node upgrade strategy, and the recreation depends on resource availability. Disabling node auto-upgrades doesn't prevent this change. To manually apply the changes to the nodes, use the gcloud CLI to call the gcloud container clusters upgrade command, passing the --cluster-version flag with the same GKE version that the node pool is already running.
Verify that NodeLocal DNSCache is enabled
You can verify that NodeLocal DNSCache is running by listing the node-local-dns
Pods:
kubectl get pods -n kube-system -o wide | grep node-local-dns
The output is similar to the following:
node-local-dns-869mt 1/1 Running 0 6m24s 10.128.0.35 gke-test-pool-69efb6b8-5d7m <none> <none>
node-local-dns-htx4w 1/1 Running 0 6m24s 10.128.0.36 gke-test-pool-69efb6b8-wssk <none> <none>
node-local-dns-v5njk 1/1 Running 0 6m24s 10.128.0.33 gke-test-pool-69efb6b8-bhz3 <none> <none>
The output shows a node-local-dns
Pod for each node that is running GKE version 1.15 or later.
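You can also check the DaemonSet that manages these Pods; the DESIRED and READY counts should match the number of eligible nodes:
kubectl get daemonset node-local-dns -n kube-system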
Disable NodeLocal DNSCache
You can disable NodeLocal DNSCache using the following command:
gcloud container clusters update CLUSTER_NAME \
--location=COMPUTE_LOCATION \
--update-addons=NodeLocalDNS=DISABLED
Replace the following:
CLUSTER_NAME: the name of the cluster in which to disable NodeLocal DNSCache.
COMPUTE_LOCATION: the Compute Engine location for the cluster.
This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the table of manual changes that recreate the nodes using a node upgrade strategy while respecting maintenance policies. To learn more about node updates, see Planning for node update disruptions.
Important: GKE respects maintenance policies when recreating the nodes for this change using the node upgrade strategy, and the recreation depends on resource availability. Disabling node auto-upgrades doesn't prevent this change. To manually apply the changes to the nodes, use the gcloud CLI to call the gcloud container clusters upgrade command, passing the --cluster-version flag with the same GKE version that the node pool is already running.
Troubleshoot NodeLocal DNSCache
For general information about diagnosing Kubernetes DNS issues, see Debugging DNS Resolution.
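For example, one quick check is to run a disposable Pod that performs a lookup against the cluster DNS (this assumes that your nodes can pull the public busybox image):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local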
NodeLocal DNSCache is not enabled immediately
When you enable NodeLocal DNSCache on an existing cluster, GKE might not update the nodes immediately if the cluster has a configured maintenance window or exclusion. For more information, see the caveats for node re-creation and maintenance windows.
If you prefer not to wait, you can manually apply the changes to the nodes by calling the gcloud container clusters upgrade
command and passing the --cluster-version
flag with the same GKE version that the node pool is already running. You must use the Google Cloud CLI for this workaround.
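For example, assuming a node pool named default-pool that is already running GKE version 1.29.1-gke.100 (placeholder values for your own cluster), the command might look like the following:
gcloud container clusters upgrade CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --node-pool=default-pool \
    --cluster-version=1.29.1-gke.100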
NodeLocal DNSCache with Cloud DNS
If you use NodeLocal DNSCache with Cloud DNS, the cluster uses the name server IP address 169.254.20.10.
As a result, the IP address of the kube-dns
Service might be different than the name server IP address that your Pods use. This difference in IP addresses is expected, because the 169.254.20.10
name server IP address is required for Cloud DNS to work correctly.
To check the IP addresses, run the following commands:
View the IP address of the kube-dns
Service:
kubectl get svc -n kube-system kube-dns -o jsonpath="{.spec.clusterIP}"
The output is the IP address of kube-dns, such as 10.0.0.10.
Open a shell session in your Pod:
kubectl exec -it POD_NAME -- /bin/bash
In the Pod shell session, read the contents of the /etc/resolv.conf
file:
cat /etc/resolv.conf
The output shows the name server 169.254.20.10.
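To confirm that lookups go through the local cache, you can run a query from the same shell session (this assumes a DNS client such as nslookup is available in the container image):
nslookup kubernetes.default.svc.cluster.local
The Server field in the output should show 169.254.20.10.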
Network policy issues with NodeLocal DNSCache
If you use network policy with NodeLocal DNSCache and you are not using Cloud DNS or GKE Dataplane V2, you must configure rules to permit your workloads and the node-local-dns
Pods to send DNS queries.
Use an ipBlock
rule in your manifest to allow communication between your Pods and kube-dns.
The following manifest describes a network policy that uses an ipBlock
rule:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress  # example name; create the policy in the namespace of your workloads
spec:
  egress:
  - ports:
    - port: 53
      protocol: TCP
    - port: 53
      protocol: UDP
    to:
    - ipBlock:
        cidr: KUBE_DNS_SVC_CLUSTER_IP/32
  podSelector: {}
  policyTypes:
  - Egress
Replace KUBE_DNS_SVC_CLUSTER_IP
with the IP address of the kube-dns service. You can get the IP address of the kube-dns service using the following command:
kubectl get svc -n kube-system kube-dns -o jsonpath="{.spec.clusterIP}"
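For example, assuming you save the manifest as allow-dns-egress.yaml (a placeholder filename), you can substitute the Service IP and apply the policy in one step:
KUBE_DNS_SVC_CLUSTER_IP=$(kubectl get svc -n kube-system kube-dns -o jsonpath="{.spec.clusterIP}")
sed "s|KUBE_DNS_SVC_CLUSTER_IP|${KUBE_DNS_SVC_CLUSTER_IP}|" allow-dns-egress.yaml | kubectl apply -f -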
Known issues
DNS timeout in ClusterFirstWithHostNet dnsPolicy when using NodeLocal DNSCache and GKE Dataplane V2
On clusters using GKE Dataplane V2 and NodeLocal DNSCache, pods with hostNetwork
set to true
and dnsPolicy
set to ClusterFirstWithHostNet
cannot reach cluster DNS backends. DNS logs might contain entries similar to the following:
nslookup: write to 'a.b.c.d': Operation not permitted
;; connection timed out; no servers could be reached
The output indicates that the DNS requests cannot reach the backend servers.
A workaround is to set the dnsPolicy
and dnsConfig
for hostNetwork
pods:
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - KUBE_DNS_UPSTREAM
    searches:
    - cluster.local
    - svc.cluster.local
    - NAMESPACE.svc.cluster.local
    - c.PROJECT_ID.internal
    - google.internal
    options:
    - name: ndots
      value: "5"
Replace the following:
NAMESPACE: the namespace of the hostNetwork Pod.
PROJECT_ID: the ID of your Google Cloud project.
KUBE_DNS_UPSTREAM: the ClusterIP of the upstream kube-dns Service. You can get this value using the following command:
kubectl get svc -n kube-system kube-dns-upstream -o jsonpath="{.spec.clusterIP}"
DNS requests from the Pod can now reach kube-dns and bypass NodeLocal DNSCache.
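For example, a complete Pod manifest that applies this workaround might look like the following sketch. The Pod name, namespace, and image are placeholders, and 10.0.0.11 stands in for the kube-dns-upstream ClusterIP returned by the previous command:
apiVersion: v1
kind: Pod
metadata:
  name: host-network-dns-example  # placeholder name
  namespace: default
spec:
  hostNetwork: true
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 10.0.0.11  # replace with the kube-dns-upstream ClusterIP
    searches:
    - cluster.local
    - svc.cluster.local
    - default.svc.cluster.local
    - c.PROJECT_ID.internal  # replace PROJECT_ID with your project ID
    - google.internal
    options:
    - name: ndots
      value: "5"
  containers:
  - name: app
    image: busybox:1.36  # placeholder image
    command: ["sleep", "3600"]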
NodeLocal DNSCache timeout errors
On clusters with NodeLocal DNSCache enabled, the logs might contain entries similar to the following:
[ERROR] plugin/errors: 2 <hostname> A: read tcp <node IP: port>-><kubedns IP>:53: i/o timeout
Note: If the output contains 169.254.169.254:53 i/o timeout
, the timeouts are coming from the metadata server, and the following workarounds don't apply.
The output includes the IP address of the kube-dns-upstream ClusterIP Service. In this example, the response to a DNS request was not received from kube-dns within 2 seconds. This typically happens when the existing kube-dns Pods are unable to handle all requests in time. The workaround is to increase the number of kube-dns replicas by tuning the autoscaling parameters.
You can use a lower value for nodesPerReplica to ensure that more kube-dns Pods are created as cluster nodes scale up. We highly recommend setting an explicit max value to ensure that the GKE control plane virtual machine (VM) is not overwhelmed by a large number of kube-dns Pods watching the Kubernetes API.
You can set max to the number of nodes in the cluster. If the cluster has more than 500 nodes, set max to 500.
Note: Setting max to 500 does not create 500 replicas. Instead, it ensures that kube-dns replicas don't scale up beyond the value of max. In most cases, you should set max to a value much lower than 500.
For Standard clusters, you can modify the number of kube-dns replicas by editing the kube-dns-autoscaler
ConfigMap. This configuration is not supported in Autopilot clusters.
kubectl edit configmap kube-dns-autoscaler --namespace=kube-system
The output is similar to the following:
linear: '{"coresPerReplica":256, "nodesPerReplica":16,"preventSinglePointFailure":true}'
The number of kube-dns replicas is calculated using the following formula:
replicas = max( ceil( cores × 1/coresPerReplica ), ceil( nodes × 1/nodesPerReplica ) )
The result is then capped at the max value and raised to the min value, if those fields are set.
To scale up, change nodesPerReplica
to a smaller value and include a max
value.
linear: '{"coresPerReplica":256, "nodesPerReplica":8,"max": 15,"preventSinglePointFailure":true}'
This configuration creates one kube-dns Pod for every 8 nodes in the cluster. A 24-node cluster has 3 replicas and a 40-node cluster has 5 replicas. If the cluster grows beyond 120 nodes, the number of kube-dns replicas does not grow beyond 15, the max value.
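After you save the ConfigMap, you can check that the autoscaler has adjusted the Deployment, for example:
kubectl get deployment kube-dns -n kube-system
The number of replicas should match the value computed by the formula for your current node and core counts.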
To ensure a baseline level of DNS availability in your cluster, set a minimum replica count for kube-dns.
The kube-dns-autoscaler
ConfigMap output with min
field would be similar to the following:
linear: '{"coresPerReplica":256, "nodesPerReplica":8,"max": 15,"min": 5,"preventSinglePointFailure":true}'