P2
DNS Resolution Failure — Troubleshooting Guide
Diagnose DNS resolution failures on Linux systems and Kubernetes clusters. Covers resolver config, upstream DNS, systemd-resolved, CoreDNS, and DNSSEC issues.
10 min8 steps
Progress: 0/8 steps
0%
Check if the system can resolve any domain names.
dig google.com +short && echo '---' && dig FAILING_DOMAIN +short
Expected: If google.com resolves but your domain doesn't, the issue is domain-specific (DNS record, propagation, DNSSEC). If neither resolves, it's a resolver issue.
Verify which DNS servers the system is configured to use.
cat /etc/resolv.conf && echo '---' && resolvectl status 2>/dev/null | head -20 || systemd-resolve --status 2>/dev/null | head -20
Expected: Shows nameserver entries. Common issue: pointing to 127.0.0.53 (systemd-resolved) which may not be running, or to an unreachable server.
Bypass local resolver and query public DNS directly to isolate the issue.
dig @8.8.8.8 FAILING_DOMAIN +short && echo '---' && dig @1.1.1.1 FAILING_DOMAIN +short
Expected: If public resolvers work, the problem is with your local DNS config or local resolver. If they also fail, the domain may not exist.
On modern Linux (Ubuntu 18+), systemd-resolved manages DNS. If it's down, resolution breaks.
systemctl status systemd-resolved && echo '---' && resolvectl query FAILING_DOMAIN 2>&1
Expected: Service should be 'active (running)'. If inactive, start it: systemctl start systemd-resolved
DNSSEC errors cause SERVFAIL responses even though the domain exists.
dig FAILING_DOMAIN +dnssec +cd && echo '---' && dig FAILING_DOMAIN +trace | tail -20
Expected: If +cd (checking disabled) returns results but normal query doesn't, it's a DNSSEC validation failure. Check the domain's DNSSEC configuration.
In K8s clusters, CoreDNS handles service discovery and DNS resolution.
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide && echo '---' && kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20
Expected: CoreDNS pods should be Running. Logs may show upstream errors, SERVFAIL, or loop detection.
Run a DNS test from within the cluster network to check CoreDNS.
kubectl run dns-test --image=busybox:1.36 --restart=Never --rm -it -- nslookup kubernetes.default.svc.cluster.local
Expected: Should resolve to the Kubernetes API server ClusterIP (usually 10.96.0.1). If this fails, CoreDNS or kube-dns service is broken.
Clear cached DNS entries and test resolution again.
# systemd-resolved: resolvectl flush-caches && resolvectl statistics # nscd (if running): nscd -i hosts 2>/dev/null # Test again: dig FAILING_DOMAIN +short
Expected: After flushing, stale cache entries are removed. If resolution works now, the issue was a cached negative response.