Kubernetes networking is deceptively simple in theory: every pod gets its own IP address, every pod can reach every other pod, and Services provide stable endpoints for discovery. In practice, networking issues are some of the hardest problems to debug because failures are often silent — traffic just disappears.
This guide covers systematic debugging of the three layers where things go wrong: Services, DNS, and Ingress.
Before troubleshooting, you need to understand what Kubernetes guarantees:
The networking model is implemented by a CNI plugin (Calico, Cilium, Flannel, Weave, etc.). When networking breaks, the issue is usually in the Service layer, DNS resolution, or Ingress routing — not in the CNI plugin itself.
A Service is a stable abstraction over a set of pods. When a Service is not routing traffic, the issue is almost always a selector mismatch or missing endpoints.
kubectl get svc my-service -n my-namespace
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-service ClusterIP 10.96.45.123 <none> 8080/TCP 2d
Endpoints connect a Service to its backing pods. If the endpoint list is empty, the Service has no targets:
kubectl get endpoints my-service -n my-namespace
NAME ENDPOINTS AGE
my-service 10.244.1.5:8080,10.244.2.9:8080 2d
Empty endpoints means one of:
- targetPort does not match the container's port
# Check the Service selector
kubectl describe svc my-service -n my-namespace | grep Selector
# Check pod labels
kubectl get pods -n my-namespace --show-labels
Compare the two. A common mistake is a typo in the selector — app: my-app in the Service vs app: myapp (no hyphen) on the pod.
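The comparison above can be automated once both values are in hand. The sketch below is a minimal helper, assuming equality-based selectors only; the function name `selector_matches` is hypothetical, and both inputs are expected in the comma-separated `key=value` form that `kubectl get svc -o wide` (SELECTOR column) and `kubectl get pods --show-labels` (LABELS column) print.

```shell
# selector_matches SELECTOR LABELS
# Hypothetical helper: both arguments are comma-separated key=value lists.
# Returns 0 if every selector pair appears among the pod labels, 1 otherwise.
selector_matches() {
  local selector="$1" labels=",$2," pair
  local IFS=','                        # split the selector on commas
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                  # pair present on the pod, keep checking
      *) echo "missing on pod: $pair" >&2; return 1 ;;
    esac
  done
  return 0
}
```

Calling `selector_matches "app=my-app" "app=myapp,pod-template-hash=5d4f7c"` reports the missing pair on stderr and returns non-zero, which is the typo case described above.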
Tip: Use the kubectl Builder to construct describe and get commands with label selectors.
Do not test from outside. Start a debug pod inside the cluster:
# Start a temporary debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -- bash
# From inside the debug pod:
curl http://my-service.my-namespace.svc.cluster.local:8080
If this works but external access does not, the issue is in your Ingress or NodePort configuration, not the Service itself.
| Type | How It Works | Common Failures |
|---|---|---|
| ClusterIP | Internal-only virtual IP | Empty endpoints (selector mismatch), wrong targetPort |
| NodePort | ClusterIP + port on every node (30000-32767) | Firewall blocking the node port, node IP not routable |
| LoadBalancer | NodePort + cloud provider LB | External IP stuck in <pending> (cloud controller issue, quota exceeded) |
| ExternalName | DNS CNAME alias | No endpoints to check, issues are DNS-level |
For LoadBalancer services stuck in <pending>:
# Check events for provisioning errors
kubectl describe svc my-service -n my-namespace
# Check cloud controller manager logs (the label varies by provider; adjust the selector to match your cluster)
kubectl logs -n kube-system -l app=cloud-controller-manager
Kubernetes DNS (CoreDNS) resolves service names to ClusterIPs. When DNS fails, pods cannot find each other by name — even if the underlying network is fine.
Every pod gets DNS configuration that points to CoreDNS:
# Check a pod's DNS config
kubectl exec my-pod -- cat /etc/resolv.conf
nameserver 10.96.0.10
search my-namespace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
- nameserver points to the CoreDNS ClusterIP
- search domains allow short names (my-service resolves before trying external DNS)
- ndots:5 means any name with fewer than 5 dots is treated as a relative name and goes through the search list first
# Quick test
kubectl exec -it my-pod -- nslookup my-service
# More detailed test
kubectl exec -it my-pod -- nslookup my-service.my-namespace.svc.cluster.local
# Or use dig for full DNS information
kubectl run debug --rm -it --image=nicolaka/netshoot -- dig my-service.my-namespace.svc.cluster.local
# Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Check CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
# Check the CoreDNS service
kubectl get svc -n kube-system kube-dns
If CoreDNS pods are in CrashLoopBackOff, check:
kubectl get configmap coredns -n kube-system -o yaml
The default ndots:5 setting can cause performance issues. When a pod tries to resolve api.example.com (2 dots, fewer than 5), its resolver first tries:
1. api.example.com.my-namespace.svc.cluster.local (fail)
2. api.example.com.svc.cluster.local (fail)
3. api.example.com.cluster.local (fail)
4. api.example.com (success)
That is 4 DNS queries instead of 1. For applications that make many external DNS lookups, this adds significant latency.
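To see how the search list and ndots interact for a given name, the expansion can be sketched as a small function. This is an illustration only, assuming glibc-style resolver behavior (other resolvers, such as musl, may order or skip candidates differently); the name `expand_queries` is hypothetical. It prints the candidate names in the order they would be tried, and a real resolver stops at the first one that resolves.

```shell
# expand_queries NAME NDOTS SEARCH_DOMAIN...
# Hypothetical helper: prints, one per line, the names a glibc-style
# resolver would try for NAME, in order.
expand_queries() {
  local name="$1" ndots="$2" rest dots=0 domain
  shift 2

  # A trailing dot marks the name as fully qualified: one query, no search list.
  case "$name" in
    *.) echo "$name"; return 0 ;;
  esac

  # Count the dots in the name.
  rest="$name"
  while [ "${rest#*.}" != "$rest" ]; do
    rest="${rest#*.}"
    dots=$((dots + 1))
  done

  # At or above the ndots threshold, the literal name is tried first.
  if [ "$dots" -ge "$ndots" ]; then echo "$name"; fi
  for domain in "$@"; do
    echo "$name.$domain"
  done
  # Below the threshold, the literal name is only tried after the search list.
  if [ "$dots" -lt "$ndots" ]; then echo "$name"; fi
}
```

Running `expand_queries api.example.com 5 my-namespace.svc.cluster.local svc.cluster.local cluster.local` prints the four names listed above, with `api.example.com` last. With an ndots value of 2, the literal name moves to the front of the list.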
Fix: Lower ndots in the pod spec or use fully qualified names (trailing dot):
# Option 1: Lower ndots in pod spec
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
# Option 2: Use FQDN in application config
# api.example.com. (note the trailing dot — bypasses search list)
| Symptom | Likely Cause | Fix |
|---|---|---|
| nslookup: can't resolve | CoreDNS is down or unreachable | Check CoreDNS pods and service |
| Intermittent DNS timeouts | CoreDNS under-resourced | Increase CPU/memory limits, add replicas |
| External domains fail | Upstream DNS misconfigured | Check CoreDNS ConfigMap forward directive |
| Cross-namespace resolution fails | Wrong DNS name format | Use service.namespace.svc.cluster.local |
| Slow DNS for external names | ndots:5 causing search domain lookups | Lower ndots or use FQDNs |
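The cross-namespace row in the table above comes down to building the name correctly. A minimal sketch follows; the function name `service_fqdn` is hypothetical, and the default cluster domain of `cluster.local` is an assumption that holds for most clusters but is configurable.

```shell
# service_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Hypothetical helper: prints the fully qualified DNS name of a Service.
# CLUSTER_DOMAIN defaults to cluster.local (an assumption; check your cluster).
service_fqdn() {
  local service="$1" namespace="$2" domain="${3:-cluster.local}"
  if [ -z "$service" ] || [ -z "$namespace" ]; then
    echo "usage: service_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]" >&2
    return 2
  fi
  echo "$service.$namespace.svc.$domain"
}
```

The output can be passed straight to a lookup, for example `kubectl exec -it my-pod -- nslookup "$(service_fqdn my-service my-namespace)"`.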
Ingress is the layer that routes external HTTP/HTTPS traffic into the cluster. The request flow is:
Client → Ingress Controller (pod) → Service → Pod
When Ingress is not working, the issue can be at any link in this chain.
kubectl get ingress -n my-namespace
kubectl describe ingress my-ingress -n my-namespace
Look for:
Since Kubernetes 1.18, an Ingress selects its controller through spec.ingressClassName, which replaces the deprecated kubernetes.io/ingress.class annotation:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  ingressClassName: nginx # Must match an installed IngressClass
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 8080
Check available ingress classes:
kubectl get ingressclass
If no ingress class is set and no default class exists, the ingress controller ignores the resource entirely.
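The class-matching rule above can be written out as a small decision function. This is a simplified sketch, not how any controller implements it: the name `ingress_class_verdict` and its output strings are hypothetical, and the inputs are assumed to have been read by hand from `kubectl get ingressclass` and the Ingress spec.

```shell
# ingress_class_verdict CLASS_NAME HAS_DEFAULT INSTALLED_CLASS...
# Hypothetical helper.
#   CLASS_NAME       value of spec.ingressClassName ("" if unset)
#   HAS_DEFAULT      "yes" if some IngressClass is marked as the cluster default
#   INSTALLED_CLASS  names listed by `kubectl get ingressclass`
ingress_class_verdict() {
  local class_name="$1" has_default="$2" installed
  shift 2

  # No class on the Ingress: only a default class can pick it up.
  if [ -z "$class_name" ]; then
    if [ "$has_default" = "yes" ]; then
      echo "handled by default class"
    else
      echo "ignored: no class set and no default class"
    fi
    return 0
  fi

  # A class is set: it has to match one of the installed classes.
  for installed in "$@"; do
    if [ "$installed" = "$class_name" ]; then
      echo "handled by class $class_name"
      return 0
    fi
  done
  echo "ignored: class $class_name is not installed"
}
```

For example, `ingress_class_verdict ngnix no nginx` flags a misspelled class name, which otherwise fails silently.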
# Find the ingress controller pods
kubectl get pods -n ingress-nginx
# Check logs for errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100
Common log errors:
- service "my-service" not found: Backend service does not exist in the specified namespace
- error obtaining certificate: TLS secret missing or malformed
- no upstream host: Service has no ready endpoints
If HTTPS is not working, verify the TLS secret:
# Check the secret exists
kubectl get secret my-tls-secret -n my-namespace
# Inspect the secret
kubectl describe secret my-tls-secret -n my-namespace
# Verify the certificate in the secret
kubectl get secret my-tls-secret -n my-namespace -o jsonpath='{.data.tls\.crt}' | \
base64 -d | openssl x509 -noout -subject -dates
Tip: Use the SSL Certificate Decoder to inspect certificates extracted from Kubernetes secrets. The Base64 Encoder/Decoder batch mode can decode all fields of a K8s secret at once.
| Symptom | Likely Cause | Fix |
|---|---|---|
| 404 on all paths | Wrong ingress class, controller not watching | Set ingressClassName, check controller deployment |
| 503 Service Unavailable | Backend service has no ready endpoints | Check pod readiness, selector match |
| SSL handshake failure | TLS secret missing or wrong format | Create secret with kubectl create secret tls |
| Redirect loop | Ingress and app both do HTTPS redirect | Disable one; use nginx.ingress.kubernetes.io/ssl-redirect: "false" |
| Wrong backend reached | Overlapping path rules | Check path specificity, use pathType: Exact where appropriate |
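For the overlapping-path row above, it helps to check by hand which rules a request path would match. The sketch below approximates the Prefix and Exact semantics from the Kubernetes Ingress documentation (Prefix matches element by element, so /foo matches /foo/bar but not /foobar). The function name `path_matches` is hypothetical, and controller-specific behavior such as regex paths or ImplementationSpecific is not covered.

```shell
# path_matches PATH_TYPE RULE_PATH REQUEST_PATH
# Hypothetical helper: returns 0 if the request path matches the rule,
# 1 if it does not, 2 for an unsupported pathType.
path_matches() {
  local type="$1" rule="$2" request="$3"
  case "$type" in
    Exact)
      # Exact is a case-sensitive, character-for-character comparison.
      [ "$rule" = "$request" ]
      ;;
    Prefix)
      # A trailing slash on the rule is ignored; "/" becomes the empty prefix.
      rule="${rule%/}"
      case "$request" in
        "$rule"|"$rule"/*) return 0 ;;   # same path, or a sub-path under it
        *) return 1 ;;
      esac
      ;;
    *)
      return 2
      ;;
  esac
}
```

If two rules both return 0 for the same request path, they overlap, and the more specific rule (or an Exact rule) is the one to keep.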
Tip: For Ingress-like reverse proxy configurations outside Kubernetes, the Nginx Config Generator can help you build equivalent configs.
Kubernetes Network Policies act as firewall rules for pod-to-pod traffic. When traffic is silently dropped and nothing else is wrong, network policies are usually the cause.
# List all network policies in a namespace
kubectl get networkpolicy -n my-namespace
# Describe a specific policy
kubectl describe networkpolicy my-policy -n my-namespace
# List policies whose podSelector matches the label app=my-app
# (a policy with an empty podSelector selects every pod and will not appear here)
kubectl get networkpolicy -n my-namespace -o json | \
jq '.items[] | select(.spec.podSelector.matchLabels.app == "my-app")'
If you suspect a network policy is blocking traffic:
# Temporarily delete the policy to confirm
kubectl delete networkpolicy my-policy -n my-namespace
# Test if traffic flows now
kubectl exec -it debug-pod -- curl http://my-service:8080
# Re-apply the policy
kubectl apply -f network-policy.yaml
Do not leave policies deleted in production. If removing the policy fixes the issue, the policy rules need adjustment — not removal.
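- namespaceSelector: {} selects ALL namespaces, not the current namespace. Use namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: my-namespace } } to target a specific one (recent Kubernetes versions set this label on every namespace automatically; a custom label such as name only works if you add it yourself)
Tip: Use the Firewall Rule Generator to understand rule logic and conflict detection patterns. The same mental model applies to Kubernetes network policies.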
When you need to test a service without going through Ingress or NodePort, port-forward creates a direct tunnel from your local machine to a pod or service:
# Forward to a specific pod
kubectl port-forward pod/my-pod 8080:8080 -n my-namespace
# Forward to a service (picks one backing pod; traffic is not load-balanced)
kubectl port-forward svc/my-service 8080:8080 -n my-namespace
# Forward to a deployment (picks one pod)
kubectl port-forward deploy/my-deployment 8080:8080 -n my-namespace
Then access http://localhost:8080 from your browser or curl. This bypasses all network layers (Ingress, Service, network policies) and connects directly to the pod's container port.
port-forward is a debugging tool, not a production access method. It creates a single TCP connection through the API server.
Tip: Use the kubectl Builder to construct port-forward commands with the right syntax.
| Error | Layer | Cause | Fix |
|---|---|---|---|
| connection refused | Pod | Application not listening on the expected port | Check containerPort in pod spec, verify app config |
| no route to host | Network | Node-level networking issue, CNI failure | Check node network, CNI pods, iptables rules |
| connection timed out | Multiple | Network policy blocking, firewall, wrong IP | Check network policies, security groups, CIDR ranges |
| 502 Bad Gateway | Ingress | Backend pod crashed or not ready | Check pod status, readiness probes |
| 503 Service Unavailable | Ingress | No ready endpoints for the backend service | Check endpoints, pod readiness, selector match |
| could not resolve host | DNS | CoreDNS down, wrong service name | Check CoreDNS pods, use FQDN |
| i/o timeout on DNS | DNS | CoreDNS overloaded or unreachable | Scale CoreDNS, check kube-dns service |
| External IP <pending> | Service | Cloud LB not provisioned | Check cloud controller, quotas, service events |
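The table above can double as a first-pass triage function for log lines or curl output. This is a rough sketch: the name `error_layer` is hypothetical, the substring patterns are assumptions about how these errors are typically worded, and a match only suggests where to look first.

```shell
# error_layer MESSAGE
# Hypothetical helper: prints the layer to inspect first, following the table.
# Returns 1 when the message matches none of the known patterns.
error_layer() {
  local msg
  # Lowercase the message so "Connection refused" and "connection refused" both match.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *"connection refused"*)       echo "Pod" ;;
    *"no route to host"*)         echo "Network" ;;
    *"could not resolve host"*)   echo "DNS" ;;
    *"lookup "*"i/o timeout"*)    echo "DNS" ;;
    *"connection timed out"*)     echo "Multiple" ;;
    *"502 bad gateway"*)          echo "Ingress" ;;
    *"503 service unavailable"*)  echo "Ingress" ;;
    *"<pending>"*)                echo "Service" ;;
    *)                            echo "unknown"; return 1 ;;
  esac
}
```

For instance, `error_layer "curl: (6) Could not resolve host: my-service"` prints `DNS`, pointing at the CoreDNS checks earlier in this guide.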
When networking is broken, work from the inside out:
NAMESPACE="my-namespace"
SERVICE="my-service"
PORT="8080"
# 1. Is the pod running and ready?
kubectl get pods -n $NAMESPACE -l app=$SERVICE
# 2. Can the pod reach itself?
kubectl exec -n $NAMESPACE deploy/$SERVICE -- curl -s localhost:$PORT
# 3. Does the service have endpoints?
kubectl get endpoints $SERVICE -n $NAMESPACE
# 4. Can another pod reach the service? (this uses the DNS name; if it fails, step 5 separates DNS from routing)
kubectl run debug --rm -it --image=nicolaka/netshoot -- \
curl -s http://$SERVICE.$NAMESPACE.svc.cluster.local:$PORT
# 5. Does DNS resolve?
kubectl run debug --rm -it --image=nicolaka/netshoot -- \
nslookup $SERVICE.$NAMESPACE.svc.cluster.local
# 6. Are network policies blocking traffic?
kubectl get networkpolicy -n $NAMESPACE
# 7. Is the ingress routing correctly?
kubectl describe ingress -n $NAMESPACE
Work through these steps in order. Each step narrows down the layer where the failure occurs.
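Running the steps in order and stopping at the first failure can itself be scripted. The sketch below is one possible shape, not part of the checklist above: the name `first_failing_step` and the `label::command` argument format are hypothetical, and it only works with commands that exit non-zero on failure (plain `kubectl get` usually exits 0 even when the result is empty, so checks such as `kubectl wait` fit better).

```shell
# first_failing_step "LABEL::COMMAND"...
# Hypothetical helper: runs each command in order, prints the label of the
# first one that exits non-zero, and returns 1. Prints "all steps passed"
# and returns 0 if every command succeeds.
#
# Example arguments (assumed names, adjust to your cluster):
#   "pods ready::kubectl wait --for=condition=Ready pod -l app=my-service -n my-namespace --timeout=5s"
#   "dns::kubectl exec -n my-namespace deploy/my-service -- nslookup my-service"
first_failing_step() {
  local step label cmd
  for step in "$@"; do
    label="${step%%::*}"     # text before the first "::"
    cmd="${step#*::}"        # text after the first "::"
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "$label"
      return 1
    fi
  done
  echo "all steps passed"
}
```

Because the function stops at the first failure, the printed label names the innermost layer that is broken, which matches the inside-out order of the checklist.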
Kubernetes networking issues fall into three layers: Services (selector mismatches, missing endpoints), DNS (CoreDNS failures, ndots overhead), and Ingress (wrong class, missing TLS secrets, backend errors). The debugging process is always the same: start inside the cluster, verify each layer, and work outward.