Node NotReady — Kubernetes Troubleshooting Guide
Diagnose and fix a Kubernetes node in NotReady state. Covers kubelet health, container runtime, resource exhaustion, and node condition analysis.
15 min · 8 steps
Check which nodes are NotReady.
kubectl get nodes -o wide
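Expected: One or more nodes with STATUS 'NotReady'. AGE and VERSION help tell a newly joined node from an existing one. AGE is the time since the node registered, not how long it has been NotReady; for that, check LastTransitionTime on the node's Ready condition.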
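On a large cluster, a small filter can pull out just the NotReady nodes. This is a minimal sketch that assumes the default `kubectl get nodes` table layout (header row present, STATUS in the second column); the function name `not_ready_nodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose STATUS is NotReady.
# Reads default `kubectl get nodes` table output (including the header row) on stdin.
# A cordoned node shows e.g. "NotReady,SchedulingDisabled", so match as a substring.
not_ready_nodes() {
  awk 'NR > 1 && $2 ~ /NotReady/ { print $1 }'
}
```

Usage: `kubectl get nodes | not_ready_nodes`. Do not pass `--no-headers`, because the sketch always skips the first line.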
Get detailed status conditions for the node.
kubectl describe node NODE_NAME | sed -n '/^Conditions:/,/^Addresses:/p'
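Expected: Look for MemoryPressure, DiskPressure, PIDPressure, or NetworkUnavailable set to True; these point to the cause, while the Ready row shows the symptom. Ready=False means the kubelet reported itself unhealthy (see the Reason and Message columns); Ready=Unknown means the control plane stopped receiving status from the kubelet. LastTransitionTime on the Ready row shows how long the node has been in its current state.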
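The same check can be done without scraping `describe` output: ask kubectl for the conditions through a JSONPath template and flag only the abnormal ones. A minimal sketch, assuming the tab-separated format produced by the template in the comment; `problem_conditions` is a hypothetical name. Ready is healthy when True, and the built-in pressure and network conditions are healthy when False.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag abnormal node conditions.
# Input: tab-separated "type<TAB>status<TAB>reason" lines on stdin, e.g. from:
#   kubectl get node NODE_NAME -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
# Ready is healthy only when True; the other built-in conditions
# (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable) are healthy when False.
problem_conditions() {
  awk -F'\t' '
    $1 == "Ready" && $2 != "True" { print; next }
    $1 != "Ready" && $2 == "True" { print }
  '
}
```

An empty result means every condition has its healthy value.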
Look at recent events on the node.
kubectl describe node NODE_NAME | grep -A30 'Events:'
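Expected: Events such as 'NodeNotReady', 'ContainerGCFailed', or OOM-related events (for example 'SystemOOM') point toward the root cause. Events expire (after one hour by default), so an empty list does not rule out an earlier problem.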
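A fixed `grep -A30` can cut the Events table short on a busy node. The sketch below keeps the whole Events section and prints only the rows worth reading first. It assumes the usual describe column order (Type, Reason, Age, From, Message), and `notable_events` is a hypothetical name. NodeNotReady is typically recorded as a Normal event, so filtering on Warning alone would miss it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl describe node` output on stdin, keep
# everything from the "Events:" line to the end, and print rows whose Type is
# Warning or whose Reason is NodeNotReady or mentions OOM.
notable_events() {
  sed -n '/^Events:/,$p' |
    awk '$1 == "Warning" || $2 == "NodeNotReady" || $2 ~ /OOM/'
}
```

Usage: `kubectl describe node NODE_NAME | notable_events`.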
Connect to the node directly to diagnose from inside.
ssh NODE_IP
Expected: A shell on the node for local diagnostics. Most of the following commands need root, so run them with sudo or from a root shell.
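If you do not know NODE_IP, it appears in the INTERNAL-IP column of `kubectl get nodes -o wide`. A minimal sketch that looks the column up by header name; `node_internal_ip` is a hypothetical name, and the lookup is only reliable because INTERNAL-IP comes before OS-IMAGE, the first column whose values can contain spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the INTERNAL-IP of node $1 from
# `kubectl get nodes -o wide` output on stdin.
node_internal_ip() {
  awk -v node="$1" '
    # Find the INTERNAL-IP column in the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "INTERNAL-IP") col = i; next }
    col && $1 == node { print $col }
  '
}
```

Usage: `ssh "$(kubectl get nodes -o wide | node_internal_ip NODE_NAME)"`. The same address is also available directly with `kubectl get node NODE_NAME -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'`. The internal IP may only be reachable from inside the cluster network, for example through a bastion host.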
The kubelet is the primary node agent; if it is down, the node goes NotReady.
systemctl status kubelet --no-pager; journalctl -u kubelet --since '10 minutes ago' --no-pager | tail -30
Expected: The kubelet should be 'active (running)'. The logs show why it failed or is unhealthy.
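Kubelet logs are verbose, so it can help to keep only error and fatal lines. A minimal sketch, assuming the kubelet's default klog text format (each line starts with a severity letter I, W, E, or F followed by MMDD and a timestamp) and journalctl's default short output, where the message follows `kubelet[PID]: `. The name `kubelet_errors` is hypothetical; JSON-formatted kubelet logs would need a different filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only klog error (E) and fatal (F) lines from
# journalctl output on stdin. "[]]" is a bracket expression matching a literal "]",
# i.e. the end of the "kubelet[PID]" prefix.
kubelet_errors() {
  grep -E '[]]: [EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}
```

Usage: `journalctl -u kubelet --since '10 minutes ago' --no-pager | kubelet_errors | tail -30`.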
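Verify that the container runtime (containerd or Docker) is running on the node.
systemctl status containerd --no-pager 2>/dev/null || systemctl status docker --no-pager 2>/dev/null; crictl ps | head -10
Expected: The container runtime should be active. 'crictl ps' lists the running containers on the node; an error here often means the runtime socket is not responding.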
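To know which service to check, read the node's runtime from the cluster side first. In `kubectl get nodes -o wide`, CONTAINER-RUNTIME is the last column and looks like `containerd://1.7.2`, so the part before `://` names the runtime. A minimal sketch assuming that layout; `node_runtime` is a hypothetical name. Using the last field avoids miscounting columns, since OS-IMAGE values contain spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the runtime name (e.g. "containerd", "docker")
# of node $1 from `kubectl get nodes -o wide` output on stdin.
node_runtime() {
  awk -v node="$1" '
    NR > 1 && $1 == node { split($NF, parts, "://"); print parts[1] }
  '
}
```

Usage: `kubectl get nodes -o wide | node_runtime NODE_NAME`. The raw value is also available as `kubectl get node NODE_NAME -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'`.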
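Resource exhaustion is a common cause of NotReady. Check disk, memory, and load.
df -h / /var/lib/kubelet /var/lib/containerd; echo '---'; free -h; echo '---'; uptime
Expected: Disk usage above about 85% (the kubelet's default image garbage-collection threshold), little or no available memory, or a load average well above the CPU count (see nproc) can all push the node into NotReady.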
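To check many mount points at once, the POSIX output of `df -P` can be filtered against a threshold. A minimal sketch; `full_filesystems` is a hypothetical name, and the default of 85 mirrors the figure above rather than any particular eviction setting. It reads the last two fields, so it copes with spaces in the Filesystem column but not in mount points.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and usage of every filesystem whose use exceeds THRESHOLD percent (default 85).
full_filesystems() {
  local threshold="${1:-85}"
  awk -v max="$threshold" '
    NR > 1 {
      use = $(NF - 1)        # Capacity column, e.g. "91%"
      sub(/%$/, "", use)
      if (use + 0 > max + 0) print $NF, $(NF - 1)
    }
  '
}
```

Usage: `df -P / /var/lib/kubelet /var/lib/containerd 2>/dev/null | full_filesystems 85`. The same function works on inode usage with `df -Pi`, since IUse% is also the second-to-last column there.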
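If the node's resources look fine, restart the kubelet on the node, then confirm the result from a machine with cluster access (kubectl is often not configured on worker nodes).
systemctl restart kubelet
kubectl wait --for=condition=Ready node/NODE_NAME --timeout=120s
Expected: The node should return to 'Ready' status within 30-60 seconds. If it does not, or flips back to NotReady, the root cause is still present; recheck the kubelet logs.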
Restarting the kubelet normally leaves running containers in place, but while it is down, health probes, status updates, and pod starts and deletions on this node pause.
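On the node itself, where kubectl may not be available, a small polling helper can wait for the kubelet's local health endpoint instead. A minimal sketch; `retry_until` is hypothetical, and the endpoint in the usage line assumes the kubelet's default healthz address of 127.0.0.1:10248 (the healthzBindAddress and healthzPort settings in its configuration).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command every INTERVAL seconds until it succeeds
# or TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
retry_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$((SECONDS + timeout))   # SECONDS is bash's elapsed-time counter
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage: `retry_until 60 5 curl -fsS http://127.0.0.1:10248/healthz` prints `ok` once the kubelet reports healthy. A passing local health check does not guarantee the node is Ready in the API, so still confirm with `kubectl get nodes` from a machine with cluster access.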