
rawops.dev

P1

Node NotReady — Kubernetes Troubleshooting Guide

Diagnose and fix a Kubernetes node in NotReady state. Covers kubelet health, container runtime, resource exhaustion, and node condition analysis.

15 min, 8 steps

Check which nodes are NotReady and for how long.

kubectl get nodes -o wide
Expected: Affected nodes show STATUS 'NotReady'. The AGE and VERSION columns help tell a newly joined node from an existing one. AGE is the age of the node itself, not the time it has spent NotReady.
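
On a large cluster it can help to list only the affected nodes. The following is a minimal sketch, assuming the default `kubectl get nodes` column layout (NAME first, STATUS second). The function name `not_ready_nodes` is illustrative and not part of kubectl.

```shell
# Print the names of nodes whose STATUS column contains "NotReady".
# Reads `kubectl get nodes` output on stdin; assumes NAME is column 1
# and STATUS is column 2 (the default layout).
not_ready_nodes() {
  awk 'NR > 1 && $2 ~ /NotReady/ { print $1 }'
}

# Usage (needs cluster access):
#   kubectl get nodes | not_ready_nodes
```

Matching on the substring also catches cordoned nodes, whose status reads 'NotReady,SchedulingDisabled'.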

Get detailed status conditions for the node.

kubectl describe node NODE_NAME | grep -A10 'Conditions:'
Expected: Look for MemoryPressure, DiskPressure, PIDPressure, or NetworkUnavailable set to True. Ready=False (or Unknown, when the kubelet has stopped reporting) is the symptom; the other conditions point to the cause.
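
The same conditions can be read in a script-friendly form with a JSONPath query, which also exposes lastTransitionTime (when each condition last changed). This is a sketch under the assumption that kubectl has cluster access; the function names `node_conditions` and `abnormal_conditions` are hypothetical helpers.

```shell
# Print each node condition as: TYPE<TAB>STATUS<TAB>REASON<TAB>LAST_TRANSITION
# $1 is the node name (NODE_NAME in the steps above).
node_conditions() {
  kubectl get node "$1" -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.lastTransitionTime}{"\n"}{end}'
}

# Keep only abnormal rows: Ready that is not True,
# or any other condition that is True.
abnormal_conditions() {
  awk -F '\t' '($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 == "True")'
}

# Usage (needs cluster access):
#   node_conditions NODE_NAME | abnormal_conditions
```

The filter treats Ready=Unknown the same as Ready=False, since both mean the node is not usable.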

Look at recent events on the node.

kubectl describe node NODE_NAME | grep -A30 'Events:'
Expected: Events like 'NodeNotReady', 'ContainerGCFailed', or OOM events indicate the root cause.
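
As an alternative to grepping the describe output, events for a single node can be queried directly and sorted by time. A minimal sketch, assuming cluster access; `node_events` is an illustrative wrapper name.

```shell
# List events whose involved object is the given Node, oldest first.
# $1 is the node name (NODE_NAME in the steps above).
node_events() {
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Usage (needs cluster access):
#   node_events NODE_NAME
```

Events are retained only for a limited time, so an empty result on a node that has been NotReady for a while does not mean nothing happened.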

Connect to the node directly to diagnose from inside.

ssh NODE_IP
Expected: Shell access to the node for local diagnostics.

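The kubelet is the primary node agent. If it is down, the node goes NotReady.

systemctl status kubelet --no-pager; journalctl -u kubelet --since '10 minutes ago' --no-pager | tail -30
Expected: Kubelet should be 'active (running)'. If it is not, the logs show why it failed. The commands are joined with ';' rather than '&&' so that the logs are still printed when systemctl reports a failed or inactive kubelet.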
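
Kubelet logs can be noisy, so filtering for error-level lines may speed things up. The sketch below assumes the kubelet writes its default klog text format, where each message starts with a severity letter and date such as 'E0102 15:04:05.123456'. It will not match JSON-formatted logs. `klog_errors` is a hypothetical helper name.

```shell
# Keep only klog lines at Error (E) or Fatal (F) severity.
# Assumes klog text format, e.g. "E0102 15:04:05.123456 ...".
klog_errors() {
  # "|| true" keeps the exit status zero when nothing matches.
  grep -E '(^|[[:space:]])[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+' || true
}

# Usage (on the node):
#   journalctl -u kubelet --since '10 minutes ago' --no-pager | klog_errors | tail -30
```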

Verify containerd/Docker is running on the node.

systemctl status containerd 2>/dev/null || systemctl status docker 2>/dev/null && crictl ps | head -10
Expected: Container runtime should be active. 'crictl ps' lists running containers on the node.
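
If it is unclear which runtime the node uses, the node object records it in status.nodeInfo.containerRuntimeVersion, in a form such as 'containerd://1.7.2'. A small sketch for extracting the runtime name from that string; `runtime_name` is an illustrative helper, and the version shown is only an example value.

```shell
# Extract the runtime name from a containerRuntimeVersion string,
# e.g. "containerd://1.7.2" -> "containerd".
runtime_name() {
  printf '%s\n' "${1%%://*}"
}

# Usage (needs cluster access; NODE_NAME is a placeholder):
#   runtime_name "$(kubectl get node NODE_NAME -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}')"
```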

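Resource exhaustion is a common cause of NotReady.

df -h / /var/lib/kubelet /var/lib/containerd; echo '---'; free -h; echo '---'; uptime
Expected: Disk usage above 85%, little or no available memory, or a very high load average can each cause NotReady. The commands are joined with ';' so that memory and load are still shown if one of the paths does not exist (for example, /var/lib/containerd on a Docker node).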
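
The disk check can be automated by flagging any filesystem at or above a chosen usage percentage. This sketch assumes POSIX-format `df -P` output and mount points without spaces; `over_threshold` is a hypothetical helper, and 85 is the threshold used in this guide.

```shell
# Print mount points at or above a usage threshold (percent).
# Reads `df -P` output on stdin; -P gives one line per filesystem
# with the usage percentage in column 5 and the mount point in column 6.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= limit) print $6, use "%"
  }'
}

# Usage (on the node):
#   df -P  | over_threshold 85    # disk space
#   df -Pi | over_threshold 85    # inodes
```

Checking inodes as well is worthwhile, because a filesystem can run out of inodes while still showing free space.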

If the node resources are fine, restart kubelet.

systemctl restart kubelet && sleep 30 && kubectl get nodes | grep NODE_NAME
Expected: Node should return to 'Ready' status within 30-60 seconds. If kubectl is not configured on the node, run the kubectl part from a machine that has cluster access.
Warning: Restarting kubelet can briefly disrupt pods on this node. Kube-proxy and CNI pods may also restart.
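
Instead of sleeping for a fixed interval, `kubectl wait` can block until the node reports the Ready condition. A minimal sketch, assuming it runs from a machine with cluster access; `wait_for_ready` and the 120s default timeout are illustrative choices.

```shell
# Block until the node reports Ready, or fail after the timeout.
# $1 is the node name; $2 is an optional timeout (default 120s).
wait_for_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}

# Usage (needs cluster access):
#   wait_for_ready NODE_NAME
#   wait_for_ready NODE_NAME 60s
```

A non-zero exit status means the node did not become Ready in time, which suggests the restart alone did not fix the problem.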