rawops.dev

P2

Pod Stuck Pending — Interactive Troubleshooting Checklist

Diagnose why a Kubernetes pod won't schedule. Covers insufficient resources, node conditions, taints/tolerations, affinity rules, and PVC binding issues.

10 min, 7 steps

Get the pod's current status and any scheduling events.

kubectl describe pod POD_NAME -n NAMESPACE | tail -20
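Expected: The Events section at the bottom shows why scheduling failed, usually as a FailedScheduling event such as '0/3 nodes are available: 3 Insufficient cpu'. Other common causes are untolerated taints, a node selector or affinity mismatch, and unbound PersistentVolumeClaims. Older clusters may report predicate names such as 'MatchNodeSelector' or 'PodToleratesNodeTaints' instead.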
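As a rough aid for reading that event, the sketch below maps a FailedScheduling message to the checklist area it points at. The function name classify_sched_msg is hypothetical, the matched phrases are assumptions based on common scheduler wording (which varies by Kubernetes version), and only the first matching category is reported.

```shell
# Hypothetical helper: classify a FailedScheduling message by keyword.
# The phrases matched here are assumptions; scheduler wording differs between versions.
classify_sched_msg() {
  case "$1" in
    *"Insufficient "*)                                  echo "resources" ;;
    *"taint"*)                                          echo "taints" ;;
    # Storage is checked before affinity because "volume node affinity" contains "affinity".
    *"PersistentVolumeClaim"*|*"volume node affinity"*) echo "storage" ;;
    *"affinity"*|*"selector"*)                          echo "affinity" ;;
    *)                                                  echo "unknown" ;;
  esac
}
```

Paste the message from the Events section as the single argument, for example: classify_sched_msg '0/3 nodes are available: 3 Insufficient cpu.'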

Verify the pod's CPU/memory requests are reasonable.

kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.containers[*].resources}' | jq .
Expected: Shows requests and limits for each container. The scheduler places pods by their requests, not their limits, so an oversized request can keep a pod Pending even when actual usage would fit.
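CPU requests appear either as millicores ('250m') or as whole or fractional cores ('2', '0.5'), which makes them awkward to compare against node figures. A minimal sketch for normalizing them follows; the name cpu_to_milli is hypothetical and it handles only these two notations.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Handles "250m" style and plain core counts such as "2" or "0.5" only.
cpu_to_milli() {
  case "$1" in
    *m) echo "${1%m}" ;;                                          # already millicores
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;    # cores to millicores
  esac
}
```

For example, cpu_to_milli 0.5 and cpu_to_milli 500m both print 500, so the two requests are equivalent.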

Verify nodes have enough resources to schedule the pod.

kubectl describe nodes | grep -A6 'Allocated resources'
Expected: Shows each node's total requests and limits, both as absolute values and as a percentage of allocatable capacity. If requests on every node are near 100%, the pod can't schedule.
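The grep output above does not include node names, so it can be hard to tell which node each block belongs to. The sketch below is one way to pair them; node_requests is a hypothetical name, and the parsing assumes the usual layout of kubectl describe nodes output (a 'Name:' line per node and an indented 'Allocated resources:' table).

```shell
# Hypothetical helper: read `kubectl describe nodes` output on stdin and print
# "<node> <resource> <requests> <percent>" for cpu and memory.
node_requests() {
  awk '
    /^Name:/                { node = $2 }
    /^Allocated resources:/ { in_alloc = 1; next }
    /^[^ \t]/               { in_alloc = 0 }   # any unindented line ends the table
    in_alloc && ($1 == "cpu" || $1 == "memory") {
      printf "%s %s %s %s\n", node, $1, $2, $3
    }
  '
}
```

Usage: kubectl describe nodes | node_requests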

Verify nodes are healthy and schedulable.

kubectl get nodes -o wide && kubectl describe nodes | grep -E 'Taints:|Conditions:' -A5
Expected: All nodes should be 'Ready'. A status of 'Ready,SchedulingDisabled' means the node is cordoned and accepts no new pods. Check for taints that the pod does not tolerate.
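On larger clusters it can help to list only the nodes that are not plainly Ready. A small sketch, assuming the default kubectl get nodes column order (NAME first, STATUS second); unschedulable_nodes is a hypothetical name.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# name and status of every node whose STATUS is anything other than "Ready".
unschedulable_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}
```

Usage: kubectl get nodes | unschedulable_nodes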

Verify the pod's node selector and affinity rules match available nodes.

kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.nodeSelector}' && echo && kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.affinity}' | jq . 2>/dev/null
Expected: If nodeSelector is set, at least one schedulable node must carry every listed label. Check with: kubectl get nodes --show-labels
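Rather than scanning every label by eye, the selector can be passed back to kubectl as a label query. A minimal sketch; nodes_for_selector is a hypothetical name and 'disktype=ssd' in the usage line is an illustrative label, not one taken from this checklist.

```shell
# Hypothetical helper: list the names of nodes matching a label selector.
# $1 is a selector such as "disktype=ssd"; empty output means no node matches.
nodes_for_selector() {
  kubectl get nodes -l "$1" --no-headers -o custom-columns=NAME:.metadata.name
}
```

Usage: nodes_for_selector disktype=ssd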

If the pod mounts a PersistentVolumeClaim, check if it's bound.

kubectl get pvc -n NAMESPACE && kubectl describe pvc PVC_NAME -n NAMESPACE | tail -10
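Expected: PVC status should be 'Bound'. 'Pending' usually means no PV matches the claim or dynamic provisioning has not completed. With a StorageClass that uses WaitForFirstConsumer binding, the PVC stays Pending until the pod is scheduled, so a Pending PVC alone is not the cause in that case.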

Based on the diagnosis, apply the appropriate fix.

# Scale down other workloads:
kubectl scale deployment DEPLOYMENT --replicas=N -n NAMESPACE

# Remove node taint:
kubectl taint nodes NODE_NAME KEY:NoSchedule-

# Add nodes to the cluster (cluster-specific)
Expected: The pod should move from Pending to Running shortly after the fix is applied, typically within 30-60 seconds. Image pulls and volume attachment can make this take longer.
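To confirm the fix without polling by hand, kubectl wait can block until the pod reports Ready. A minimal sketch; wait_for_pod is a hypothetical wrapper and the 120s default timeout is an arbitrary assumption.

```shell
# Hypothetical helper: block until the pod is Ready or the timeout expires.
# $1: pod name, $2: namespace, $3: optional timeout (defaults to 120s, an arbitrary choice).
wait_for_pod() {
  kubectl wait --for=condition=Ready "pod/$1" -n "$2" --timeout="${3:-120s}"
}
```

Usage: wait_for_pod POD_NAME NAMESPACE. A non-zero exit status means the pod did not become Ready in time, so return to the first step and re-read its events.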