P1
OOMKilled Pod — Kubernetes Troubleshooting Guide
Diagnose and fix pods killed by the OOM (Out Of Memory) killer. Covers memory limit analysis, resource tuning, memory leak detection, and node memory pressure.
10 min, 8 steps
Find all pods that were killed due to memory limits being exceeded.
kubectl get pods -A -o json | jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason=="OOMKilled" or .state.terminated.reason=="OOMKilled")) | "\(.metadata.namespace)/\(.metadata.name)"'
Expected: List of namespace/pod pairs with OOMKilled status. If empty, check events instead.
Get detailed status including restart count and last termination reason.
kubectl describe pod POD_NAME -n NAMESPACE | grep -A3 'Last State\|State:\|Restart Count\|Reason:'
Expected: Reason: OOMKilled (typically with Exit Code: 137) confirms that a container was killed for exceeding its memory limit. A high restart count indicates a recurring issue.
View the memory requests and limits set on the pod's containers.
kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{range .spec.containers[*]}{.name}{"\trequest="}{.resources.requests.memory}{"\tlimit="}{.resources.limits.memory}{"\n"}{end}'
Expected: Shows the memory request and limit per container. If the limit is too low for the workload, the container will be OOMKilled.
Compare current memory usage against the limits (requires metrics-server).
kubectl top pod POD_NAME -n NAMESPACE --containers
Expected: The MEMORY(bytes) column shows current usage per container. If it is near the limit, the container is at risk of being OOMKilled.
Requires metrics-server running in the cluster. Install with: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
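To make "near the limit" concrete, the usage reported by kubectl top can be expressed as a percentage of the limit. The following is a minimal sketch; to_bytes and usage_percent are hypothetical helper names, and the sketch assumes whole-number quantities with the binary suffixes Ki, Mi, or Gi (or plain bytes).

```shell
# Convert a Kubernetes memory quantity (e.g. 512Mi, 1Gi, 128974848) to bytes.
# Assumption: only whole numbers with Ki/Mi/Gi suffixes or plain byte counts.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

# Print usage as a whole-number percentage of the limit (rounded down).
usage_percent() {
  used=$(to_bytes "$1")
  limit=$(to_bytes "$2")
  echo $(( used * 100 / limit ))
}

# Example values: usage from 'kubectl top', limit from the pod spec.
usage_percent 450Mi 512Mi   # prints 87
```

A container at 87% of its limit has little room for a burst, so it is a candidate for a higher limit or a closer look at its memory behavior.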
Verify if the node itself is under memory pressure, which can trigger system OOM.
kubectl describe node $(kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.nodeName}') | grep -A10 'Conditions:' | grep -E 'Memory|Ready'
Expected: MemoryPressure=False is normal. If True, the node is running low on memory and the kubelet may evict pods.
SSH to the node and check kernel OOM killer messages.
dmesg -T | grep -i 'oom\|killed process' | tail -10
Expected: Shows which process was killed and how much memory it was using. The matching lines include the process's total virtual memory (total-vm) and resident memory (anon-rss) at kill time.
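The kernel's kill lines are dense, so it can help to reduce them to the PID, process name, and resident memory. The sketch below assumes the common "Killed process PID (name) ... anon-rss:NkB" layout. The sample line is illustrative rather than taken from a real node, parse_oom is a hypothetical helper name, and the exact format varies by kernel version.

```shell
# Illustrative kernel log line (not from a real node; format varies by kernel).
sample='[Tue Mar  5 10:15:02 2024] Memory cgroup out of memory: Killed process 24817 (java) total-vm:4812340kB, anon-rss:524288kB, file-rss:0kB, shmem-rss:0kB'

# Read kernel log lines on stdin and print pid, process name, and anon-rss in MiB.
# Lines that do not match the assumed layout are skipped.
# Assumption: the process name contains no spaces.
parse_oom() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p' |
    awk '{ printf "pid=%s process=%s anon_rss=%dMiB\n", $1, $2, $3 / 1024 }'
}

echo "$sample" | parse_oom   # prints pid=24817 process=java anon_rss=512MiB
```

On the node, the same function can be fed from the kernel log: dmesg -T | parse_oom | tail -10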
Check if it's a memory leak (gradual increase) or spike (sudden burst).
kubectl logs POD_NAME -n NAMESPACE --previous | tail -50
Expected: Logs from the previous (killed) container. Look for allocation patterns, cache growth, or error storms before the kill.
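Logs alone may not show whether usage climbed gradually or jumped, so sampling memory over time can complement them. The sketch below classifies a series of samples with a simple heuristic. The thresholds (80% of steps rising, peak above twice the average) and the classify_trend name are assumptions for illustration, not an established rule.

```shell
# Classify memory samples (MiB, one per line, oldest first) read from stdin.
# Heuristic (assumed thresholds):
#   steady-growth: at least 80% of consecutive samples increase (leak-like)
#   spike:         the peak is more than twice the average
#   stable:        anything else
classify_trend() {
  awk '
    NR > 1 && $1 > prev { ups++ }
    { prev = $1; sum += $1; if ($1 > max) max = $1 }
    END {
      if (NR < 2) { print "insufficient-data"; exit }
      if (ups / (NR - 1) >= 0.8)   print "steady-growth"
      else if (max > 2 * sum / NR) print "spike"
      else                         print "stable"
    }'
}

printf '100\n120\n140\n165\n190\n' | classify_trend   # prints steady-growth
printf '100\n100\n100\n600\n100\n' | classify_trend   # prints spike

# Collecting samples from a live pod (requires metrics-server), e.g. every 30s:
#   kubectl top pod POD_NAME -n NAMESPACE --no-headers | awk '{ sub(/Mi$/, "", $3); print $3 }' >> samples.txt
# then: classify_trend < samples.txt
```

Steady growth across restarts points toward a leak in the application, while an isolated spike suggests a burst workload that may simply need a higher limit.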
Either increase the memory limit (if the app genuinely needs more) or fix the memory leak.
# Option 1: Increase limit (patch deployment)
kubectl set resources deployment DEPLOYMENT -n NAMESPACE --limits=memory=512Mi --requests=memory=256Mi
# Option 2: Set up VPA for auto-tuning (confirm the manifest URL against the kubernetes/autoscaler releases before applying)
# kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler.yaml
Expected: Deployment rolls out new pods with updated limits. Monitor for 30 minutes to confirm stability.
Don't set limits excessively high. Scheduling is based on requests, so limits far above requests overcommit the node and can starve or evict other pods when memory runs short. Aim for 20-30% headroom above normal peak usage.
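The headroom guideline is simple arithmetic: limit = peak usage x (1 + headroom). A minimal sketch follows; suggest_limit is a hypothetical helper, and the peak values are example numbers rather than measurements.

```shell
# Suggest a memory limit in Mi: observed peak (MiB) plus a headroom
# percentage (default 25), rounded up to the next whole Mi.
suggest_limit() {
  peak_mi=$1
  headroom_pct=${2:-25}
  echo "$(( (peak_mi * (100 + headroom_pct) + 99) / 100 ))Mi"
}

suggest_limit 400 25   # prints 500Mi
suggest_limit 410 30   # prints 533Mi
```

The result can be passed to the --limits=memory= flag of kubectl set resources shown in Option 1.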