The CrashLoopBackOff status is one of the most common and most frustrating Kubernetes errors. It means your container starts, crashes, Kubernetes restarts it, it crashes again, and the restart delay grows exponentially. Understanding the backoff cycle and the most common root causes will help you resolve it in minutes instead of hours.
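When a container exits, Kubernetes restarts it according to the pod's restartPolicy. Deployments always use Always, which is also the default for bare pods. Always restarts the container after any exit, including exit code 0; OnFailure restarts it only after a non-zero exit. Each restart adds an increasing delay: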
Restart 1: wait 10 seconds
Restart 2: wait 20 seconds
Restart 3: wait 40 seconds
Restart 4: wait 80 seconds
Restart 5: wait 160 seconds
Restart 6+: wait 300 seconds (5 minute cap)
The "BackOff" part is this exponential delay. The pod status shows CrashLoopBackOff while Kubernetes is waiting between restarts. During the brief moments when the container is actually running, the status shows Running. Right after it crashes, the status shows the termination reason, such as Error or OOMKilled.
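The following Bash sketch reproduces that schedule. It assumes the 10-second base, the doubling, and the 5-minute cap listed above. `backoff_delay` is a hypothetical helper, not part of kubectl:

```bash
#!/usr/bin/env bash
# Hypothetical helper: backoff_delay N prints the wait (seconds) before
# restart number N, using the schedule listed above: 10s, doubling, 300s cap.
backoff_delay() {
  local restart=$1 delay=10 cap=300 i
  for (( i = 1; i < restart; i++ )); do
    delay=$(( delay * 2 ))
    if (( delay >= cap )); then
      delay=$cap
      break
    fi
  done
  echo "$delay"
}

# Print the first seven waits and the cumulative time spent waiting.
total=0
for n in 1 2 3 4 5 6 7; do
  wait_s=$(backoff_delay "$n")
  total=$(( total + wait_s ))
  echo "Restart $n: wait ${wait_s}s (cumulative ${total}s)"
done
```

Under this schedule, the waits before the seventh restart add up to 910 seconds, roughly 15 minutes. After that point, a crash-looping pod restarts only once every five minutes.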
# You will see something like this
kubectl get pods -n my-namespace
# NAME         READY   STATUS             RESTARTS      AGE
# my-app-abc   0/1     CrashLoopBackOff   7 (3m ago)    15m
The RESTARTS column gives you two clues: the restart count and the time since the last restart. A high count (50+) combined with a recent restart means the pod has been crash-looping for a while and still is. The backoff timer resets only after the container runs without problems for 10 minutes.
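If many pods are involved, you can filter the `kubectl get pods` output instead of scanning it by eye. The helper below is a hypothetical Bash/awk sketch. It assumes the default column layout shown above, so it will not work with `-A`, which adds a NAMESPACE column:

```bash
#!/usr/bin/env bash
# Hypothetical helper: crashlooping_pods [MIN_RESTARTS] reads
# `kubectl get pods --no-headers` output on stdin and prints "NAME RESTARTS"
# for pods in CrashLoopBackOff with at least MIN_RESTARTS restarts.
# Assumes columns NAME READY STATUS RESTARTS AGE (no NAMESPACE column).
crashlooping_pods() {
  local min_restarts=${1:-0}
  awk -v min="$min_restarts" '
    # "7 (3m ago)" splits on whitespace, so $4 holds just the count "7"
    $3 == "CrashLoopBackOff" && ($4 + 0) >= (min + 0) { print $1, $4 }
  '
}

# Usage: report only pods that have restarted 50 or more times
# kubectl get pods -n my-namespace --no-headers | crashlooping_pods 50
```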
OOMKilled is the most common cause. The container used more memory than its resources.limits.memory allows, and the kernel killed it.
Diagnose:
# Check if the last termination was OOMKilled
kubectl describe pod my-pod -n my-namespace | grep -A3 "Last State"
#     Last State:     Terminated
#       Reason:       OOMKilled
#       Exit Code:    137
# Check current memory limits
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.containers[0].resources}' | jq .
Fix:
# Increase the memory limit in your deployment spec
resources:
  requests:
    memory: "256Mi"   # What the scheduler uses for placement
  limits:
    memory: "512Mi"   # Hard ceiling -- OOMKilled if exceeded
# Apply the change
kubectl apply -f deployment.yaml
# Or patch directly
kubectl set resources deployment/my-app -n my-namespace --limits=memory=512Mi
How to find the right limit: run the application under load and observe actual usage with kubectl top pod, which requires the Metrics Server to be installed in the cluster. Set the limit to 1.5-2x the peak observed usage.
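To turn the 1.5-2x rule of thumb into numbers, one option is to sample `kubectl top pod` repeatedly during a load test and pipe the samples into a small helper. `suggest_memory_limit` below is a hypothetical sketch and only understands memory values reported in Mi:

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest_memory_limit reads `kubectl top pod POD
# --no-headers` samples on stdin (NAME CPU MEMORY), finds the peak memory,
# and prints a 1.5x-2x limit range. Values not in Mi are skipped.
suggest_memory_limit() {
  awk '
    {
      mem = $3
      if (mem !~ /Mi$/) next
      sub(/Mi$/, "", mem)
      if (mem + 0 > peak) peak = mem + 0
    }
    END {
      if (peak == 0) { print "no Mi samples found"; exit 1 }
      printf "peak=%dMi suggested_limit=%dMi-%dMi\n", peak, peak * 1.5, peak * 2
    }
  '
}

# Usage: sample every 10s for about 5 minutes while the app is under load
# for i in $(seq 30); do
#   kubectl top pod my-pod -n my-namespace --no-headers; sleep 10
# done | suggest_memory_limit
```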
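The container depends on a ConfigMap or Secret that does not exist, is in the wrong namespace, or is missing a required key. If the object is referenced through env or envFrom, the container usually never starts. In that case the pod status shows CreateContainerConfigError rather than CrashLoopBackOff, so look for both.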
Diagnose:
kubectl describe pod my-pod -n my-namespace | grep -A10 "Events"
# Warning Failed ... Error: configmap "my-config" not found
# Warning Failed ... Error: secret "my-secret" not found
Fix:
# Check if the configmap/secret exists
kubectl get configmap my-config -n my-namespace
kubectl get secret my-secret -n my-namespace
# If missing, create it
kubectl create configmap my-config --from-file=config.yaml -n my-namespace
kubectl create secret generic my-secret --from-literal=password=mypass -n my-namespace
Tip: To inspect the actual values in a secret, use the Base64 Encoder/Decoder with batch mode. Paste the kubectl get secret -o yaml output and it decodes every field.
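If you would rather decode the values in the terminal, here is a Bash sketch that reads `kubectl get secret -o yaml` output and decodes each entry under `data:`. The function name is hypothetical. The sketch assumes the simple layout kubectl prints: `data:` at column 0, followed by one indented `key: value` line per entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode_secret_data reads `kubectl get secret NAME
# -o yaml` on stdin and prints each key under `data:` with its decoded value.
decode_secret_data() {
  local in_data=0 line key value
  while IFS= read -r line; do
    if [[ $line == "data:" ]]; then
      in_data=1
      continue
    fi
    # Any line that is not indented ends the data: block
    if [[ $in_data -eq 1 && $line != " "* ]]; then
      in_data=0
    fi
    if [[ $in_data -eq 1 && $line =~ ^[[:space:]]+([^:]+):[[:space:]]*(.*)$ ]]; then
      key=${BASH_REMATCH[1]}
      value=${BASH_REMATCH[2]}
      printf '%s: %s\n' "$key" "$(printf '%s' "$value" | base64 -d)"
    fi
  done
}

# Usage (prints secrets in plain text, so mind your terminal logs):
# kubectl get secret my-secret -n my-namespace -o yaml | decode_secret_data
```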
The image exists and pulls successfully, but the entrypoint or command is wrong.
Diagnose:
# Check the exit code
kubectl describe pod my-pod -n my-namespace | grep "Exit Code"
# Exit Code 127 = command not found
# Exit Code 126 = permission denied on executable
# Exit Code 1 = application error
# Exit Code 2 = shell builtin misuse
# Check the logs
kubectl logs my-pod -n my-namespace --previous
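The exit-code meanings above can be wrapped in a lookup so the number from `kubectl` is translated directly. This is a hypothetical Bash sketch. It also applies the shell convention that codes above 128 mean "killed by signal (code - 128)". That is why 137 (128 + 9, SIGKILL) appears for OOMKilled containers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: explain_exit_code CODE maps a container exit code to
# the meanings listed above; codes above 128 are decoded as 128 + signal.
explain_exit_code() {
  local code=$1
  if [ -z "$code" ]; then
    echo "no exit code recorded yet"
    return 1
  fi
  case $code in
    0)   echo "exited successfully (restartPolicy: Always still restarts it)" ;;
    1)   echo "application error" ;;
    2)   echo "shell builtin misuse" ;;
    126) echo "permission denied on executable" ;;
    127) echo "command not found" ;;
    *)
      if (( code > 128 )); then
        local sig=$(( code - 128 ))
        # kill -l SIGNUM prints the signal name, e.g. 9 -> KILL
        echo "killed by signal $sig ($(kill -l "$sig"))"
      else
        echo "application-defined exit code $code"
      fi
      ;;
  esac
}

# Usage with the exit code of the last crash:
# explain_exit_code "$(kubectl get pod my-pod -n my-namespace \
#   -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
```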
Common issues:
- command or args in the pod spec overriding the image's entrypoint
- A binary linked against a C library the base image does not provide (glibc vs musl)
Fix:
# Test the image locally first
docker run --rm -it my-image:tag /bin/sh
# Verify the entrypoint
docker inspect my-image:tag --format='{{.Config.Entrypoint}} {{.Config.Cmd}}'
Use the Docker CLI Builder to construct docker run and docker inspect commands with the right flags.
The liveness probe is configured to check the application's health, but it fails -- so Kubernetes kills and restarts the container.
Diagnose:
kubectl describe pod my-pod -n my-namespace | grep -A5 "Liveness"
# Liveness: http-get http://:8080/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
kubectl describe pod my-pod -n my-namespace | grep "Unhealthy"
# Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 503
Common fixes:
- initialDelaySeconds: increase it if the application needs more time to start
- timeoutSeconds: increase it if the probe times out before the app responds
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # Give the app time to start
  periodSeconds: 10
  timeoutSeconds: 5         # Allow slow responses
  failureThreshold: 3       # Fail 3 times before restarting
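To sanity-check these numbers, use a rough rule of thumb. If the health endpoint never succeeds, the container is restarted within about initialDelaySeconds + failureThreshold × periodSeconds after it starts. The hypothetical helper below does only that arithmetic. It ignores timeoutSeconds and kubelet scheduling jitter, so treat the result as an approximation:

```bash
#!/usr/bin/env bash
# Hypothetical helper: probe_window DELAY PERIOD FAILURE_THRESHOLD prints a
# rough upper estimate (seconds) of how long a never-healthy container runs
# before the liveness probe gets it restarted. Ignores timeouts and jitter.
probe_window() {
  local delay=$1 period=$2 failures=$3
  echo $(( delay + failures * period ))
}

# Values from the livenessProbe above: 30 + 3 * 10
echo "restart after ~$(probe_window 30 10 3)s if /healthz never succeeds"
```

If the application routinely needs longer than this window to become healthy, the probe keeps restarting it, which produces the crash loop described above.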
The container process cannot access files, sockets, or ports due to SecurityContext restrictions or file permission mismatches.
Diagnose:
kubectl logs my-pod -n my-namespace --previous
# Look for: "Permission denied", "EACCES", "Operation not permitted"
# Check the security context
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.containers[0].securityContext}' | jq .
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.securityContext}' | jq .
Common causes:
- runAsNonRoot: true but the image's entrypoint requires root
- readOnlyRootFilesystem: true but the app writes to the filesystem
- Missing or wrong fsGroup -- files not readable by the container user
- A non-root process binding a port below 1024 (needs the NET_BIND_SERVICE capability)
Fix:
securityContext:       # Pod-level: fsGroup is not valid in a container's securityContext
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000        # Set group ownership on mounted volumes
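When you suspect a permission problem, it helps to check what the process actually sees. The function below is a hypothetical helper and is not included in any image. It prints the effective UID/GID and reports whether each given path is writable. It sticks to POSIX sh features, so it should also run in busybox/ash images without Bash. Paste it into a shell in the container while it is briefly running. Alternatively, paste it into a local `docker run --rm -it --user 1000:1000 --read-only my-image:tag /bin/sh` session, which uses the same UID as the securityContext above and mimics readOnlyRootFilesystem.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_write_access PATH... shows who the process runs
# as and which of the given paths it can write to. Uses only POSIX sh
# features so it also works in busybox/ash shells.
check_write_access() {
  echo "uid=$(id -u) gid=$(id -g) groups=$(id -G)"
  for p in "$@"; do
    if [ -w "$p" ]; then
      echo "$p: writable"
    else
      echo "$p: NOT writable"
    fi
  done
}

# Usage: check the paths your app writes to, e.g. /tmp and a mounted volume
# check_write_access /tmp /data /app/cache
```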
The container starts but immediately exits because it cannot reach a required service -- database, message queue, external API.
Diagnose:
kubectl logs my-pod -n my-namespace --previous
# Look for: "connection refused", "ECONNREFUSED", "no such host", "timeout"
Fix: Use init containers to wait for dependencies before starting the main container:
initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -z postgres-service 5432; do echo waiting for db; sleep 2; done']
Or give the dependency service a readiness probe, so its Service only routes traffic to pods that are ready to serve.
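If the image has Bash but no `nc`, you can write a similar wait with Bash's built-in `/dev/tcp` redirection. This hypothetical sketch assumes a Bash build with `/dev/tcp` enabled (most distribution builds have it) and coreutils `timeout`. Unlike the unbounded `until` loop above, it gives up after a fixed number of attempts. A dependency that never comes up then appears as a failed container instead of an endless wait.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait_for_port HOST PORT [MAX_ATTEMPTS] retries a TCP
# connect every 2 seconds. Returns 0 once HOST:PORT accepts a connection,
# 1 after MAX_ATTEMPTS failures. Needs bash /dev/tcp and coreutils timeout.
wait_for_port() {
  local host=$1 port=$2 max_attempts=${3:-30} attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # Redirecting from /dev/tcp/HOST/PORT makes bash open a TCP connection;
    # timeout caps each attempt at 3s in case packets are silently dropped.
    if timeout 3 bash -c ': <"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "$host:$port is reachable"
      return 0
    fi
    echo "waiting for $host:$port ($attempt/$max_attempts)"
    if (( attempt < max_attempts )); then
      sleep 2
    fi
  done
  echo "$host:$port still unreachable after $max_attempts attempts" >&2
  return 1
}

# Usage, e.g. in an init container or entrypoint script whose image has bash:
# wait_for_port postgres-service 5432 60 || exit 1
```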
When you encounter CrashLoopBackOff, follow this sequence:
# Step 1: Get the pod status and restart count
kubectl get pods -n my-namespace | grep my-app
# Step 2: Check events for scheduling or config errors
kubectl describe pod my-pod -n my-namespace | tail -20
# Step 3: Check the previous crash logs
kubectl logs my-pod -n my-namespace --previous
# Step 4: If no --previous logs, the container is dying too fast
# Check the exit code:
kubectl get pod my-pod -n my-namespace -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# Step 5: Check cluster events for context
kubectl get events -n my-namespace --sort-by=.metadata.creationTimestamp | tail -20
# Step 6: If you can catch it while running, exec in
kubectl exec -it my-pod -n my-namespace -- /bin/sh
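You can bundle Steps 1 to 5 into one function so that every investigation captures the same information. This is a minimal sketch with a hypothetical name. Like the jsonpath in Step 4, it inspects only the first container in the pod. Step 6 is left out because it needs an interactive shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: crashloop_triage POD [NAMESPACE] runs the
# non-interactive triage steps above in order.
crashloop_triage() {
  local pod=$1 ns=${2:-default}

  echo "== Step 1: status and restart count"
  kubectl get pod "$pod" -n "$ns"

  echo "== Step 2: events and describe details"
  kubectl describe pod "$pod" -n "$ns" | tail -20

  echo "== Step 3: logs from the previous (crashed) container"
  if ! kubectl logs "$pod" -n "$ns" --previous; then
    echo "(no previous logs; the container may be dying before it logs anything)"
  fi

  echo "== Step 4: last exit code"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
  echo

  echo "== Step 5: recent namespace events"
  kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp | tail -20
}

# Usage: keep a copy of the output for the incident notes
# crashloop_triage my-pod my-namespace 2>&1 | tee triage.txt
```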
Tip: Build these commands quickly using the kubectl Builder -- select the action (logs, describe, get, exec), set the namespace and resource name, and copy the result.
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"        # Optional -- some teams skip CPU limits
    memory: "256Mi"    # Always set memory limits
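To check that every container actually has a memory limit, list each Deployment's containers with kubectl's JSONPath output and flag empty values. The awk helper below is a hypothetical sketch. It expects one line per Deployment in the form `NAME<TAB>container=limit container=limit ...`, which is what the nested-range JSONPath in the usage comment produces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find_missing_memory_limits reads
# "DEPLOYMENT<TAB>container=limit ..." lines on stdin and prints
# DEPLOYMENT/CONTAINER for every container with an empty memory limit.
find_missing_memory_limits() {
  awk -F'\t' '{
    n = split($2, entries, " ")
    for (i = 1; i <= n; i++) {
      split(entries[i], kv, "=")
      if (kv[2] == "") print $1 "/" kv[1]
    }
  }'
}

# Usage:
# kubectl get deployments -n my-namespace -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.template.spec.containers[*]}{.name}={.resources.limits.memory}{" "}{end}{"\n"}{end}' \
#   | find_missing_memory_limits
```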
For slow-starting applications, use a startup probe instead of a long initialDelaySeconds.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10      # Up to 300s (30 x 10) to start
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
Do not let your main container crash because a database is not ready. Use init containers to wait for dependencies, keeping the main container's restart count clean.
When a new release causes CrashLoopBackOff, roll back immediately:
helm rollback my-release 0 -n my-namespace   # Revision 0 means the previous release
Use the Helm CLI Builder to construct rollback commands with the right flags.
| Cause | Exit Code | Key Symptom | Fix |
|---|---|---|---|
| OOMKilled | 137 | Reason: OOMKilled in describe | Increase memory limit |
| Missing ConfigMap/Secret | -- | CreateContainerConfigError | Create the missing resource |
| Bad image/entrypoint | 127 or 126 | command not found in logs | Fix command/args or rebuild image |
| Failed liveness probe | 137 (or 143) | Unhealthy events | Increase delay/timeout or fix health endpoint |
| Permission denied | 1 or 126 | EACCES in logs | Fix securityContext or file permissions |
| Missing dependency | 1 | connection refused in logs | Add init containers or retry logic |