Kubernetes Pod Pending: Causes Explained & Scheduling Debug

A pod stuck in Pending status means Kubernetes accepted the pod specification but the scheduler has not found a suitable node to run it on. Unlike CrashLoopBackOff, where the container starts and then crashes, a Pending pod's containers never start at all. The cause is almost always at the scheduling level, and the Events section of kubectl describe pod will usually tell you exactly why.

Why Pods Get Stuck in Pending

The Kubernetes scheduler goes through a series of checks for each node:

1. Does the node have enough CPU and memory for the pod's requests?
2. Does the pod's nodeSelector match the node's labels?
3. Does the pod tolerate the node's taints?
4. Do affinity/anti-affinity rules allow this placement?
5. Are all required volumes available and bindable?
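6. Has the namespace exceeded its ResourceQuota? (Strictly, this is enforced by the API server at admission rather than by the scheduler; see cause 5.)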
7. Has the node reached its maxPods limit?

If no node passes all these checks, the pod stays in Pending indefinitely (or until the cluster autoscaler adds a new node).
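The checks above can be pictured as a filter applied to every node. The following is a minimal sketch of that idea, not the real kube-scheduler code: the field names (alloc_mem_mi, tolerated_keys, and so on) are invented for illustration, quantities are plain integers, and taint matching is reduced to comparing keys.

```python
# Simplified model of per-node filtering; not the real kube-scheduler code.
# Quantities are plain integers (millicores, MiB) and all field names are made up.

def filter_node(pod, node):
    """Return None if the pod fits the node, else a short reason string."""
    free_cpu = node["alloc_cpu_m"] - node["requested_cpu_m"]
    free_mem = node["alloc_mem_mi"] - node["requested_mem_mi"]
    if pod["cpu_m"] > free_cpu:
        return "Insufficient cpu"
    if pod["mem_mi"] > free_mem:
        return "Insufficient memory"
    # nodeSelector: every key/value pair must appear in the node's labels.
    for key, value in pod.get("node_selector", {}).items():
        if node["labels"].get(key) != value:
            return "didn't match node selector"
    # Taints: each taint key must be tolerated (value/effect matching omitted).
    for taint_key in node.get("taints", []):
        if taint_key not in pod.get("tolerated_keys", []):
            return "untolerated taint"
    if node["pod_count"] >= node["max_pods"]:
        return "Too many pods"
    return None


def schedule(pod, nodes):
    """Return (feasible node names, {reason: node count}) over all nodes."""
    feasible, reasons = [], {}
    for node in nodes:
        reason = filter_node(pod, node)
        if reason is None:
            feasible.append(node["name"])
        else:
            reasons[reason] = reasons.get(reason, 0) + 1
    return feasible, reasons


def summarize(nodes, reasons):
    """Format the reasons roughly like a FailedScheduling event message."""
    parts = ", ".join(f"{count} {reason}" for reason, count in sorted(reasons.items()))
    return f"0/{len(nodes)} nodes are available: {parts}."
```

When schedule returns an empty list of feasible nodes, the pod stays Pending, and the per-reason counts are what the event message summarizes.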

Common Causes and Fixes

1. Insufficient CPU or Memory

This is the most frequent cause: the pod requests more CPU or memory than any single node has free.

Diagnose:

# Check what the pod requests
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.containers[*].resources.requests}' | jq .

# Check what the scheduler says
kubectl describe pod my-pod -n my-namespace | tail -10
# Events:
#   Warning  FailedScheduling  0/3 nodes are available: 3 Insufficient memory.

# Check node allocations
kubectl describe nodes | grep -A6 "Allocated resources"

Fix:

# Option 1: Reduce the pod's resource requests
kubectl patch deployment my-app -n my-namespace --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "128Mi"}]'

# Option 2: Scale down other workloads to free resources
kubectl scale deployment low-priority-app -n my-namespace --replicas=1

# Option 3: Add nodes to the cluster
# (Cloud-specific -- GKE, EKS, AKS node pool scaling)

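Important: The scheduler uses requests, not limits, for placement decisions, and it compares them against each node's allocatable capacity minus the requests of pods already placed there. A pod requesting 4Gi of memory will not be scheduled on a node with only 3Gi allocatable, even if the node's actual usage is low.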

# Show allocatable CPU and memory on every node
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,CPU_ALLOC:.status.allocatable.cpu,MEM_ALLOC:.status.allocatable.memory'

# See what is already allocated on each node
kubectl describe node my-node | grep -A6 "Allocated resources"
#   Resource           Requests      Limits
#   cpu                1900m (95%)   3800m (190%)
#   memory             3584Mi (92%)  7168Mi (185%)
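The arithmetic behind this check is simple enough to sketch. The example below assumes a node with 3896Mi allocatable memory (a hypothetical figure chosen so that 3584Mi of existing requests is about 92%) and handles only the common memory suffixes, not the full Kubernetes quantity grammar.

```python
# Suffixes handled by this sketch; Kubernetes accepts more forms than these.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
          "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(quantity):
    """Convert a quantity such as '3584Mi' or '4Gi' to bytes."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for another unit.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # plain number of bytes


def fits(request, allocatable, already_requested):
    """True if the request fits in allocatable minus existing requests."""
    free = parse_memory(allocatable) - parse_memory(already_requested)
    return parse_memory(request) <= free


# 3896Mi allocatable is an assumed value; 3584Mi comes from the sample output.
print(fits("256Mi", "3896Mi", "3584Mi"))
print(fits("512Mi", "3896Mi", "3584Mi"))
```

Actual usage never enters the calculation, which is why a lightly loaded node can still reject a pod.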

2. Node Selector Mismatch

The pod specifies a nodeSelector that does not match any node's labels.

Diagnose:

# Check the pod's nodeSelector
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.nodeSelector}' | jq .
# {"disktype": "ssd", "env": "production"}

# Check which nodes carry all of those labels
kubectl get nodes -l disktype=ssd,env=production
# "No resources found" means no node matches

Fix:

# Option 1: Add the missing labels to a node (every key in the nodeSelector must match)
kubectl label node my-node disktype=ssd env=production

# Option 2: Remove the nodeSelector from the pod spec
kubectl patch deployment my-app -n my-namespace --type='json' \
  -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector"}]'
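A nodeSelector is an AND over all of its key/value pairs, so a node that carries only some of the labels still fails. A small sketch of that rule follows; the node names and label sets are hypothetical.

```python
def selector_mismatches(node_selector, node_labels):
    """Return the selector entries the node does not satisfy (empty dict = match)."""
    return {key: value for key, value in node_selector.items()
            if node_labels.get(key) != value}


selector = {"disktype": "ssd", "env": "production"}
nodes = {
    "node-a": {"disktype": "ssd"},                       # env label missing
    "node-b": {"disktype": "hdd", "env": "production"},  # wrong disktype
    "node-c": {"disktype": "ssd", "env": "production"},  # full match
}
for name, labels in nodes.items():
    print(name, selector_mismatches(selector, labels))
```

Only a node with an empty mismatch set is eligible; extra labels on the node are ignored.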

3. Taints and Tolerations

Nodes with taints reject pods that do not have matching tolerations. This is commonly used for dedicated node pools (GPU nodes, high-memory nodes) or to mark nodes as temporarily unavailable.

Diagnose:

# Check node taints
kubectl describe nodes | grep -E "Name:|Taints:"
# Name:    gpu-node-1
# Taints:  nvidia.com/gpu=present:NoSchedule

# Check the pod's tolerations
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.tolerations}' | jq .

Fix:

# Option 1: Add a toleration to the pod spec
# In your deployment YAML:

tolerations:
  - key: "nvidia.com/gpu"
    operator: "Equal"
    value: "present"
    effect: "NoSchedule"

# Option 2: Remove the taint from the node
kubectl taint nodes gpu-node-1 nvidia.com/gpu:NoSchedule-

# Option 3: If the node was cordoned (taint: node.kubernetes.io/unschedulable)
kubectl uncordon my-node
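To see why a given toleration does or does not cover a taint, it can help to write the matching rule out. This is a sketch of the documented rule using plain dictionaries, not code taken from Kubernetes: the effect must match unless the toleration leaves it empty, the key must match unless it is empty with operator Exists, and operator Equal also compares values.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        # An empty key is only valid with Exists and then matches every taint key.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints, tolerations):
    """Taints that block scheduling and that no toleration matches."""
    return [t for t in taints
            if t["effect"] in ("NoSchedule", "NoExecute")
            and not any(tolerates(tol, t) for tol in tolerations)]
```

A pod can be placed on a node only if untolerated returns an empty list for that node's taints. PreferNoSchedule taints are left out because they influence scoring rather than block placement.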

Tip: Use the kubectl Builder to build taint, label, cordon, and uncordon commands -- these are built-in actions with all flags pre-configured.

4. PersistentVolumeClaim Pending

If a pod mounts a PVC that is not bound, the pod stays in Pending.

Diagnose:

# Check PVC status
kubectl get pvc -n my-namespace
# NAME         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# my-data      Pending                                                     5m

# Check why the PVC is pending
kubectl describe pvc my-data -n my-namespace | tail -10
# Events:
#   Normal  FailedBinding  no persistent volumes available for this claim and no storage class is set

Common PVC issues:

Issue	Symptom	Fix
No matching PV	`no persistent volumes available`	Create a PV or install a CSI driver
Wrong StorageClass	`storageclass "gold" not found`	Use an existing StorageClass
Wrong access mode	`AccessMode ReadWriteMany not supported`	Use ReadWriteOnce or a different storage backend
Capacity mismatch	PV exists but is too small	Create a larger PV
Zone mismatch	PV in zone-a, pod needs zone-b	Use `volumeBindingMode: WaitForFirstConsumer`

# List available StorageClasses
kubectl get storageclass

# List existing PVs and their status
kubectl get pv

# Check if a specific PV matches the PVC
kubectl describe pv my-pv
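When comparing a PV against a PVC by eye, the conditions from the table above can be checked one at a time. The sketch below does that for statically provisioned volumes; the dictionary fields are invented for illustration and it ignores selectors, volume modes, and zone constraints.

```python
def pv_mismatches(pvc, pv):
    """Return reasons why this PV cannot satisfy this PVC (empty list = candidate)."""
    reasons = []
    if pv["phase"] != "Available":
        reasons.append(f"PV phase is {pv['phase']}, not Available")
    if pv["storage_class"] != pvc["storage_class"]:
        reasons.append("StorageClass differs")
    if pv["capacity_gi"] < pvc["request_gi"]:
        reasons.append("PV is smaller than the request")
    missing = sorted(set(pvc["access_modes"]) - set(pv["access_modes"]))
    if missing:
        reasons.append("missing access modes: " + ", ".join(missing))
    return reasons


# Hypothetical objects reduced to the fields this sketch looks at.
claim = {"storage_class": "gold", "request_gi": 20, "access_modes": ["ReadWriteMany"]}
volume = {"phase": "Available", "storage_class": "gold", "capacity_gi": 10,
          "access_modes": ["ReadWriteOnce"]}
for reason in pv_mismatches(claim, volume):
    print(reason)
```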

5. ResourceQuota Exceeded

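The namespace has a ResourceQuota and the new pod would exceed it. Strictly speaking, the API server rejects the pod at admission, so no Pending pod exists; instead the Deployment stays below its desired replica count and a FailedCreate event is recorded on the owning ReplicaSet (or Job, StatefulSet).

Diagnose:

# Check quota usage
kubectl describe resourcequota -n my-namespace
# Name:              mem-cpu-quota
# Resource           Used    Hard
# --------           ----    ----
# limits.cpu         3800m   4000m
# limits.memory      7Gi     8Gi
# pods               15      20

# The rejected pod is never created, so look for FailedCreate events in the namespace
kubectl get events -n my-namespace --field-selector reason=FailedCreate
# Warning  FailedCreate  exceeded quota: mem-cpu-quota, requested: limits.memory=2Gi, used: limits.memory=7Gi, limited: limits.memory=8Gi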

Fix:

# Option 1: Increase the quota (if you have cluster-admin access)
kubectl patch resourcequota mem-cpu-quota -n my-namespace \
  --type='json' -p='[{"op": "replace", "path": "/spec/hard/limits.memory", "value": "16Gi"}]'

# Option 2: Reduce the pod's resource requests/limits
# Option 3: Scale down other workloads in the namespace
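The quota check itself is an addition and a comparison per resource: a request is rejected when used plus requested would exceed the hard limit. The sketch below applies that to the quota figures shown above (7Gi used of an 8Gi memory limit); the quantity parser covers only the suffixes needed here.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes, kept exact with Fraction.
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "m": Fraction(1, 1000)}


def to_number(quantity):
    """Parse quantities such as '3800m', '7Gi' or '15' without rounding."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Fraction(quantity[:-len(suffix)]) * factor
    return Fraction(quantity)


def quota_violations(used, hard, requested):
    """Return {resource: (requested, used, hard)} where used + requested > hard."""
    violations = {}
    for resource, amount in requested.items():
        if resource not in hard:
            continue  # this quota does not limit the resource
        current = used.get(resource, "0")
        if to_number(current) + to_number(amount) > to_number(hard[resource]):
            violations[resource] = (amount, current, hard[resource])
    return violations


used = {"limits.cpu": "3800m", "limits.memory": "7Gi", "pods": "15"}
hard = {"limits.cpu": "4000m", "limits.memory": "8Gi", "pods": "20"}
print(quota_violations(used, hard, {"limits.cpu": "100m", "limits.memory": "2Gi", "pods": "1"}))
```

Running the same check before raising a quota shows how much headroom a new limit would actually leave.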

6. maxPods Limit

Each node has a maximum number of pods it can run (default: 110 for most distributions). If all nodes have hit their limit, new pods stay Pending.

Diagnose:

# Check each node's pod capacity
kubectl get nodes -o custom-columns='NAME:.metadata.name,PODS:.status.allocatable.pods'

# Count pods per node (reading the node name directly avoids column-position errors)
kubectl get pods --all-namespaces -o custom-columns='NODE:.spec.nodeName' --no-headers | sort | uniq -c | sort -rn

Fix: This usually means you need more nodes. In managed Kubernetes services (EKS, GKE, AKS), the maxPods limit depends on the instance type and CNI plugin. For example, AWS EKS with the VPC CNI limits pods based on the number of ENIs the instance type supports and the number of IP addresses each ENI can hold.
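For the EKS case, the commonly documented formula for the VPC CNI without prefix delegation is ENIs x (IPv4 addresses per ENI - 1) + 2. The sketch below applies it; the per-instance figures are assumptions to verify against current AWS documentation for your instance type.

```python
def eks_max_pods(enis, ipv4_per_eni):
    """VPC CNI default without prefix delegation: ENIs * (IPs per ENI - 1) + 2."""
    # One address per ENI is the ENI's own primary IP; the +2 accounts for
    # pods that use host networking.
    return enis * (ipv4_per_eni - 1) + 2


# Assumed figures: 3 ENIs with 10 IPv4 addresses each, and 3 ENIs with 6 each.
print(eks_max_pods(3, 10))
print(eks_max_pods(3, 6))
```

Small instance types therefore reach their pod limit long before the kubelet default of 110, which is why a node can refuse pods while still having CPU and memory free.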

The Debugging Sequence

When you see a pod stuck in Pending, follow this sequence:

# Step 1: Always start here -- the Events section tells you the reason
kubectl describe pod my-pod -n my-namespace | tail -15

# Step 2: If "Insufficient cpu/memory" -- check node allocations
kubectl describe nodes | grep -A6 "Allocated resources"

# Step 3: If "didn't match node selector" -- check labels
kubectl get nodes --show-labels

# Step 4: If "untolerated taint" -- check taints
kubectl describe nodes | grep -E "Name:|Taints:"

# Step 5: If PVC-related -- check PVC status
kubectl get pvc -n my-namespace
kubectl describe pvc my-pvc -n my-namespace

# Step 6: Check for quota issues
kubectl describe resourcequota -n my-namespace

# Step 7: Check recent events across the namespace
kubectl get events -n my-namespace --sort-by=.metadata.creationTimestamp | tail -20
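The sequence above is essentially a lookup from the event message to the next command. A minimal sketch of that lookup follows; the substrings are examples only, since the exact wording of scheduler events varies between Kubernetes versions.

```python
# Substrings are matched case-insensitively; event wording varies by version.
HINTS = [
    ("insufficient", "kubectl describe nodes | grep -A6 'Allocated resources'"),
    ("didn't match", "kubectl get nodes --show-labels"),
    ("taint", "kubectl describe nodes | grep -E 'Name:|Taints:'"),
    ("persistentvolumeclaim", "kubectl get pvc -n my-namespace"),
    ("exceeded quota", "kubectl describe resourcequota -n my-namespace"),
    ("too many pods",
     "kubectl get nodes -o custom-columns='NAME:.metadata.name,PODS:.status.allocatable.pods'"),
]


def next_steps(event_message):
    """Return the follow-up commands suggested by an event message."""
    message = event_message.lower()
    return [command for needle, command in HINTS if needle in message]


for command in next_steps("0/3 nodes are available: 3 Insufficient memory."):
    print(command)
```

A single message can list several reasons, in which case more than one command is returned.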

Use the kubectl Builder to construct any of these commands interactively. The describe and get actions support all resource types with namespace, output format, and label selector options.

Prevention

Plan Resource Requests Carefully

Over-requesting resources is a leading cause of Pending pods. If every pod requests 2 CPU cores but only uses 0.1, then 95% of the CPU you have reserved sits idle while new pods cannot be scheduled.

# Check actual resource usage (kubectl top requires metrics-server)
kubectl top pods -n my-namespace --sort-by=cpu
kubectl top pods -n my-namespace --sort-by=memory

Compare kubectl top (actual usage) against kubectl describe pod (requests). If requests are consistently 10 times actual usage or more, lower your requests.
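The comparison can be scripted once the two sets of numbers have been collected. The sketch below assumes you have already gathered CPU requests and usage per pod into dictionaries (the pod names and values are hypothetical) and flags pods at or above the 10:1 threshold.

```python
def cpu_millicores(quantity):
    """'250m' -> 250, '2' -> 2000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def over_requested(requests, usage, threshold=10):
    """Return {pod: ratio} for pods whose request/usage ratio meets the threshold."""
    flagged = {}
    for pod, requested in requests.items():
        used = cpu_millicores(usage.get(pod, "0"))
        if used == 0:
            continue  # no usage sample; skip rather than divide by zero
        ratio = cpu_millicores(requested) / used
        if ratio >= threshold:
            flagged[pod] = round(ratio, 1)
    return flagged


requests = {"api": "2", "worker": "500m"}   # hypothetical requests
usage = {"api": "100m", "worker": "400m"}   # hypothetical kubectl top readings
print(over_requested(requests, usage))
```

A single reading can mislead, so it is worth sampling usage over a representative period before lowering requests.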

Use Cluster Autoscaler

In cloud environments, the cluster autoscaler adds nodes when pods are pending due to insufficient resources:

# GKE example
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes=2 \
  --max-nodes=10 \
  --node-pool=default-pool

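The autoscaler adds a node only when a new node from one of its node groups would let the pending pod schedule. It does not help when no node group can satisfy the pod's node selector or tolerations, or when the blocker is an unbound PVC or a quota.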

Use Pod Priority and Preemption

Critical workloads can preempt lower-priority pods when resources are scarce:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "For production workloads"

# In the pod template of your deployment (spec.template.spec)
priorityClassName: high-priority

When a high-priority pod is Pending due to resources, the scheduler can evict lower-priority pods to make room.
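As a rough illustration of the idea, the sketch below picks victims on a single node greedily, lowest priority first, until the pending pod's memory request fits. It is a deliberate simplification with invented field names; the real scheduler also weighs PodDisruptionBudgets, graceful termination, and which node needs the fewest evictions.

```python
def pick_victims(pending, running, free_mem_mi):
    """Greedy sketch: evict lowest-priority pods first until the pending pod fits.

    Returns a list of pod names to evict ([] if none are needed),
    or None if even evicting every lower-priority pod is not enough.
    """
    victims = []
    candidates = sorted((p for p in running if p["priority"] < pending["priority"]),
                        key=lambda p: p["priority"])
    for pod in candidates:
        if free_mem_mi >= pending["mem_mi"]:
            break
        victims.append(pod["name"])
        free_mem_mi += pod["mem_mi"]
    return victims if free_mem_mi >= pending["mem_mi"] else None


pending = {"name": "checkout", "priority": 1000000, "mem_mi": 1024}
running = [
    {"name": "batch", "priority": 0, "mem_mi": 512},
    {"name": "cache", "priority": 100, "mem_mi": 768},
    {"name": "db", "priority": 2000000, "mem_mi": 2048},
]
print(pick_victims(pending, running, free_mem_mi=256))
```

Pods with equal or higher priority are never candidates, which is why the db pod in the example is untouched.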

Use WaitForFirstConsumer for Volumes

If your cluster spans multiple availability zones, set volumeBindingMode: WaitForFirstConsumer on your StorageClass. This delays PV creation until a pod is scheduled, ensuring the volume is created in the same zone as the pod.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-wait
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3

Related Resources

Kubernetes Troubleshooting Guide -- The complete debugging framework covering all 8 common errors
Pod Stuck Pending Runbook -- Interactive step-by-step checklist with copy-paste commands
kubectl Builder -- Build the debugging commands from this article interactively
Helm CLI Builder -- Roll back Helm releases when a deployment causes scheduling issues
YAML/JSON Converter -- Validate manifest YAML before applying to the cluster
