Step-by-step playbooks for when things go sideways. Interactive checklists with copy-paste commands.
Diagnose why a Kubernetes pod won't schedule. Covers insufficient resources, node conditions, taints/tolerations, affinity rules, and PVC binding issues.
Diagnose and fix a Kubernetes node in NotReady state. Covers kubelet health, container runtime, resource exhaustion, and node condition analysis.
Diagnose and fix pods killed by the OOM (out-of-memory) killer. Covers memory limit analysis, resource tuning, memory leak detection, and node memory pressure.
Diagnose why Kubernetes can't pull a container image. Covers image name typos, registry auth, pull secrets, network issues, and rate limits.
Diagnose and recover an unhealthy etcd cluster. Covers health checks, disk I/O issues, compaction, defragmentation, member recovery, and backup/restore.
Diagnose DNS resolution failures on Linux systems and Kubernetes clusters. Covers resolver config, upstream DNS, systemd-resolved, CoreDNS, and DNSSEC issues.
Click through symptoms to diagnose why your Kubernetes pod won't start. Covers CrashLoopBackOff, ImagePullBackOff, Pending, and Error states with targeted fix commands.
Diagnose why a Kubernetes service can't be reached. Walks through pod connectivity, service selectors, endpoints, network policies, and ingress configuration.
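As a rough illustration of how the pod states named above line up with these playbooks, here is a minimal Python sketch that maps a pod status string (as shown in the STATUS column of `kubectl get pods`) to the playbook most likely to help. The playbook labels, the `PLAYBOOK_BY_STATUS` table, and the `suggest_playbook` function are hypothetical names made up for this example, and the fallback choice is an assumption; they are not part of the playbooks themselves.

```python
# Hypothetical short labels for the playbooks described above.
# The mapping is an illustrative assumption, not an official routing table.
PLAYBOOK_BY_STATUS = {
    "Pending": "pod won't schedule",
    "ImagePullBackOff": "image pull failure",
    "ErrImagePull": "image pull failure",
    "OOMKilled": "OOM-killed pods",
    "CrashLoopBackOff": "pod won't start",
    "Error": "pod won't start",
}

# Assumed fallback: the general click-through pod startup playbook.
DEFAULT_PLAYBOOK = "pod won't start"


def suggest_playbook(status: str) -> str:
    """Return a playbook label for a pod status string.

    Surrounding whitespace is ignored; unknown statuses fall back to
    the general pod startup playbook.
    """
    return PLAYBOOK_BY_STATUS.get(status.strip(), DEFAULT_PLAYBOOK)


if __name__ == "__main__":
    for status in ("Pending", "ImagePullBackOff", "OOMKilled", "Unknown"):
        print(f"{status}: {suggest_playbook(status)}")
```

A status can have more than one cause (for example, a Pending pod may be waiting on a PVC rather than on node resources), so treat the result as a starting point rather than a diagnosis.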