Step-by-step playbooks for when things go sideways. Interactive checklists with copy-paste commands.
Diagnose and fix a Kubernetes node in NotReady state. Covers kubelet health, container runtime, resource exhaustion, and node condition analysis.
Diagnose and recover an unhealthy etcd cluster. Covers health checks, disk I/O issues, compaction, defragmentation, member recovery, and backup/restore.
Diagnose DNS resolution failures on Linux systems and Kubernetes clusters. Covers resolver config, upstream DNS, systemd-resolved, CoreDNS, and DNSSEC issues.