Step-by-step playbooks for when things go sideways. Interactive checklists with copy-paste commands.
Troubleshoot PostgreSQL connection failures. Covers service status, listen address configuration, pg_hba.conf authentication rules, connection limits, and restart procedures.
Diagnose and fix pods killed by the OOM (out-of-memory) killer. Covers memory limit analysis, resource tuning, memory leak detection, and node memory pressure.
Diagnose why Kubernetes can't pull a container image. Covers image name typos, registry auth, pull secrets, network issues, and rate limits.
Fix a certbot renewal that's stuck failing. Covers HTTP-01 and DNS-01 challenges, rate limits, webroot and authenticator mismatches, and pushing the renewed cert to services that cached the old chain.