Step-by-step playbooks for when things go sideways. Interactive checklists with copy-paste commands.
Fix a target stuck in `DOWN` on the Prometheus `/targets` page. Covers network reachability, `/metrics` output format, relabeling mistakes, TLS/auth, and scrape timing.
Click through symptoms to diagnose why your Kubernetes pod won't start. Covers the CrashLoopBackOff, ImagePullBackOff, Pending, and Error states, with targeted fix commands.
Diagnose high CPU or memory usage on Linux. Branches into process identification, OOM killer analysis, memory leaks, CPU steal, and swap management, with targeted fix commands.