Prometheus Target Down — No Metrics Being Scraped
Fix a target stuck in `DOWN` on Prometheus's /targets page. Covers network reachability, /metrics format, relabeling mistakes, TLS/auth, and scrape timing.
The /targets page shows the last scrape error for every target, so read it before guessing at a cause.
# In the browser: http://prometheus:9090/targets
# Or via the API:
curl -sS http://localhost:9090/api/v1/targets | python3 -m json.tool | head -80
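The API response carries the same error text as the /targets page, in each target's `lastError` field. The following is a minimal sketch for pulling out only the unhealthy targets, assuming an unauthenticated Prometheus API at a base URL such as `http://localhost:9090`. The helper names `fetch_targets` and `down_targets` and the sample payload are illustrative, not part of any Prometheus client.

```python
import json
import urllib.request


def fetch_targets(base_url):
    """Return the parsed /api/v1/targets response (assumes no auth on the API)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def down_targets(payload):
    """Return (job, scrapeUrl, lastError) for every active target that is not up."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "")
            rows.append((job, target.get("scrapeUrl", ""), target.get("lastError", "")))
    return rows


# Hand-written sample in the shape of the API response, for illustration only.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "node", "instance": "10.0.0.5:9100"},
                "scrapeUrl": "http://10.0.0.5:9100/metrics",
                "health": "down",
                "lastError": "dial tcp 10.0.0.5:9100: connect: connection refused",
            },
            {
                "labels": {"job": "api", "instance": "10.0.0.6:8080"},
                "scrapeUrl": "http://10.0.0.6:8080/metrics",
                "health": "up",
                "lastError": "",
            },
        ],
        "droppedTargets": [],
    },
}

for job, url, err in down_targets(sample):
    print(f"{job}\t{url}\t{err}")
```

Against a live server, `down_targets(fetch_targets("http://localhost:9090"))` produces the same kind of rows.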
Prometheus in Docker/Kubernetes may not resolve the same DNS/IPs you do. Always test from inside its network namespace.
# Native:
curl -vs http://<target>:<port>/metrics 2>&1 | head -30
# From the Prometheus container (no -it: the output is piped, not interactive):
docker exec prometheus wget -qO- http://<target>:<port>/metrics | head -20
# From Kubernetes:
kubectl exec -n monitoring prometheus-0 -- wget -qO- http://<svc>.<ns>:<port>/metrics | head -20
Prometheus is strict about the exposition format (its own text format or OpenMetrics). A single invalid UTF-8 byte or a malformed label breaks the scrape.
# Use promtool to parse the output:
curl -s http://<target>:<port>/metrics | promtool check metrics 2>&1 | head -20
# Or manually look for bad lines:
curl -s http://<target>:<port>/metrics | grep -nE '^[^#a-zA-Z_]' | head
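The grep above only inspects the first character of each line, so it will not catch a bad byte buried inside a label value. As a rough sketch, the function below reports every line of a saved /metrics body that fails to decode as UTF-8. The function name `invalid_utf8_lines` and the sample bytes are made up for illustration.

```python
def invalid_utf8_lines(body: bytes):
    """Return (line_number, raw_line) for each line that is not valid UTF-8."""
    bad = []
    for number, line in enumerate(body.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Illustrative payload: the last sample carries a stray 0xff byte in a label value.
body = (
    b"# TYPE http_requests_total counter\n"
    b'http_requests_total{path="/ok"} 3\n'
    b'http_requests_total{path="/caf\xff"} 1\n'
)

for number, line in invalid_utf8_lines(body):
    print(number, line)
```

To run it on real output, save the body with `curl -s http://<target>:<port>/metrics -o metrics.txt` and pass `open("metrics.txt", "rb").read()` to the function.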
Most 'handshake failure' and '401' errors come from TLS or auth misconfiguration, not from the network.
# Prometheus config sample for bearer auth:
# - job_name: api
#   authorization:
#     type: Bearer
#     credentials_file: /etc/prometheus/api.token
#   scheme: https
#   tls_config:
#     ca_file: /etc/ssl/certs/ca-certificates.crt
# Verify the token and TLS from the Prometheus host:
curl -sS -H "Authorization: Bearer $(cat /etc/prometheus/api.token)" https://<target>/metrics --cacert /etc/ssl/certs/ca-certificates.crt | head
A `keep` action with the wrong regex silently removes targets from the scrape pool before they ever show as DOWN. They just vanish from the active list.
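A common cause is forgetting that Prometheus anchors relabel regexes at both ends, so a pattern has to match the whole joined label value, not a substring. The sketch below imitates that behaviour with Python's `re.fullmatch`. It is an approximation, since Prometheus uses Go's RE2 syntax rather than Python's, and the `keep` helper and the label values are hypothetical.

```python
import re


def keep(labels, source_labels, regex, separator=";"):
    """Imitate a `keep` rule: True if the target survives, False if it is dropped."""
    # Source label values are joined with the separator (";" is the default).
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Relabel regexes are fully anchored, so match against the entire value.
    return re.fullmatch(regex, value) is not None


# Hypothetical discovered labels for one target.
target = {"__meta_kubernetes_pod_label_app": "api-server"}
source = ["__meta_kubernetes_pod_label_app"]

print(keep(target, source, "api"))    # False: "api" does not cover "api-server"
print(keep(target, source, "api.*"))  # True: the pattern spans the whole value
```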
# Compare what service discovery returns vs what ends up scraped:
curl -s 'http://localhost:9090/api/v1/targets?state=active' | python3 -c 'import json,sys; d=json.load(sys.stdin); print("active:", len(d["data"]["activeTargets"]))'
curl -s 'http://localhost:9090/api/v1/targets?state=dropped' | python3 -c 'import json,sys; d=json.load(sys.stdin); print("dropped:", len(d["data"]["droppedTargets"]))'
A target that takes longer than `scrape_timeout` (10s by default) to answer will time out. Because `scrape_timeout` may not exceed `scrape_interval`, a slow exporter usually needs both raised.
# Measure /metrics response time:
time curl -sS -o /dev/null http://<target>:<port>/metrics
# Adjust in prometheus.yml:
# - job_name: slow-exporter
#   scrape_interval: 60s
#   scrape_timeout: 30s
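Once the response time is measured, the two settings can be checked against it. The following is a small sketch, assuming durations written in the Prometheus style (`30s`, `1m30s`). The helpers `parse_duration` and `check_scrape_timing` are illustrative names, and the 12.4-second measurement is invented.

```python
import re

# Seconds per unit for Prometheus-style durations.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a duration such as '30s' or '1m30s' to seconds."""
    parts = re.findall(r"(\d+)(ms|s|m|h|d|w|y)", text)
    # Reject input with leftover characters that the pattern did not consume.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"not a duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def check_scrape_timing(measured, interval, timeout):
    """Return a list of problems for a measured /metrics response time in seconds."""
    interval_s, timeout_s = parse_duration(interval), parse_duration(timeout)
    problems = []
    if timeout_s > interval_s:
        problems.append("scrape_timeout exceeds scrape_interval; Prometheus rejects this config")
    if measured >= timeout_s:
        problems.append(f"response took {measured}s, which is not under the {timeout} timeout")
    return problems


# An exporter answering in 12.4s fails the 10s default but fits a 30s timeout.
print(check_scrape_timing(12.4, "15s", "10s"))
print(check_scrape_timing(12.4, "60s", "30s"))
```

Leaving headroom between the measured time and the timeout is a judgment call, since exporter latency tends to vary under load.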
Prometheus rereads its config on SIGHUP, or on a POST to /-/reload when `--web.enable-lifecycle` is set.
curl -sS -X POST http://localhost:9090/-/reload && echo 'reloaded'
# Or: kill -HUP $(pidof prometheus)
# Confirm target went UP:
sleep 5 && curl -s http://localhost:9090/api/v1/targets | python3 -c 'import json,sys; [print(t["labels"].get("job"), t["health"]) for t in json.load(sys.stdin)["data"]["activeTargets"]]'
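A fixed five-second sleep can be shorter than the job's scrape interval, and a target reports `unknown` health until its first scrape after the reload. The sketch below polls instead of sleeping once. The helper `wait_until_up` is a hypothetical name, `fetch` is any callable that returns the parsed /api/v1/targets response, and the canned responses stand in for a real server.

```python
import time


def wait_until_up(fetch, job, attempts=6, delay=5.0, sleep=time.sleep):
    """Poll until every active target of `job` reports health 'up'.

    Returns True on success, False if the attempts run out or the job
    has no active targets at all.
    """
    for attempt in range(attempts):
        targets = [
            t for t in fetch()["data"]["activeTargets"]
            if t["labels"].get("job") == job
        ]
        if targets and all(t["health"] == "up" for t in targets):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Canned responses for illustration: 'unknown' right after reload, then 'up'.
responses = iter([
    {"data": {"activeTargets": [{"labels": {"job": "api"}, "health": "unknown"}]}},
    {"data": {"activeTargets": [{"labels": {"job": "api"}, "health": "up"}]}},
])
print(wait_until_up(lambda: next(responses), "api", sleep=lambda _: None))
```

In real use, `fetch` could wrap the same `curl` call shown above or an HTTP request to `http://localhost:9090/api/v1/targets`, with `delay` set near the job's `scrape_interval`.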