rawops.dev

Let's Encrypt Renewal Failed — certbot Troubleshooting

Fix a certbot renewal that's stuck failing. Covers HTTP-01 + DNS-01 challenges, rate limits, webroot + authenticator mismatches, and getting the cert pushed to services that cached the old chain.

20 min, 7 steps

Confirm the cert is actually at risk and see what certbot last tried.

certbot certificates 2>/dev/null | grep -E 'Certificate Name|Expiry Date|Domains'

# Last renewal exit status:
grep -E 'renew|error|Congratulations' /var/log/letsencrypt/letsencrypt.log | tail -30
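Expected: each `Expiry Date` line ends with `(VALID: X days)`, the days remaining. By default certbot attempts renewal once fewer than 30 days remain, so a cert well under 30 days means renewals have been failing. The log should show either a successful renewal ('Congratulations') or a concrete error.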
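To scan a host with many certs, the check above can be reduced to a filter. This is a minimal sketch, assuming the usual `certbot certificates` layout (`Certificate Name:` followed by an `Expiry Date: ... (VALID: N days)` line); `certs_at_risk` is a hypothetical helper name, not part of certbot.

```shell
# Hypothetical helper: reads `certbot certificates` output on stdin and prints
# "<cert-name> <days-left>" for every cert below the threshold (default 30).
certs_at_risk() {
  threshold="${1:-30}"
  awk -v t="$threshold" '
    /Certificate Name:/ { name = $NF }
    /Expiry Date:/ {
      if (match($0, /VALID: [0-9]+ day/)) {
        # "VALID: " is 7 chars and " day" is 4, so the digits sit in between.
        days = substr($0, RSTART + 7, RLENGTH - 11)
        if (days + 0 < t) print name, days
      } else if ($0 ~ /INVALID/) {
        # Expired or otherwise invalid lineages carry no day count.
        print name, "invalid"
      }
    }'
}
```

Usage: `certbot certificates 2>/dev/null | certs_at_risk 30`. Empty output means nothing is inside the renewal window.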

Reproduce the failure against Let's Encrypt's staging environment without consuming production rate limit.

certbot renew --dry-run --non-interactive 2>&1 | tail -40
Expected: Staging output shows the exact authenticator step that fails. Real errors: 'Invalid response from http://...', 'DNS problem: SERVFAIL', 'urn:ietf:params:acme:error:rateLimited'.
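The three error strings above map onto the checks that follow, so the dry-run output can be triaged mechanically. A minimal sketch; `classify_acme_error` and its labels are made up for illustration, and it only recognises the strings quoted above.

```shell
# Hypothetical helper: reads certbot output on stdin and prints which
# area to investigate next. Rate limiting is checked first because a
# rate-limited run can also mention earlier challenge failures.
classify_acme_error() {
  log=$(cat)
  case "$log" in
    *rateLimited*)                  echo "rate-limit" ;;
    *"DNS problem"*)                echo "dns-01" ;;
    *"Invalid response from http"*) echo "http-01" ;;
    *)                              echo "unknown" ;;
  esac
}
```

Usage: `certbot renew --dry-run --non-interactive 2>&1 | classify_acme_error`. Treat `unknown` as a prompt to read the full log, not as a pass.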

Let's Encrypt needs to GET http://<domain>/.well-known/acme-challenge/<token> from the public internet.

# Webroot authenticator must match the server's docroot:
grep -r authenticator /etc/letsencrypt/renewal/*.conf | head

# Write a test file and fetch it from outside:
mkdir -p /var/www/html/.well-known/acme-challenge
echo probe > /var/www/html/.well-known/acme-challenge/probe
curl -sS http://<your-domain>/.well-known/acme-challenge/probe
rm /var/www/html/.well-known/acme-challenge/probe
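Expected: curl should return `probe`. A 404 means the webroot doesn't match the server's docroot. If you get a redirect to HTTPS instead, repeat the fetch with `curl -sSL`: Let's Encrypt follows redirects to ports 80 and 443, so the redirect only breaks validation when the HTTPS side doesn't serve the challenge path.
A blanket `:80 → :443` redirect in nginx/apache is safest when it exempts `/.well-known/acme-challenge/*`, so the challenge is answered directly over plain HTTP.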
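The manual probe above can be wrapped so the test file is always cleaned up and the result is a single word. This is a sketch under assumptions: `probe_webroot` and the `FETCH` override are hypothetical names, and `curl -sSL` is used so redirects are followed roughly the way the validation server follows them.

```shell
# Hypothetical helper: writes a probe into <webroot>, fetches it over HTTP,
# removes it, and prints "ok" or "mismatch".
# FETCH can be overridden (e.g. for testing without network access).
probe_webroot() {
  domain="$1"; webroot="$2"
  dir="$webroot/.well-known/acme-challenge"
  token="probe-$$"
  mkdir -p "$dir" || return 2
  printf '%s\n' "$token" > "$dir/$token"
  # Unquoted on purpose so the default command splits into words.
  body=$(${FETCH:-curl -sSL --max-time 10} "http://$domain/.well-known/acme-challenge/$token")
  rm -f "$dir/$token"
  if [ "$body" = "$token" ]; then
    echo "ok"
  else
    echo "mismatch"
    return 1
  fi
}
```

Usage: `probe_webroot example.com /var/www/html`. Run it from the server itself only if the domain resolves to the public address there; otherwise fetch from an outside host.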

DNS-01 is required for wildcards and works when port 80 is firewalled. Propagation lag is the usual culprit.

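# Manual probe (swap in your DNS provider's plugin as needed).
# Not piped through tail: --manual prompts and waits while you create the record.
certbot certonly --manual --preferred-challenges dns --dry-run -d 'example.com,*.example.com'

# Verify TXT from a public resolver, then from an authoritative server
# (Let's Encrypt's resolvers query the authoritative nameservers, not a public cache):
dig +short TXT _acme-challenge.example.com @1.1.1.1
dig +short TXT _acme-challenge.example.com @"$(dig +short NS example.com | head -1)"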
Expected: The TXT record certbot wants to set must resolve globally before the ACME server queries it. Low-TTL zones (60s) recover fastest.
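Waiting for propagation is easier to script than to eyeball. The sketch below polls a resolver until the expected TXT value appears; `acme_txt_name` and `wait_for_txt` are hypothetical helpers, and the 30 tries at 10 second intervals are arbitrary assumptions. Note that a wildcard name is validated on the base domain's `_acme-challenge` record.

```shell
# Hypothetical helper: maps a certificate name to its challenge record.
# '*.example.com' and 'example.com' both use _acme-challenge.example.com.
acme_txt_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

# Hypothetical helper: polls <resolver> until the TXT record holds <expected>.
wait_for_txt() {
  name=$(acme_txt_name "$1"); expected="$2"; resolver="${3:-1.1.1.1}"
  tries=0
  while [ "$tries" -lt 30 ]; do
    # dig quotes TXT data; strip the quotes, then match the whole line literally.
    if dig +short TXT "$name" "@$resolver" | tr -d '"' | grep -qxF -- "$expected"; then
      echo "visible on $resolver"
      return 0
    fi
    tries=$((tries + 1))
    sleep 10
  done
  echo "not visible after $tries tries" >&2
  return 1
}
```

Usage: `wait_for_txt '*.example.com' '<value certbot printed>'`, then repeat against an authoritative nameserver before letting certbot continue.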

Let's Encrypt caps issuance at 5 certificates per exact set of hostnames (duplicates) per 7 days and 50 certificates per registered domain per 7 days. Production rate limits don't apply to staging.

# Search crt.sh's Certificate Transparency index (it lists precertificates and final certs, so de-duplicate by serial):
curl -sS 'https://crt.sh/?q=example.com&output=json' | python3 -c 'import json,sys,datetime; xs=json.load(sys.stdin); now=datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None); r={x["serial_number"] for x in xs if (now-datetime.datetime.fromisoformat(x["not_before"].replace("Z",""))).days<7}; print(len(r),"issued in last 7 days")'
Expected: If you're at or above 5 duplicates / 7 days, use the staging env (`--test-cert`) until the limit frees up. The duplicate limit can't be reset or overridden, and revoking certificates doesn't restore budget.
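crt.sh can be slow or unavailable, so a local count is a useful cross-check. This is a minimal sketch assuming GNU `date` (for `-d`); `count_recent` is a hypothetical helper that counts timestamps on stdin that fall inside the last N days. It only sees what you feed it, so it undercounts if other hosts issue for the same names.

```shell
# Hypothetical helper: reads one timestamp per line on stdin and prints how
# many fall within the last <days> days (default 7). Requires GNU date.
count_recent() {
  days="${1:-7}"
  cutoff=$(( $(date -u +%s) - days * 86400 ))
  n=0
  while IFS= read -r ts; do
    [ -n "$ts" ] || continue
    # Skip lines GNU date cannot parse instead of aborting.
    t=$(date -u -d "$ts" +%s 2>/dev/null) || continue
    if [ "$t" -ge "$cutoff" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Usage, counting this host's own issuances from certbot's archive directory: `for f in /etc/letsencrypt/archive/example.com/cert*.pem; do openssl x509 -in "$f" -noout -startdate | cut -d= -f2; done | count_recent 7`.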

Apply the correction and re-run. `--force-renewal` bypasses the 30-day threshold.

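# After the fix (scoped to the failing cert; without --cert-name every cert on the host is re-issued):
certbot renew --cert-name example.com --force-renewal --non-interactive 2>&1 | tail -20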

# If the renewal config is wrong, delete and re-issue cleanly:
# certbot delete --cert-name example.com
# certbot certonly --webroot -w /var/www/html -d example.com
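Expected: Exit code 0 and 'Congratulations!'. When the command is piped through `tail`, read certbot's status from `${PIPESTATUS[0]}` in bash, because `$?` reflects `tail`. New cert lands in /etc/letsencrypt/live/<domain>/.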
`--force-renewal` consumes your rate-limit budget — make sure the fix is real before using it.
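Before deleting a lineage, it helps to see what the renewal config actually records. A minimal sketch, assuming the usual `key = value` layout of `/etc/letsencrypt/renewal/<name>.conf` with keys such as `authenticator` and `webroot_path`; `renewal_param` is a hypothetical helper.

```shell
# Hypothetical helper: prints the value of <key> from a certbot renewal conf.
# webroot_path is stored with a trailing comma, which is stripped here.
renewal_param() {
  key="$1"; file="$2"
  awk -F ' *= *' -v k="$key" '
    $1 == k { sub(/,$/, "", $2); print $2; exit }
  ' "$file"
}
```

Usage: `renewal_param authenticator /etc/letsencrypt/renewal/example.com.conf`. If it prints `webroot`, compare `renewal_param webroot_path ...` with the docroot your web server really serves; a mismatch there is a common reason to re-issue.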

nginx, haproxy, postfix, and similar services keep the cert in memory. Renewal on disk doesn't restart them.

nginx -t && systemctl reload nginx
# Or for haproxy: systemctl reload haproxy
# Or a deploy hook for future renewals (the file must be executable):
# /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
#   #!/bin/sh
#   systemctl reload nginx
Expected: `openssl s_client -connect <domain>:443 -servername <domain> </dev/null 2>/dev/null | openssl x509 -noout -enddate` shows the new `notAfter`.
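Comparing expiry dates works, but comparing fingerprints removes any doubt about which cert a service is holding. A sketch assuming `openssl` is installed and the default `/etc/letsencrypt/live/<domain>/cert.pem` path; `cert_fingerprint` and `served_matches_disk` are hypothetical helper names.

```shell
# Hypothetical helper: reads a PEM cert on stdin, prints its SHA-256 fingerprint.
cert_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 | cut -d= -f2
}

# Hypothetical helper: prints "match" when the cert served on :443 is the
# one on disk, "stale" when the service still holds a different (or no) cert.
served_matches_disk() {
  domain="$1"; pem="${2:-/etc/letsencrypt/live/$1/cert.pem}"
  disk=$(cert_fingerprint < "$pem")
  served=$(openssl s_client -connect "$domain:443" -servername "$domain" \
             </dev/null 2>/dev/null | cert_fingerprint 2>/dev/null)
  if [ -n "$disk" ] && [ "$disk" = "$served" ]; then
    echo "match"
  else
    echo "stale"
    return 1
  fi
}
```

Usage: `served_matches_disk example.com`. A `stale` result after a successful renewal usually means the service was not reloaded, or a load balancer or CDN in front of it terminates TLS with its own copy.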