rawops.dev

SLO / Error Budget Calculator

Calculate error budgets, allowed downtime, and burn rate alerts from SLO targets

Inputs

SLO target (required), plus optional request rate and MTTR. Supplying a request rate enables the request-based error budget. Supplying an MTTR (mean time to recovery) enables the incident count estimate.

Error Budget: 0.100%
Allowed Downtime: 43m 12s
Allowed Errors: 43.2K
Max Incidents (MTTR): 1

Detailed Breakdown

SLO Target: 99.9%
Error Budget: 0.1000%
Time Window: Month (30 days)
Total Minutes: 43,200
Downtime Budget: 43.20 minutes
Downtime Formatted: 43m 12s
Total Requests: 43.20M
Allowed Errors: 43.2K
Errors / Minute: 1.00
Max Incidents (30m MTTR): 1
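
The figures in this breakdown can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the calculator's actual source: the function and field names are hypothetical, and the request rate of 1,000 requests per minute is an assumption inferred from 43.20M total requests over 43,200 minutes.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    budget_fraction: float    # 0.001 for a 99.9% SLO
    downtime_minutes: float   # allowed downtime in the window
    allowed_errors: float     # allowed failed requests in the window
    max_incidents: int        # whole incidents of length MTTR that fit


def error_budget(slo_percent, window_days, requests_per_minute, mttr_minutes):
    # Round away float noise such as 0.0009999999999999432.
    budget_fraction = round((100.0 - slo_percent) / 100.0, 12)
    total_minutes = window_days * 24 * 60
    downtime_minutes = total_minutes * budget_fraction
    allowed_errors = requests_per_minute * total_minutes * budget_fraction
    # Only complete incidents count, so round down.
    max_incidents = int(downtime_minutes // mttr_minutes)
    return ErrorBudget(budget_fraction, downtime_minutes,
                       allowed_errors, max_incidents)


def format_minutes(minutes):
    """Render fractional minutes as 'Xm Ys'."""
    total_seconds = round(minutes * 60)
    return f"{total_seconds // 60}m {total_seconds % 60}s"


# Assumed inputs: 99.9% SLO, 30-day window, 1000 req/min, 30-minute MTTR.
budget = error_budget(99.9, 30, 1000, 30)
print(format_minutes(budget.downtime_minutes),
      round(budget.allowed_errors),
      budget.max_incidents)
```

With the assumed inputs this yields 43m 12s of downtime, 43,200 allowed errors, and room for one 30-minute incident, matching the breakdown above.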

Multi-Burn-Rate Alerts (30-day window)

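Based on the multi-window, multi-burn-rate pattern from the Google SRE Workbook. Each alert fires when the error rate exceeds its burn rate multiplied by the error budget (for example, 14.4 × 0.1% = 1.44% for the critical tier) over both a long and a short lookback window.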

Critical (fast burn)
Lookback: 1h | Budget consumed: 2% | Burn rate: 14.4x | Max downtime: 52s
Warning (medium burn)
Lookback: 6h | Budget consumed: 5% | Burn rate: 6x | Max downtime: 2m 10s
Ticket (slow burn)
Lookback: 3d | Budget consumed: 10% | Burn rate: 1x | Max downtime: 4m 19s
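
The burn rates and downtime figures in the three tiers follow from the budget share and the lookback length. The sketch below shows one way to derive them, assuming a 30-day (720-hour) window and the 43.2-minute downtime budget from the breakdown. The function names are hypothetical.

```python
WINDOW_HOURS = 30 * 24  # 30-day SLO window = 720 hours


def burn_rate(budget_consumed, lookback_hours, window_hours=WINDOW_HOURS):
    """Burn rate that spends `budget_consumed` (0..1) within the lookback."""
    return budget_consumed * window_hours / lookback_hours


def hours_to_exhaustion(rate, window_hours=WINDOW_HOURS):
    """Hours until the whole budget is gone at a constant burn rate."""
    return window_hours / rate


def error_rate_threshold(rate, budget_fraction):
    """Error ratio above which the alert condition holds."""
    return rate * budget_fraction


# (tier, share of budget consumed, lookback in hours)
TIERS = [("critical", 0.02, 1), ("warning", 0.05, 6), ("ticket", 0.10, 72)]

for name, consumed, lookback in TIERS:
    rate = burn_rate(consumed, lookback)
    print(f"{name}: burn rate {rate:.1f}x, "
          f"threshold {error_rate_threshold(rate, 0.001):.4%}, "
          f"budget gone in {hours_to_exhaustion(rate):.0f}h, "
          f"downtime in lookback {consumed * 43.2 * 60:.0f}s")
```

At 14.4x the 30-day budget lasts 50 hours, at 6x it lasts 120 hours, and at 1x it lasts exactly the 30-day window.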

Export Configurations

slo-rules.yml
# Multi-window Multi-Burn-Rate Alerting Rules
# Based on Google SRE Workbook Chapter 5
# SLO Target: 99.900% | Error Budget: 0.100%

groups:
  - name: slo.my-service.rules
    rules:
      # ── Recording rules ──────────────────────────
      - record: slo:sli_error:ratio_rate5m
        expr: |
          sum(rate(http_requests_total{service="my-service",code=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="my-service"}[5m]))

      - record: slo:sli_error:ratio_rate30m
        expr: |
          sum(rate(http_requests_total{service="my-service",code=~"5.."}[30m]))
          /
          sum(rate(http_requests_total{service="my-service"}[30m]))

      - record: slo:sli_error:ratio_rate1h
        expr: |
          sum(rate(http_requests_total{service="my-service",code=~"5.."}[1h]))
          /
          sum(rate(http_requests_total{service="my-service"}[1h]))

      - record: slo:sli_error:ratio_rate6h
        expr: |
          sum(rate(http_requests_total{service="my-service",code=~"5.."}[6h]))
          /
          sum(rate(http_requests_total{service="my-service"}[6h]))

      - record: slo:sli_error:ratio_rate3d
        expr: |
          sum(rate(http_requests_total{service="my-service",code=~"5.."}[3d]))
          /
          sum(rate(http_requests_total{service="my-service"}[3d]))

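      # 30-day ratio used by the budget rule below. A 30d range needs at
      # least 30 days of retention and can be expensive to evaluate.
      - record: slo:sli_error:ratio_rate30d
        expr: |
          sum(rate(http_requests_total{service="my-service",code=~"5.."}[30d]))
          /
          sum(rate(http_requests_total{service="my-service"}[30d]))

      # ── Error budget remaining ───────────────────
      # Fraction of the 30-day budget still unspent (1 = untouched, 0 = exhausted).
      - record: slo:error_budget:remaining
        expr: |
          1 - (
            slo:sli_error:ratio_rate30d / 0.001000
          )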

      # ── Alerting rules (Multi-Burn-Rate) ─────────

      # Critical: 2% of 30-day budget consumed in 1 hour (burn rate 14.4x)
      - alert: SLOBurnRateCritical
        expr: |
          slo:sli_error:ratio_rate1h > (14.4 * 0.001000)
          and
          slo:sli_error:ratio_rate5m > (14.4 * 0.001000)
        for: 2m
        labels:
          severity: critical
          service: my-service
          slo: availability
        annotations:
          summary: "High burn rate on SLO (critical)"
          description: "Error rate is consuming error budget 14.4x faster than expected. At this rate, the entire monthly budget will be exhausted in about 50 hours."

      # Warning: 5% of 30-day budget consumed in 6 hours (burn rate 6x)
      - alert: SLOBurnRateWarning
        expr: |
          slo:sli_error:ratio_rate6h > (6 * 0.001000)
          and
          slo:sli_error:ratio_rate30m > (6 * 0.001000)
        for: 5m
        labels:
          severity: warning
          service: my-service
          slo: availability
        annotations:
          summary: "Elevated burn rate on SLO (warning)"
          description: "Error rate is consuming error budget 6x faster than expected. At this rate, the entire monthly budget will be exhausted in about 120 hours (5 days)."

      # Ticket: 10% of 30-day budget consumed in 3 days (burn rate 1x)
      - alert: SLOBurnRateTicket
        expr: |
          slo:sli_error:ratio_rate3d > (1 * 0.001000)
          and
          slo:sli_error:ratio_rate6h > (1 * 0.001000)
        for: 30m
        labels:
          severity: info
          service: my-service
          slo: availability
        annotations:
          summary: "Slow burn on SLO (ticket)"
          description: "Error rate is steadily consuming the error budget. Current trajectory will exhaust the monthly budget within 30 days."

The Nines — Availability Reference

Availability   Monthly Downtime   Quarterly   Yearly
90%            72 hours           9 days      36.5 days
95%            36 hours           4.5 days    18.25 days
99%            7h 18m             21h 54m     3d 15h 36m
99.5%          3h 39m             10h 57m     1d 19h 48m
99.9%          43m 50s            2h 11m      8h 45m 36s
99.95%         21m 55s            1h 5m       4h 22m 48s
99.99%         4m 23s             13m 9s      52m 34s
99.999%        26.3s              1m 19s      5m 15s
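
The reference values can be recomputed for any availability target. The sketch below assumes a 365-day year, with a quarter and a month taken as one quarter and one twelfth of it. The table's monthly column does not always follow that convention, so some results differ from it by a few seconds or more. The function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year
PERIOD_HOURS = {
    "monthly": HOURS_PER_YEAR / 12,
    "quarterly": HOURS_PER_YEAR / 4,
    "yearly": HOURS_PER_YEAR,
}


def downtime_seconds(availability_percent, period_hours):
    """Allowed downtime in seconds for one period at the given availability."""
    return period_hours * 3600 * (100 - availability_percent) / 100


def format_duration(seconds):
    """Render whole seconds as e.g. '8h 45m 36s', skipping zero units."""
    total = round(seconds)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    text = " ".join(f"{value}{unit}" for value, unit in parts if value)
    return text or "0s"


for target in (99.0, 99.9, 99.99):
    row = {name: format_duration(downtime_seconds(target, hours))
           for name, hours in PERIOD_HOURS.items()}
    print(target, row)
```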

About the SLO / Error Budget Calculator

This tool calculates error budgets, allowed downtime, and burn rate alert thresholds from Service Level Objective (SLO) targets. It implements the Multi-Window Multi-Burn-Rate alerting pattern from the Google SRE Workbook.

What is an SLO?

A Service Level Objective (SLO) is a target reliability level for a service, expressed as a percentage (e.g., 99.9% availability). The gap between 100% and the SLO target is the error budget — the acceptable amount of unreliability. For a 99.9% SLO over 30 days, the error budget is 0.1%, which translates to about 43 minutes of allowed downtime per month.

Multi-Burn-Rate Alerts

A single static threshold either fires too late to catch slow burns or fires too often on brief spikes. The multi-burn-rate pattern uses three alert tiers with different lookback windows: a 1-hour window for critical fast burns (14.4x rate), a 6-hour window for warning-level burns (6x rate), and a 3-day window for slow burns that generate tickets (1x rate). Each tier also checks a shorter window (5 minutes, 30 minutes, and 6 hours respectively), so an alert stops firing soon after the error rate recovers. This approach provides fast detection without excessive noise.
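
The alert condition for each tier can be sketched as a pair of threshold checks, mirroring the `and` expressions in the exported Prometheus rules. This is a simplified illustration: the names are hypothetical, the error ratios are assumed to be measured already, and the `for:` hold-down from the rules is ignored.

```python
# Long and short windows per tier, as used in the exported rules.
TIERS = {
    "critical": {"burn_rate": 14.4, "long": "1h", "short": "5m"},
    "warning": {"burn_rate": 6.0, "long": "6h", "short": "30m"},
    "ticket": {"burn_rate": 1.0, "long": "3d", "short": "6h"},
}


def burn_alert_fires(long_rate, short_rate, burn_rate, budget_fraction):
    """True when both windows exceed burn_rate * error budget."""
    threshold = burn_rate * budget_fraction
    return long_rate > threshold and short_rate > threshold


def firing_tiers(error_rates, budget_fraction=0.001):
    """error_rates maps a window name ('5m', '1h', ...) to an error ratio."""
    return [
        name
        for name, tier in TIERS.items()
        if burn_alert_fires(error_rates[tier["long"]],
                            error_rates[tier["short"]],
                            tier["burn_rate"],
                            budget_fraction)
    ]


# An ongoing outage: 2% errors over the last hour and the last 5 minutes.
ongoing = {"5m": 0.02, "30m": 0.02, "1h": 0.02, "6h": 0.004, "3d": 0.0005}
# The same outage after recovery: the 5-minute window is clean again.
recovered = dict(ongoing, **{"5m": 0.0})

print(firing_tiers(ongoing))
print(firing_tiers(recovered))
```

Requiring the short window as well is what lets the critical alert clear once the 5-minute error ratio drops, even though the 1-hour ratio is still elevated.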

Export Formats

The calculator generates three export formats: Prometheus alerting rules with recording rules and multi-burn-rate alerts, OpenSLO YAML (the vendor-neutral open standard for SLO definitions), and Sloth config (a popular Prometheus SLO framework that generates recording and alerting rules).

How It Works

Everything runs in your browser. Your inputs are never sent to any server. The calculator computes error budgets, translates them to allowed downtime and failed requests, and generates alerting configurations that you can copy into your monitoring stack after replacing the placeholder service name and metric selectors with your own.
