
rawops.dev

P1

Redis Out of Memory — OOM and maxmemory-policy

Diagnose and fix OOM errors in Redis. Covers reading memory stats, finding big keys, choosing a sane eviction policy, and spotting fragmentation before it triggers a kernel OOM kill.

20 min · 7 steps

See used memory, peak usage, maxmemory limit, and the current eviction policy in one call.

redis-cli INFO memory | grep -E 'used_memory_human|used_memory_peak_human|maxmemory_human|maxmemory_policy|mem_fragmentation_ratio|total_system_memory_human'
Expected: `used_memory_human` should be below `maxmemory_human`. If `maxmemory` is 0, no limit is set and Redis will grow until the kernel OOM-kills the process, so set a limit.
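The comparison above can be turned into a single number. The following is a minimal sketch that reads raw `INFO memory` output on stdin and prints how much of `maxmemory` is in use; `mem_used_pct` is a hypothetical helper name, not part of redis-cli, and it relies on the raw `used_memory` and `maxmemory` fields rather than the `_human` variants.

```shell
# Sketch: percentage of maxmemory in use, computed from `INFO memory` on stdin.
# mem_used_pct is a hypothetical helper, not a redis-cli feature.
mem_used_pct() {
  # INFO output uses CRLF line endings, so strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      # maxmemory 0 means no cap is configured.
      if (max + 0 == 0) { print "unlimited"; exit }
      printf "%.1f\n", used * 100 / max
    }'
}

# Usage against a live instance:
#   redis-cli INFO memory | mem_used_pct
```

A result of `unlimited` corresponds to the `maxmemory=0` case described above.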

The default `noeviction` makes Redis return OOM errors on writes once the cap is hit. A caching workload needs an LRU/LFU policy.

redis-cli CONFIG GET maxmemory-policy && echo '---' && redis-cli CONFIG GET maxmemory
Expected: Cache-only workload: `allkeys-lru` or `allkeys-lfu`. Mixed session/cache: `volatile-lru`. Redis as the source of truth: `noeviction` plus a hard cap enforced on the write path.
Do NOT use `allkeys-*` on a Redis that stores persistent data you can't regenerate — you'll silently lose keys under pressure.
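To make the difference between the policy families concrete, here is a small sketch that maps a policy name to the set of keys it is allowed to evict. `policy_scope` is a hypothetical helper written for this guide; the three families it distinguishes are the standard Redis ones.

```shell
# Sketch: describe which keys a given maxmemory-policy may evict.
# policy_scope is a hypothetical helper name.
policy_scope() {
  case "$1" in
    noeviction) echo "no eviction: writes fail with OOM errors at the cap" ;;
    allkeys-*)  echo "any key can be evicted, with or without a TTL" ;;
    volatile-*) echo "only keys with a TTL can be evicted" ;;
    *)          echo "unknown policy: $1"; return 1 ;;
  esac
}

# Usage against a live instance (CONFIG GET prints the name, then the value):
#   policy_scope "$(redis-cli CONFIG GET maxmemory-policy | tail -n 1)"
```

If the output says any key can be evicted and the instance holds data you cannot regenerate, that is the situation the warning above describes.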

A single misbehaving key (hash with millions of fields, a huge stream) can eat all of Redis memory.

redis-cli --bigkeys

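# To rank by memory footprint instead, sampling up to 100 elements per key:
redis-cli --memkeys --memkeys-samples 100
Expected: `--bigkeys` lists the largest key per type by element count (byte length for strings); `--memkeys` lists the largest key per type by sampled memory usage. Hashes and streams are the most common offenders.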
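Both scans report only one winner per type. To rank a sample of keys across all types, one option is to call `MEMORY USAGE` (Redis 4.0+) on each scanned key. This is a sketch, not a tuned tool: `top_keys_by_memory` and the `REDIS_CLI` override variable are hypothetical names, and the loop costs one round trip per key, so keep the sample small on a busy instance.

```shell
# Sketch: rank a sample of keys by MEMORY USAGE (bytes), largest first.
# REDIS_CLI is a hypothetical override so the command can be swapped for a wrapper.
: "${REDIS_CLI:=redis-cli}"

top_keys_by_memory() {
  limit="${1:-1000}"   # how many scanned keys to sample
  "$REDIS_CLI" --scan | head -n "$limit" | while IFS= read -r key; do
    bytes=$("$REDIS_CLI" MEMORY USAGE "$key" < /dev/null)
    # A key can expire between the scan and the lookup; skip empty replies.
    if [ -n "$bytes" ]; then
      printf '%s\t%s\n' "$bytes" "$key"
    fi
  done | sort -rn | head -n 10
}

# Usage:
#   top_keys_by_memory 500
```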

If `expires` is much lower than `keys`, most of your data has no TTL and is reclaimed only by eviction or explicit deletion.

redis-cli INFO keyspace && echo '---' && redis-cli --scan --pattern '*' | head -100 | while IFS= read -r k; do echo "$(redis-cli ttl "$k") $k"; done | sort -n | head -20
Expected: `expires` close to `keys` = healthy TTL coverage. Lines showing `-1 <key>` have no TTL and depend entirely on eviction.
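The `keys` versus `expires` comparison can be reduced to a percentage per database. The sketch below parses `INFO keyspace` output from stdin; `ttl_coverage` is a hypothetical helper name, and the parser looks fields up by name so extra fields such as `avg_ttl` are ignored.

```shell
# Sketch: TTL coverage per database, computed from `INFO keyspace` on stdin.
# ttl_coverage is a hypothetical helper name.
ttl_coverage() {
  tr -d '\r' | awk -F'[:,=]' '
    /^db[0-9]+:/ {
      keys = 0; expires = 0
      # Fields alternate name, value after the leading dbN.
      for (i = 2; i < NF; i += 2) {
        if ($i == "keys")    keys = $(i + 1)
        if ($i == "expires") expires = $(i + 1)
      }
      pct = (keys > 0) ? expires * 100 / keys : 0
      printf "%s keys=%d expires=%d ttl_coverage=%.0f%%\n", $1, keys, expires, pct
    }'
}

# Usage:
#   redis-cli INFO keyspace | ttl_coverage
```

A low percentage means most keys depend entirely on the eviction policy to be removed.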

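`mem_fragmentation_ratio` is resident memory (RSS) divided by `used_memory`. A value above ~1.5 means the process is holding far more memory than Redis's data accounts for, usually because the allocator is holding onto memory Redis can no longer use.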

redis-cli INFO memory | grep -E 'mem_fragmentation_ratio|allocator_frag|allocator_resident'
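Expected: Healthy: 1.0-1.4. Elevated: 1.5-2.0, schedule a `MEMORY PURGE` or a restart during a maintenance window. Above 2.0: plan a restart, or enable active defragmentation (`activedefrag yes`, jemalloc builds only) if a restart is not possible.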
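For scripted checks, the bands above can be expressed as a small classifier. This is a sketch under two assumptions: values between 1.4 and 1.5 are treated as healthy, and values below 1.0 (not covered by the bands above, and often a sign that part of the dataset has been swapped out) get their own label. `frag_band` is a hypothetical helper name.

```shell
# Sketch: classify a mem_fragmentation_ratio value into the bands used in this guide.
# frag_band is a hypothetical helper name.
frag_band() {
  awk -v r="$1" 'BEGIN {
    r = r + 0                       # force numeric comparison
    if (r < 1.0)       print "below-1.0"
    else if (r < 1.5)  print "healthy"
    else if (r <= 2.0) print "elevated"
    else               print "critical"
  }'
}

# Usage:
#   ratio=$(redis-cli INFO memory | tr -d '\r' | awk -F: '$1 == "mem_fragmentation_ratio" {print $2}')
#   frag_band "$ratio"
```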

If you can't restart right now, reduce pressure with these actions in order of increasing risk.

# 1. Free held allocator memory (jemalloc only, safe):
redis-cli MEMORY PURGE

# 2. Enable lazy free for evictions so eviction itself doesn't block:
redis-cli CONFIG SET lazyfree-lazy-eviction yes
redis-cli CONFIG SET lazyfree-lazy-expire yes

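# 3. Tighten maxmemory briefly to force eviction under a sane policy
#    (only if every key can be regenerated; 2gb is an example, pick a value below current used_memory):
redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG SET maxmemory 2gb
Expected: Memory usage drops as Redis evicts least-recently-used keys until it is under the new cap; with lazy free enabled, the memory is released by a background thread. `INFO memory` or `MEMORY STATS` shows the recovery.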
`CONFIG SET` changes are runtime-only and are lost on restart. To persist them, update `/etc/redis/redis.conf` or run `CONFIG REWRITE`.
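The temporary cap in step 3 does not have to be a fixed number. One possible approach, shown here as a sketch, is to derive it from current usage so that eviction frees a known share of memory. `temp_cap_bytes` is a hypothetical helper, and the 90% default is an assumption, not a recommendation from Redis.

```shell
# Sketch: compute a temporary maxmemory value as a percentage of current used_memory.
# temp_cap_bytes is a hypothetical helper; the 90 percent default is an assumption.
temp_cap_bytes() {
  used="$1"          # current used_memory in bytes
  pct="${2:-90}"     # target cap as a percentage of current usage
  echo $(( used * pct / 100 ))
}

# Usage (CONFIG SET maxmemory accepts a plain byte count):
#   used=$(redis-cli INFO memory | tr -d '\r' | awk -F: '$1 == "used_memory" {print $2}')
#   redis-cli CONFIG SET maxmemory "$(temp_cap_bytes "$used" 90)"
```

A smaller percentage evicts more aggressively, so it carries the same data-loss risk as any `allkeys-*` eviction.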

Set a hard cap, pick an eviction policy, and add alerting so the next incident doesn't surprise you.

# redis.conf baseline:
# maxmemory 4gb
# maxmemory-policy allkeys-lru
# maxmemory-samples 10
# lazyfree-lazy-eviction yes

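# Prometheus alert (metric names as exported by redis_exporter; the second
# condition keeps the rule from firing when no maxmemory is set):
# alert: RedisHighMemory
# expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.85 and redis_memory_max_bytes > 0
# for: 10m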
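Expected: After a restart (Redis does not re-read `redis.conf` at runtime), or after applying the same values with `CONFIG SET`, `INFO memory` reflects the new cap. Alerting fires before the next incident rather than during it.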
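To confirm that the running instance actually picked up the baseline, the live values can be compared against the expected ones. This is a minimal sketch; `config_matches` and the `REDIS_CLI` override variable are hypothetical names. Note that `CONFIG GET maxmemory` reports bytes, so `4gb` appears as 4294967296.

```shell
# Sketch: compare a live CONFIG GET value against an expected value.
# REDIS_CLI is a hypothetical override so the command can be swapped for a wrapper.
: "${REDIS_CLI:=redis-cli}"

config_matches() {
  # CONFIG GET prints the parameter name, then its value; keep the value.
  actual=$("$REDIS_CLI" CONFIG GET "$1" | tail -n 1 | tr -d '\r')
  if [ "$actual" = "$2" ]; then
    echo "ok   $1=$actual"
  else
    echo "DIFF $1=$actual (expected $2)"
    return 1
  fi
}

# Usage:
#   config_matches maxmemory 4294967296        # 4gb expressed in bytes
#   config_matches maxmemory-policy allkeys-lru
```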