Skip to main content

rawops.dev

P2

Replication Lag — MySQL Troubleshooting Guide

Diagnose and fix replication lag between MySQL/MariaDB primary and replica servers. Covers IO/SQL thread status, binary log analysis, and safe error recovery.

20 min7 steps
Progress: 0/7 steps
0%

Get the current replication state and lag.

mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Seconds_Behind|Running|Error|State'
Expected: Slave_IO_Running and Slave_SQL_Running should both be 'Yes'. Seconds_Behind_Master shows lag in seconds.

Look for specific error messages blocking replication.

mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Last_Error|Last_SQL_Error|Last_IO_Error'
Expected: If non-empty, shows the exact error. Common: duplicate key, missing table, or network timeout.

High load on the replica can cause it to fall behind.

top -b -n 1 | head -15 && echo '---' && iostat -x 1 3 2>/dev/null | tail -10
Expected: High CPU/IO on replica means it can't apply changes fast enough. Check for heavy read queries.

Queries on the replica can block replication SQL thread.

mysql -e 'SELECT ID, USER, HOST, TIME, STATE, LEFT(INFO, 80) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE TIME > 10 ORDER BY TIME DESC;'
Expected: Long-running queries (especially those with table locks) can block the SQL thread.

Compare primary and replica positions to estimate the gap.

# On primary:
mysql -e 'SHOW MASTER STATUS;'
# On replica:
mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos'
Expected: Compare Read_Master_Log_Pos (what's downloaded) vs Exec_Master_Log_Pos (what's applied). Large gap = SQL thread is slow.

If replication is stuck on a specific error, you may need to skip it.

# Only if you understand the error and skipping is safe:
mysql -e 'STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;'
Expected: Replication resumed after skipping the problematic event. Monitor Seconds_Behind_Master.
Skipping events can cause data inconsistency between primary and replica. Only use as emergency measure and verify data afterward.

Monitor until lag drops to zero.

watch -n 1 "mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Seconds_Behind|Running'"
Expected: Seconds_Behind_Master should decrease steadily and reach 0.