Introduction
Database replication ensures data durability and availability by maintaining copies across multiple servers. For production systems, replication isn't optional—it's essential for disaster recovery, read scaling, and maintenance without downtime. This guide covers practical replication strategies for PostgreSQL and MySQL.
Replication Concepts
Replication Types
Physical Replication: Copies the database's exact on-disk bytes (in PostgreSQL, by shipping WAL). Fast and complete, but the primary and replicas must run the same major PostgreSQL version and architecture.
Logical Replication: Copies logical changes (row-level operations decoded from the WAL). More flexible: it allows different versions and selective, per-table replication.
Synchronous vs. Asynchronous:
- Synchronous: The primary waits for replica confirmation before acknowledging a commit. No acknowledged transaction is lost, but commit latency increases.
- Asynchronous: The primary doesn't wait. Better performance, but transactions not yet shipped to a replica can be lost on failover.
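The trade-off above can be put in numbers with a toy latency model. A minimal sketch (the millisecond figures are illustrative assumptions, not benchmarks):

```python
# Toy model of commit latency under each replication mode.
# All numbers are illustrative, not measurements.

def commit_latency_ms(local_ms, replica_rtts_ms, synchronous):
    """Synchronous commit waits for the slowest required replica ack;
    asynchronous commit returns after the local write alone."""
    if synchronous and replica_rtts_ms:
        return local_ms + max(replica_rtts_ms)
    return local_ms

# One fast same-zone replica, one slow cross-region replica.
rtts = [2.0, 45.0]
print(commit_latency_ms(1.0, rtts, synchronous=False))  # 1.0
print(commit_latency_ms(1.0, rtts, synchronous=True))   # 46.0
```

The cross-region replica dominates the synchronous case, which is why synchronous standbys are usually kept network-close to the primary.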
PostgreSQL Streaming Replication
Primary Server Configuration
# postgresql.conf on the primary
# Enable WAL archiving
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
# For synchronous replication (optional)
synchronous_standby_names = 'replica1'
# pg_hba.conf - Allow replication connections
# (prefer scram-sha-256 over md5 on PostgreSQL 14+)
host replication replicator 192.168.1.0/24 md5
Create replication user:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
Replica Server Setup
# Stop PostgreSQL on the replica
sudo systemctl stop postgresql
# Remove existing data (pg_basebackup requires an empty target directory)
rm -rf /var/lib/postgresql/14/main/*
# Copy from the primary using pg_basebackup
pg_basebackup -h primary.example.com -U replicator -D /var/lib/postgresql/14/main -P -R
# The -R flag creates standby.signal and configures the connection
# postgresql.conf on the replica - only needed if you skipped -R above,
# since -R already writes primary_conninfo to postgresql.auto.conf
primary_conninfo = 'host=primary.example.com port=5432 user=replicator password=secure_password'
# Start the replica
sudo systemctl start postgresql
Verify Replication
-- On the primary: Check replication status
SELECT client_addr, state, sent_lsn, write_lsn, replay_lsn
FROM pg_stat_replication;
-- On the replica: Check recovery status
SELECT pg_is_in_recovery();
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
PostgreSQL Logical Replication
Publisher (Source)
# postgresql.conf
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4
-- Create publication
CREATE PUBLICATION my_publication FOR TABLE users, orders;
-- Or all tables
CREATE PUBLICATION full_publication FOR ALL TABLES;
Subscriber (Target)
-- Create subscription (the target tables must already exist;
-- logical replication does not copy schema/DDL)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher.example.com dbname=mydb user=replicator password=secure'
PUBLICATION my_publication;
-- Check status
SELECT * FROM pg_stat_subscription;
MySQL Replication
Primary Configuration
# my.cnf on the primary
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
# Caution: binlog_do_db filters by database and can silently skip
# cross-database statements; omit it unless you truly need filtering
binlog_do_db = mydb
-- Create replication user
CREATE USER 'replicator'@'%' IDENTIFIED BY 'secure_password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;
-- Get binary log coordinates (capture them together with a consistent
-- data snapshot, e.g. mysqldump --single-transaction --master-data)
SHOW MASTER STATUS;
-- Note the File and Position values
Replica Configuration
# my.cnf on the replica
[mysqld]
server-id = 2
relay_log = mysql-relay-bin
read_only = 1
-- Configure replication
CHANGE MASTER TO
MASTER_HOST='primary.example.com',
MASTER_USER='replicator',
MASTER_PASSWORD='secure_password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=154;
-- Start replication
START SLAVE;
-- Check status
SHOW SLAVE STATUS\G
-- MySQL 8.0.22+ prefers replica terminology: START REPLICA,
-- SHOW REPLICA STATUS (and CHANGE REPLICATION SOURCE TO from 8.0.23)
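The \G output is one key: value pair per line, which makes it easy to screen-scrape in a health check. A minimal sketch, assuming the legacy field names shown above (the thresholds and sample text are hypothetical):

```python
# Parse the vertical (\G) output of SHOW SLAVE STATUS into a dict
# so a monitoring script can inspect individual fields.

def parse_vertical_status(text):
    status = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            status[key.strip()] = value.strip()
    return status

sample = """
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 3
"""
status = parse_vertical_status(sample)
healthy = (status["Slave_IO_Running"] == "Yes"
           and status["Slave_SQL_Running"] == "Yes"
           and int(status["Seconds_Behind_Master"]) < 10)
print(healthy)  # True
```

In production you would feed this the output of `mysql -e 'SHOW SLAVE STATUS\G'` rather than a hard-coded sample.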
Failover Strategies
Manual Failover (PostgreSQL)
-- On the replica: Promote to primary
SELECT pg_promote();
-- Or via the command line
pg_ctl promote -D /var/lib/postgresql/14/main
Update application connection strings to point to the new primary.
Automatic Failover with Patroni
Patroni manages PostgreSQL high availability with automatic failover.
# patroni.yml
scope: postgres-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1.example.com:8008

etcd:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 100

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1.example.com:5432
  data_dir: /var/lib/postgresql/14/main
  authentication:
    replication:
      username: replicator
      password: secure_password
    superuser:
      username: postgres
      password: postgres_password
HAProxy for Connection Routing
# haproxy.cfg
frontend postgres_frontend
    bind *:5432
    mode tcp
    default_backend postgres_backend

backend postgres_backend
    mode tcp
    option httpchk GET /master
    http-check expect status 200
    server node1 node1:5432 check port 8008
    server node2 node2:5432 check port 8008
    server node3 node3:5432 check port 8008
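HAProxy routes on the HTTP status Patroni's REST API returns: GET /master answers 200 only on the current leader and 503 on every other node. Application code can reuse the same convention; a minimal sketch with the status handling factored out so it needs no live cluster (node names are hypothetical):

```python
# Interpret Patroni REST API health-check responses.
# GET /master returns 200 on the leader and 503 on any other node,
# which is exactly what the HAProxy http-check above keys on.

def is_writable(status_code):
    """True if the checked node is the current primary."""
    return status_code == 200

def pick_primary(nodes):
    """nodes: list of (name, status_code) pairs from polling /master."""
    for name, code in nodes:
        if is_writable(code):
            return name
    return None  # no leader reachable: fail closed, do not guess

print(pick_primary([("node1", 503), ("node2", 200), ("node3", 503)]))  # node2
```

Failing closed when no node answers 200 matters: during a failover window there may briefly be no leader, and routing writes to a guess risks split brain.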
Read Scaling with Replicas
Application-Level Read/Write Splitting
// Laravel example
// config/database.php
'pgsql' => [
    'read' => [
        'host' => [
            'replica1.example.com',
            'replica2.example.com',
        ],
    ],
    'write' => [
        'host' => 'primary.example.com',
    ],
    // After a write, reads in the same request reuse the write connection
    'sticky' => true,
    'driver' => 'pgsql',
    'database' => 'mydb',
    // ...
],
// Force the write connection for read-after-write consistency:
// the third argument to select() is $useReadPdo, so pass false
$user = DB::connection('pgsql')->select('SELECT * FROM users WHERE id = ?', [1], false);
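Outside Laravel, the same split is easy to implement by hand. A framework-agnostic sketch (host names hypothetical) that pins writes to the primary and round-robins reads across replicas:

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary and reads round-robin across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def host_for(self, sql):
        # Crude classification: anything that is not a plain SELECT goes
        # to the primary. Real routers also pin reads that follow a write
        # ("sticky" reads) to avoid replication-lag anomalies.
        if self._replicas and sql.lstrip().lower().startswith("select"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary.example.com",
                         ["replica1.example.com", "replica2.example.com"])
print(router.host_for("SELECT * FROM users"))          # replica1.example.com
print(router.host_for("UPDATE users SET name = 'x'"))  # primary.example.com
```

With no replicas configured, everything falls through to the primary, so the router degrades gracefully.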
Connection Pooling with Read Replicas
# PgBouncer with multiple backends
[databases]
mydb = host=primary.example.com port=5432
; Comma-separated host lists require PgBouncer 1.20+ (see load_balance_hosts)
mydb_ro = host=replica1.example.com,replica2.example.com port=5432
Monitoring Replication
PostgreSQL Replication Lag
-- On the primary: Check lag per replica
SELECT client_addr,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / 1024 / 1024 AS lag_mb
FROM pg_stat_replication;
-- On the replica: Approximate lag in seconds
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
-- Caveat: this value keeps growing when the primary is simply idle
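pg_wal_lsn_diff does byte arithmetic on LSNs of the form XX/YYYYYYYY, where the part before the slash is the high 32 bits of a 64-bit byte position. When an external monitor scrapes LSNs instead of running the query above, the same diff can be computed client-side; a sketch (sample LSN values are made up):

```python
# Compute replication lag in bytes from two PostgreSQL LSNs,
# mirroring what pg_wal_lsn_diff() does server-side.

def lsn_to_bytes(lsn):
    """An LSN like '16/B374D848' encodes (high << 32) | low, both hex."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lag_bytes(current_lsn, replay_lsn):
    """Bytes of WAL the replica still has to replay."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(replay_lsn)

print(lag_bytes("16/B374D848", "16/B374D000"))  # 2120
```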
MySQL Replication Lag
SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_Master
-- (Seconds_Behind_Source via SHOW REPLICA STATUS on MySQL 8.0.22+)
Alerting Thresholds
- Warning: Lag > 10 seconds
- Critical: Lag > 60 seconds
- Emergency: Replica disconnected
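Encoded as a check, with the thresholds above as defaults (tune them to your workload):

```python
# Map a measured replication lag to the alert levels listed above.
# None models a disconnected replica (no lag reading at all).

def alert_level(lag_seconds, warn=10, crit=60):
    if lag_seconds is None:
        return "emergency"  # replica disconnected
    if lag_seconds > crit:
        return "critical"
    if lag_seconds > warn:
        return "warning"
    return "ok"

print(alert_level(5))     # ok
print(alert_level(30))    # warning
print(alert_level(120))   # critical
print(alert_level(None))  # emergency
```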
Backup with Replicas
Take backups from replicas to avoid impacting the primary:
# PostgreSQL - backup from replica
pg_dump -h replica.example.com -U postgres mydb > backup.sql
# Note: long-running dumps can be cancelled by replay conflicts;
# raising max_standby_streaming_delay on the replica avoids this
# Or physical backup
pg_basebackup -h replica.example.com -U replicator -D /backup -P
Best Practices
- Use synchronous replication for critical data (at least one replica)
- Monitor replication lag continuously
- Test failover regularly in non-production
- Keep replicas in different availability zones
- Document failover procedures
- Automate failover where possible (Patroni, orchestrator)
- Use connection pooling to manage connections efficiently
Conclusion
Database replication provides the foundation for high availability and disaster recovery. Start with asynchronous streaming replication for read scaling, add synchronous replication for critical data, and implement automatic failover for production systems. Regular testing ensures your failover procedures work when needed.