## Problem Statement
You need to ensure your database can handle failures, provide read scaling, and maintain data durability across multiple servers or regions.
## When to Use Each Pattern

| Pattern | Use Case | Trade-offs |
|---|---|---|
| Primary-Replica | Read scaling, simple failover | Replication lag, manual failover |
| Multi-Primary | Write scaling, geographic distribution | Conflict resolution complexity |
| Synchronous | Zero data loss | Slower writes, requires low latency |
| Asynchronous | Higher performance | Potential data loss on failure |
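As a toy illustration of the table, the two key requirements can be mapped onto a pattern choice (a hypothetical helper, not any library's API; real deployments weigh many more factors):

```python
def choose_replication_pattern(need_write_scaling: bool,
                               zero_data_loss: bool) -> tuple[str, str]:
    """Map two key requirements onto a (topology, durability mode) pair.

    Illustrative only: latency budget, geography, conflict tolerance,
    and operational maturity all matter in practice.
    """
    # Multi-primary adds write capacity but brings conflict resolution;
    # primary-replica keeps a single writer and scales reads instead.
    topology = "multi-primary" if need_write_scaling else "primary-replica"
    # Synchronous replication trades write latency for zero data loss;
    # asynchronous trades potential loss on failure for throughput.
    durability = "synchronous" if zero_data_loss else "asynchronous"
    return topology, durability
```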
## MySQL/Percona Replication

### Step 1: Configure Primary Server

Edit /etc/my.cnf:

```ini
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
binlog_do_db = myapp_production
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

# GTID-based replication (recommended)
gtid_mode = ON
enforce_gtid_consistency = ON
```

Create replication user:

```sql
CREATE USER 'replicator'@'replica-host' IDENTIFIED BY 'secure_password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'replica-host';
FLUSH PRIVILEGES;
```
### Step 2: Configure Replica Server

Edit /etc/my.cnf:

```ini
[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin.log
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
read_only = ON

# GTID-based replication
gtid_mode = ON
enforce_gtid_consistency = ON
```
### Step 3: Initialize Replication

On the replica:

```sql
CHANGE MASTER TO
  MASTER_HOST = 'primary-host',
  MASTER_USER = 'replicator',
  MASTER_PASSWORD = 'secure_password',
  MASTER_AUTO_POSITION = 1;

START SLAVE;
```
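On MySQL 8.0.23 and later, the same step can be written with the newer replication terminology (the deprecated-but-supported statements above still work):

```sql
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'primary-host',
  SOURCE_USER = 'replicator',
  SOURCE_PASSWORD = 'secure_password',
  SOURCE_AUTO_POSITION = 1;

START REPLICA;
```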
### Step 4: Verify Replication Status

```sql
SHOW SLAVE STATUS\G
```

Check for:

- Slave_IO_Running: Yes
- Slave_SQL_Running: Yes
- Seconds_Behind_Master: 0 (or close to 0)
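A monitoring script can apply these checks mechanically. A minimal sketch in Python, assuming the \G output has been captured as text (function names and the lag threshold are illustrative):

```python
def parse_vertical_status(output: str) -> dict:
    """Parse MySQL's \\G (vertical) output into a key/value dict."""
    status = {}
    for line in output.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            status[key.strip()] = value.strip()
    return status


def replica_healthy(status: dict, max_lag_seconds: int = 10) -> bool:
    """Both replication threads running and lag within the threshold.

    Seconds_Behind_Master is NULL when replication is broken,
    which this correctly treats as unhealthy.
    """
    lag = status.get("Seconds_Behind_Master", "")
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and lag.isdigit()
        and int(lag) <= max_lag_seconds
    )
```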
### Semi-Synchronous Replication

For stronger durability guarantees:

On the primary:

```sql
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- 1 second
```

On the replica:

```sql
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
```
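To confirm semi-synchronous mode is actually in effect (it silently falls back to asynchronous when the timeout above expires with no replica acknowledgment), check the status variables on the primary:

```sql
SHOW STATUS LIKE 'Rpl_semi_sync_master_status';   -- ON while semi-sync is active
SHOW STATUS LIKE 'Rpl_semi_sync_master_clients';  -- number of connected semi-sync replicas
```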
## PostgreSQL Streaming Replication

### Step 1: Configure Primary Server

Edit postgresql.conf:

```ini
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
synchronous_commit = on
synchronous_standby_names = 'standby1'  # For synchronous replication
```

Edit pg_hba.conf:

```
host  replication  replicator  replica-ip/32  scram-sha-256
```

Create replication user:

```sql
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
```
### Step 2: Create Base Backup

On the replica:

```bash
pg_basebackup -h primary-host -U replicator -D /var/lib/pgsql/data -Fp -Xs -P -R
```

The -R flag creates standby.signal and writes the connection settings into postgresql.auto.conf.
### Step 3: Configure Replica

Edit postgresql.conf (if needed):

```ini
hot_standby = on
primary_conninfo = 'host=primary-host user=replicator password=secure_password'
```
### Step 4: Start Replica and Verify

```bash
systemctl start postgresql
```

On the primary, check connected replicas:

```sql
SELECT * FROM pg_stat_replication;
```

On the replica, check recovery status:

```sql
SELECT pg_is_in_recovery();
```
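pg_stat_replication reports positions as LSNs such as 0/3000148: the hex high and low halves of a 64-bit byte position in the WAL stream. Replication lag in bytes is simply the difference between two LSNs, which pg_wal_lsn_diff() computes server-side; a client-side equivalent, as a sketch:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/3000148' (hex high/low halves)
    into an absolute byte position in the WAL stream."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """How many bytes the replica's replay position trails the primary's
    sent position; equivalent to pg_wal_lsn_diff(sent_lsn, replay_lsn)."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)
```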
## Automatic Failover with Patroni

For production, use Patroni for automatic failover:

```yaml
# patroni.yml
scope: postgres-cluster
namespace: /service/
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 5
        max_replication_slots: 5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: secure_password
    superuser:
      username: postgres
      password: admin_password
```
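The maximum_lag_on_failover setting (1 MiB above) bounds how far behind a replica may be and still be considered for promotion; Patroni skips candidates beyond the cap. The rule it enforces, sketched in Python for clarity (an illustration, not Patroni's actual code):

```python
# Mirrors maximum_lag_on_failover in the patroni.yml above.
MAXIMUM_LAG_ON_FAILOVER = 1_048_576  # bytes (1 MiB)


def eligible_for_promotion(replica_lag_bytes: int,
                           max_lag: int = MAXIMUM_LAG_ON_FAILOVER) -> bool:
    """A replica is a failover candidate only if its WAL lag is within
    the configured cap; otherwise promotion would lose too many writes."""
    return replica_lag_bytes <= max_lag
```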
## MySQL Replication in Kubernetes

### Using Percona Operator

```yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: mysql-cluster
spec:
  crVersion: 1.13.0
  secretsName: mysql-secrets
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0
    resources:
      requests:
        memory: 1G
        cpu: 600m
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 100Gi
  haproxy:
    enabled: true
    size: 2
    image: percona/percona-xtradb-cluster-operator:1.13.0-haproxy
  proxysql:
    enabled: false
```
### Connection String

```
mysql://user:password@mysql-cluster-haproxy:3306/database
```
## PostgreSQL Replication in Kubernetes

### Using CloudNativePG Operator

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-cluster
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "300"
      shared_buffers: "256MB"
  storage:
    size: 100Gi
    storageClass: fast-ssd
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/postgres
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
```
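CloudNativePG exposes the cluster through per-role Services derived from the cluster name, so applications point at a Service rather than an individual pod:

```
postgres-cluster-rw  # routes to the current primary (read-write)
postgres-cluster-ro  # routes to replicas only (read-only)
postgres-cluster-r   # routes to any instance
```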
## Replication Monitoring

### Key Metrics to Monitor

- Replication Lag

  ```sql
  -- MySQL
  SHOW SLAVE STATUS\G  -- Check Seconds_Behind_Master

  -- PostgreSQL
  SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
         pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
  FROM pg_stat_replication;
  ```

- Connection Status
  - MySQL: check Slave_IO_Running and Slave_SQL_Running
  - PostgreSQL: check the state column in pg_stat_replication
- Binary Log/WAL Position: confirm positions advance over time; a stalled position indicates a stuck sender or stopped replica
### Alerting Rules

```yaml
# Prometheus alerting rules
groups:
  - name: database-replication
    rules:
      - alert: ReplicationLagHigh
        expr: mysql_slave_seconds_behind_master > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL replication lag is {{ $value }} seconds"
      - alert: ReplicationStopped
        expr: mysql_slave_sql_running == 0 or mysql_slave_io_running == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication has stopped"
```
## Failover Procedures

### Manual Failover (MySQL)

- Stop writes to the primary

  ```sql
  SET GLOBAL read_only = ON;
  FLUSH TABLES WITH READ LOCK;
  ```

- Ensure the replica is caught up

  ```sql
  -- On the replica
  SHOW SLAVE STATUS\G
  -- Wait for Seconds_Behind_Master = 0
  ```

- Promote the replica

  ```sql
  STOP SLAVE;
  RESET SLAVE ALL;
  SET GLOBAL read_only = OFF;
  ```

- Update application connection strings
### Manual Failover (PostgreSQL)

- Promote the standby

  ```bash
  pg_ctl promote -D /var/lib/pgsql/data
  # Or, via SQL: SELECT pg_promote();
  ```

- Reconfigure the old primary as a standby (if recovered), e.g. with pg_rewind
## Best Practices Checklist
- [ ] Use GTID-based replication (MySQL) for easier failover
- [ ] Configure monitoring for replication lag
- [ ] Test failover procedures regularly
- [ ] Use semi-synchronous replication for critical data
- [ ] Keep binlogs/WAL retained long enough for recovery
- [ ] Document and automate failover procedures
- [ ] Use connection pooling (PgBouncer, ProxySQL)
- [ ] Configure read replicas in application for read scaling
- [ ] Back up both primary and replicas
- [ ] Monitor disk space for logs and data
## Related Wiki Articles