Problem Statement
You need to deploy a production-grade Kubernetes cluster that can survive node failures, handle rolling updates without downtime, and scale to meet traffic demands.
Prerequisites
- Minimum 3 control plane nodes (2 CPUs, 2 GB+ RAM each)
- Minimum 3 worker nodes (2 CPUs, 2 GB+ RAM each)
- 1 load balancer/endpoint node
- CentOS Stream 8/9, Rocky Linux, AlmaLinux, or another RHEL-compatible distribution (the commands below use dnf)
- Network connectivity between all nodes
Architecture
┌─────────────────────┐
│ External Traffic │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ HAProxy + VIP │
│ (Load Balancer) │
└──────────┬──────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ Control Plane │ │ Control Plane │ │ Control Plane │
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ + etcd │ │ + etcd │ │ + etcd │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ Worker Node │ │ Worker Node │ │ Worker Node │
│ 1 │ │ 2 │ │ 3 │
└───────────────┘ └───────────────┘ └───────────────┘
Step 1: Prepare All Nodes
Set Hostnames
On each node, set a unique hostname:
sudo hostnamectl set-hostname control-plane-1 # or an appropriate name
Configure /etc/hosts
Add all cluster nodes to /etc/hosts on each machine:
cat <<EOF >> /etc/hosts
# Kubernetes cluster
10.0.0.10 endpoint
10.0.0.11 control-plane-1
10.0.0.12 control-plane-2
10.0.0.13 control-plane-3
10.0.0.21 worker-1
10.0.0.22 worker-2
10.0.0.23 worker-3
EOF
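Before moving on, it can save debugging time to confirm that every hostname above resolves on each machine. A small sketch (the hostnames are the ones assumed throughout this guide):

```shell
# Report any cluster hostname that does not resolve via /etc/hosts or DNS.
check_resolves() {
    for host in "$@"; do
        if getent hosts "$host" > /dev/null; then
            echo "OK $host"
        else
            echo "MISSING $host"
        fi
    done
}

check_resolves endpoint control-plane-1 control-plane-2 control-plane-3 \
               worker-1 worker-2 worker-3
```

Any MISSING line means the /etc/hosts entry (or DNS record) for that node still needs fixing.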
Disable Swap
By default, the kubelet refuses to start while swap is enabled, so turn it off now and keep it off across reboots:
swapoff -a
sed -i '/\bswap\b/ s/^/#/' /etc/fstab  # comment out swap entries in fstab
Configure Kernel Parameters
The bridge sysctls below only exist once the br_netfilter module is loaded, so load the required modules and make them persistent first:
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
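A quick sanity check that the module is loaded and forwarding is enabled on a prepared node:

```shell
# Succeeds only if the br_netfilter module is loaded.
grep -q br_netfilter /proc/modules && echo "br_netfilter loaded"
# Should read 1 once the sysctl file has been applied.
cat /proc/sys/net/ipv4/ip_forward
```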
Set SELinux to Permissive (or configure it properly)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
Step 2: Install Container Runtime
Install containerd
Kubernetes removed its built-in Docker Engine support (dockershim) in v1.24, so the kubelet needs a CRI runtime. containerd is the usual choice and ships in Docker's repository:
dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
dnf install -y containerd.io
# Generate the default config and switch to the systemd cgroup driver,
# which must match the kubelet's (the kubeadm default)
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl enable containerd --now
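crictl (installed with the cri-tools package pulled in by Step 3) is the CRI-level equivalent of the docker CLI; pointing it at the runtime's socket once saves passing --runtime-endpoint on every invocation. The socket path below assumes containerd:

```shell
# Default crictl to containerd's CRI socket (adjust for another runtime).
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
```

`crictl info` should then report runtime status without extra flags.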
Step 3: Install Kubernetes Components
Add Kubernetes Repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
Install kubeadm, kubelet, kubectl
The exclude line above keeps routine dnf upgrade runs from moving these packages out of step with the cluster, so installs and upgrades must override it explicitly:
dnf install -y kubelet kubeadm kubectl iproute-tc --disableexcludes=kubernetes
systemctl enable kubelet
Step 4: Configure Load Balancer (Endpoint Node)
Install HAProxy
dnf install -y haproxy
Configure HAProxy
cat <<EOF > /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

defaults
    mode tcp
    log global
    timeout connect 5s
    timeout client 1m
    timeout server 1m

frontend kubernetes-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-apiserver-backend

backend kubernetes-apiserver-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server control-plane-1 10.0.0.11:6443 check fall 3 rise 2
    server control-plane-2 10.0.0.12:6443 check fall 3 rise 2
    server control-plane-3 10.0.0.13:6443 check fall 3 rise 2
EOF
systemctl enable haproxy --now
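Before pointing kubeadm at the endpoint, it is worth validating the configuration and confirming the listener is up (a sketch; ss comes from the iproute package):

```shell
# Syntax-check the config; prints "Configuration file is valid" on success.
haproxy -c -f /etc/haproxy/haproxy.cfg
# Confirm HAProxy is bound on the API server port.
ss -tlnp | grep ':6443'
```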
Step 5: Configure Firewall
On All Nodes
# Trust cluster nodes
firewall-cmd --zone=trusted --add-source=10.0.0.10
firewall-cmd --zone=trusted --add-source=10.0.0.11
firewall-cmd --zone=trusted --add-source=10.0.0.12
firewall-cmd --zone=trusted --add-source=10.0.0.13
firewall-cmd --zone=trusted --add-source=10.0.0.21
firewall-cmd --zone=trusted --add-source=10.0.0.22
firewall-cmd --zone=trusted --add-source=10.0.0.23
# Pod network (Calico or Flannel)
firewall-cmd --zone=trusted --add-source=10.244.0.0/16
firewall-cmd --zone=trusted --add-masquerade
# Save rules
firewall-cmd --runtime-to-permanent
systemctl restart firewalld
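The trusted-source rules above only cover traffic between cluster nodes. Admin workstations outside that set still need the API server port opened on the endpoint node:

```shell
# On the endpoint node only: allow external kubectl access to the API server.
firewall-cmd --add-port=6443/tcp
firewall-cmd --runtime-to-permanent
```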
Step 6: Initialize First Control Plane Node
kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--control-plane-endpoint "endpoint:6443" \
--upload-certs
Save the Output!
The command prints two join commands:
1. One for additional control plane nodes (with --control-plane)
2. One for worker nodes
Configure kubectl
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Step 7: Install CNI (Container Network Interface)
Option A: Calico
Calico's manifest defaults to the 192.168.0.0/16 pool; uncomment and set CALICO_IPV4POOL_CIDR in the manifest if you want it to match the --pod-network-cidr used above:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml
Option B: Flannel
Flannel defaults to 10.244.0.0/16, which already matches the --pod-network-cidr used above:
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
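Whichever CNI you choose, nodes flip from NotReady to Ready once its pods come up; a quick way to block until that happens:

```shell
# Wait up to 5 minutes for every node to report Ready.
kubectl wait --for=condition=Ready nodes --all --timeout=300s
kubectl get pods -n kube-system -o wide
```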
Step 8: Join Additional Control Plane Nodes
On each additional control plane node:
kubeadm join endpoint:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane \
--certificate-key <certificate-key>
Then configure kubectl:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Regenerate the Certificate Key if Expired
The key uploaded by --upload-certs is deleted after two hours. Run this on an existing control plane node to re-upload the certificates and print a fresh key:
kubeadm init phase upload-certs --upload-certs
Step 9: Join Worker Nodes
On each worker node:
kubeadm join endpoint:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
Generate a New Join Token if Expired
Bootstrap tokens expire after 24 hours by default; create a fresh one on a control plane node:
kubeadm token create --print-join-command
Step 10: Verify Cluster Health
# Check that all nodes are Ready
kubectl get nodes
# Check that all system pods are Running
kubectl get pods -n kube-system
# Check control plane health (kubectl get componentstatuses is deprecated since v1.19)
kubectl get --raw='/readyz?verbose'
The output is a checklist of [+] lines, ending with:
readyz check passed
Step 11: Configure Pod Scheduling (Optional)
Enable Scheduling on Control Plane Nodes
By default, pods aren't scheduled on control plane nodes. To allow it:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Label Nodes for Workload Distribution
kubectl label nodes worker-1 node-type=compute
kubectl label nodes worker-2 node-type=compute
kubectl label nodes worker-3 node-type=storage
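Workloads can then target those labels with a nodeSelector. A minimal sketch (the deployment name and image are placeholders):

```shell
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: compute-workload
  template:
    metadata:
      labels:
        app: compute-workload
    spec:
      nodeSelector:
        node-type: compute    # schedules only onto worker-1 and worker-2
      containers:
      - name: app
        image: nginx:1.27
EOF
```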
Maintenance Tasks
Upgrade Cluster Version
# On the first control plane node
dnf upgrade -y kubeadm --disableexcludes=kubernetes
kubeadm upgrade plan
kubeadm upgrade apply v1.32.x
# On each remaining control plane node: kubeadm upgrade node
# Then upgrade kubelet and kubectl on every node
dnf upgrade -y kubelet kubectl --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
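kubeadm's documented upgrade flow also drains each node before its kubelet is upgraded and uncordons it afterwards; for one worker that looks like:

```shell
# From a machine with kubectl access:
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# ...upgrade and restart the kubelet on worker-1, then:
kubectl uncordon worker-1
```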
Certificate Renewal
# Check certificate expiration
kubeadm certs check-expiration
# Renew certificates
kubeadm certs renew all
Backup etcd
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
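A snapshot is only useful if it restores, so verify it after taking it; etcdctl's snapshot status gives a quick integrity check (newer etcd releases move this subcommand into etcdutl):

```shell
# Prints hash, revision, total keys, and size for the snapshot.
ETCDCTL_API=3 etcdctl --write-out=table snapshot status /backup/etcd-snapshot.db
```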
Troubleshooting
Node Not Ready
# Check kubelet logs
journalctl -u kubelet -f
# Check node conditions
kubectl describe node <node-name>
Pod Network Issues
# Check CNI pods
kubectl get pods -n kube-system | grep -E 'calico|flannel'
# Check CoreDNS
kubectl get pods -n kube-system | grep coredns
API Server Unreachable
# Check HAProxy status
systemctl status haproxy
# Verify backend health
curl -k https://endpoint:6443/healthz
High Availability Checklist
- [ ] 3+ control plane nodes with etcd
- [ ] Load balancer in front of API servers
- [ ] Pod network (CNI) installed and healthy
- [ ] All nodes in Ready state
- [ ] etcd backup strategy in place
- [ ] Certificate rotation planned
- [ ] Monitoring and alerting configured
- [ ] Node auto-scaling configured (if cloud)
- [ ] PodDisruptionBudgets defined for critical workloads
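As a sketch of the last checklist item, a minimal PodDisruptionBudget that keeps at least two replicas of a hypothetical app labeled app=frontend running during node drains:

```shell
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: frontend
EOF
```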
Related Wiki Articles