Problem Statement
You need to deploy a production-grade Kubernetes cluster that can survive node failures, handle rolling updates without downtime, and scale to meet traffic demands.
Prerequisites
- Minimum 3 control plane nodes (2 CPUs, 2 GB+ RAM each)
- Minimum 3 worker nodes (2 CPUs, 2 GB+ RAM each)
- 1 load balancer/endpoint node
- CentOS Stream 8/9, Rocky Linux, AlmaLinux, or another RHEL-compatible distribution (the commands below use dnf)
- Network connectivity between all nodes
Architecture
┌─────────────────────┐
│ External Traffic │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ HAProxy + VIP │
│ (Load Balancer) │
└──────────┬──────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ Control Plane │ │ Control Plane │ │ Control Plane │
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ + etcd │ │ + etcd │ │ + etcd │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ Worker Node │ │ Worker Node │ │ Worker Node │
│ 1 │ │ 2 │ │ 3 │
└───────────────┘ └───────────────┘ └───────────────┘
Step 1: Prepare All Nodes
Set Hostnames
On each node, set a unique hostname:
sudo hostnamectl set-hostname control-plane-1 # or an appropriate name
Configure /etc/hosts
Add all cluster nodes to /etc/hosts on each machine:
cat <<EOF >> /etc/hosts
# Kubernetes cluster
10.0.0.10 endpoint
10.0.0.11 control-plane-1
10.0.0.12 control-plane-2
10.0.0.13 control-plane-3
10.0.0.21 worker-1
10.0.0.22 worker-2
10.0.0.23 worker-3
EOF
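Before moving on, it can save debugging time to confirm that every hostname above resolves on each machine. A small sketch (the hostnames are the ones assumed throughout this guide):

```shell
# Report any cluster hostname that does not resolve via /etc/hosts or DNS.
check_resolves() {
    for host in "$@"; do
        if getent hosts "$host" > /dev/null; then
            echo "OK $host"
        else
            echo "MISSING $host"
        fi
    done
}

check_resolves endpoint control-plane-1 control-plane-2 control-plane-3 \
               worker-1 worker-2 worker-3
```

Any MISSING line means the /etc/hosts entry (or DNS record) for that node still needs fixing.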
Disable Swap
By default, the kubelet refuses to start while swap is enabled, so turn it off now and keep it off across reboots:
swapoff -a
sed -i '/\bswap\b/ s/^/#/' /etc/fstab  # comment out swap entries in fstab
Configure Kernel Parameters
The bridge sysctls below only exist once the br_netfilter module is loaded, so load the required modules and make them persistent first:
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
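A quick sanity check that the module is loaded and forwarding is enabled on a prepared node:

```shell
# Succeeds only if the br_netfilter module is loaded.
grep -q br_netfilter /proc/modules && echo "br_netfilter loaded"
# Should read 1 once the sysctl file has been applied.
cat /proc/sys/net/ipv4/ip_forward
```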
Set SELinux to Permissive (or configure it properly)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
Step 2: Install Container Runtime
Install containerd
Kubernetes removed its built-in Docker Engine support (dockershim) in v1.24, so the kubelet needs a CRI runtime. containerd is the usual choice and ships in Docker's repository:
dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
dnf install -y containerd.io
# Generate the default config and switch to the systemd cgroup driver,
# which must match the kubelet's (the kubeadm default)
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl enable containerd --now
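crictl (installed with the cri-tools package pulled in by Step 3) is the CRI-level equivalent of the docker CLI; pointing it at the runtime's socket once saves passing --runtime-endpoint on every invocation. The socket path below assumes containerd:

```shell
# Default crictl to containerd's CRI socket (adjust for another runtime).
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
```

`crictl info` should then report runtime status without extra flags.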
Step 3: Install Kubernetes Components
Add Kubernetes Repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
Install kubeadm, kubelet, kubectl
The exclude line above keeps routine dnf upgrade runs from moving these packages out of step with the cluster, so installs and upgrades must override it explicitly:
dnf install -y kubelet kubeadm kubectl iproute-tc --disableexcludes=kubernetes
systemctl enable kubelet
Step 4: Configure Load Balancer (Endpoint Node)
Install HAProxy
dnf install -y haproxy
Configure HAProxy
cat <<EOF > /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

defaults
    mode tcp
    log global
    timeout connect 5s
    timeout client 1m
    timeout server 1m

frontend kubernetes-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-apiserver-backend

backend kubernetes-apiserver-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server control-plane-1 10.0.0.11:6443 check fall 3 rise 2
    server control-plane-2 10.0.0.12:6443 check fall 3 rise 2
    server control-plane-3 10.0.0.13:6443 check fall 3 rise 2
EOF
systemctl enable haproxy --now
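Before pointing kubeadm at the endpoint, it is worth validating the configuration and confirming the listener is up (a sketch; ss comes from the iproute package):

```shell
# Syntax-check the config; prints "Configuration file is valid" on success.
haproxy -c -f /etc/haproxy/haproxy.cfg
# Confirm HAProxy is bound on the API server port.
ss -tlnp | grep ':6443'
```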
Step 5: Configure Firewall
On All Nodes
# Trust cluster nodes
firewall-cmd --zone=trusted --add-source=10.0.0.10
firewall-cmd --zone=trusted --add-source=10.0.0.11
firewall-cmd --zone=trusted --add-source=10.0.0.12
firewall-cmd --zone=trusted --add-source=10.0.0.13
firewall-cmd --zone=trusted --add-source=10.0.0.21
firewall-cmd --zone=trusted --add-source=10.0.0.22
firewall-cmd --zone=trusted --add-source=10.0.0.23
# Pod network (Calico or Flannel)
firewall-cmd --zone=trusted --add-source=10.244.0.0/16
firewall-cmd --zone=trusted --add-masquerade
# Save rules
firewall-cmd --runtime-to-permanent
systemctl restart firewalld
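The trusted-source rules above only cover traffic between cluster nodes. Admin workstations outside that set still need the API server port opened on the endpoint node:

```shell
# On the endpoint node only: allow external kubectl access to the API server.
firewall-cmd --add-port=6443/tcp
firewall-cmd --runtime-to-permanent
```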
Step 6: Initialize First Control Plane Node
kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--control-plane-endpoint "endpoint:6443" \
--upload-certs
Save the Output!
The command prints two join commands:
1. One for additional control plane nodes (with --control-plane)
2. One for worker nodes
Configure kubectl
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Step 7: Install CNI (Container Network Interface)
Option A: Calico
Calico's manifest defaults to the 192.168.0.0/16 pool; uncomment and set CALICO_IPV4POOL_CIDR in the manifest if you want it to match the --pod-network-cidr used above:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml
Option B: Flannel
Flannel defaults to 10.244.0.0/16, which already matches the --pod-network-cidr used above:
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
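Whichever CNI you choose, nodes flip from NotReady to Ready once its pods come up; a quick way to block until that happens:

```shell
# Wait up to 5 minutes for every node to report Ready.
kubectl wait --for=condition=Ready nodes --all --timeout=300s
kubectl get pods -n kube-system -o wide
```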
Step 8: Join Additional Control Plane Nodes
On each additional control plane node:
kubeadm join endpoint:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane \
--certificate-key <certificate-key>
Then configure kubectl:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Regenerate the Certificate Key if Expired
The key uploaded by --upload-certs is deleted after two hours. Run this on an existing control plane node to re-upload the certificates and print a fresh key:
kubeadm init phase upload-certs --upload-certs
Step 9: Join Worker Nodes
On each worker node:
kubeadm join endpoint:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
Generate a New Join Token if Expired
Bootstrap tokens expire after 24 hours by default; create a fresh one on a control plane node:
kubeadm token create --print-join-command
Step 10: Verify Cluster Health
# Check that all nodes are Ready
kubectl get nodes
# Check that all system pods are Running
kubectl get pods -n kube-system
# Check control plane health (kubectl get componentstatuses is deprecated since v1.19)
kubectl get --raw='/readyz?verbose'
The output is a checklist of [+] lines, ending with:
readyz check passed
Step 11: Configure Pod Scheduling (Optional)
Enable Scheduling on Control Plane Nodes
By default, pods aren't scheduled on control plane nodes. To allow it:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Label Nodes for Workload Distribution
kubectl label nodes worker-1 node-type=compute
kubectl label nodes worker-2 node-type=compute
kubectl label nodes worker-3 node-type=storage
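Workloads can then target those labels with a nodeSelector. A minimal sketch (the deployment name and image are placeholders):

```shell
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: compute-workload
  template:
    metadata:
      labels:
        app: compute-workload
    spec:
      nodeSelector:
        node-type: compute    # schedules only onto worker-1 and worker-2
      containers:
      - name: app
        image: nginx:1.27
EOF
```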
Maintenance Tasks
Upgrade Cluster Version
# On the first control plane node
dnf upgrade -y kubeadm --disableexcludes=kubernetes
kubeadm upgrade plan
kubeadm upgrade apply v1.32.x
# On each remaining control plane node: kubeadm upgrade node
# Then upgrade kubelet and kubectl on every node
dnf upgrade -y kubelet kubectl --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
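kubeadm's documented upgrade flow also drains each node before its kubelet is upgraded and uncordons it afterwards; for one worker that looks like:

```shell
# From a machine with kubectl access:
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# ...upgrade and restart the kubelet on worker-1, then:
kubectl uncordon worker-1
```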
Certificate Renewal
# Check certificate expiration
kubeadm certs check-expiration
# Renew certificates
kubeadm certs renew all
Backup etcd
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
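A snapshot is only useful if it restores, so verify it after taking it; etcdctl's snapshot status gives a quick integrity check (newer etcd releases move this subcommand into etcdutl):

```shell
# Prints hash, revision, total keys, and size for the snapshot.
ETCDCTL_API=3 etcdctl --write-out=table snapshot status /backup/etcd-snapshot.db
```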
Troubleshooting
Node Not Ready
# Check kubelet logs
journalctl -u kubelet -f
# Check node conditions
kubectl describe node <node-name>
Pod Network Issues
# Check CNI pods
kubectl get pods -n kube-system | grep -E 'calico|flannel'
# Check CoreDNS
kubectl get pods -n kube-system | grep coredns
API Server Unreachable
# Check HAProxy status
systemctl status haproxy
# Verify backend health
curl -k https://endpoint:6443/healthz
High Availability Checklist
- [ ] 3+ control plane nodes with etcd
- [ ] Load balancer in front of API servers
- [ ] Pod network (CNI) installed and healthy
- [ ] All nodes in Ready state
- [ ] etcd backup strategy in place
- [ ] Certificate rotation planned
- [ ] Monitoring and alerting configured
- [ ] Node auto-scaling configured (if cloud)
- [ ] PodDisruptionBudgets defined for critical workloads
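As a sketch of the last checklist item, a minimal PodDisruptionBudget that keeps at least two replicas of a hypothetical app labeled app=frontend running during node drains:

```shell
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: frontend
EOF
```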
Related Wiki Articles