Problem Statement
You need to deploy new versions of your application without interrupting service to users, run database migrations safely, and roll back quickly if issues arise.
Deployment Strategies Comparison
| Strategy | Downtime | Rollback Speed | Resource Cost | Complexity |
|---|---|---|---|---|
| Rolling Update | Zero | Fast | Low | Low |
| Blue-Green | Zero | Instant | 2x | Medium |
| Canary | Zero | Fast | Low-Medium | High |
| A/B Testing | Zero | Fast | Low-Medium | High |
Strategy 1: Rolling Update (Kubernetes Default)
How It Works
Kubernetes gradually replaces old pods with new ones, ensuring minimum availability is maintained throughout.
Time 0: [v1] [v1] [v1] [v1] ← All running v1
Time 1: [v1] [v1] [v1] [v2] ← One v2 starting
Time 2: [v1] [v1] [v2] [v2] ← Two v2 ready
Time 3: [v1] [v2] [v2] [v2] ← Three v2 ready
Time 4: [v2] [v2] [v2] [v2] ← All running v2
Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  selector:               # Required for apps/v1; must match the template labels.
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # At most 1 pod can be unavailable.
      maxSurge: 1         # At most 1 extra pod during rollout.
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:v2
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
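As a rough illustration of what maxUnavailable and maxSurge permit, the sketch below computes the pod-count bounds a rollout stays within. The helper names resolveCount and rolloutBounds are made up for this example and are not part of any Kubernetes client. The rounding rule (percentages round up for maxSurge and down for maxUnavailable) follows the Kubernetes Deployment documentation.

```javascript
// Hypothetical helpers, not part of any Kubernetes client library.

// Resolve an absolute number or a percentage string (e.g. "25%") against the replica count.
function resolveCount(value, replicas, roundUp) {
  if (typeof value === 'number') return value;
  const fraction = (parseFloat(value) / 100) * replicas;
  return roundUp ? Math.ceil(fraction) : Math.floor(fraction);
}

// Bounds a rolling update stays within for the given settings.
function rolloutBounds(replicas, maxSurge, maxUnavailable) {
  const surge = resolveCount(maxSurge, replicas, true); // maxSurge rounds up
  const unavailable = resolveCount(maxUnavailable, replicas, false); // maxUnavailable rounds down
  return {
    minAvailable: replicas - unavailable, // pods that must stay available
    maxTotal: replicas + surge, // old + new pods allowed at once
  };
}

console.log(rolloutBounds(4, 1, 1)); // { minAvailable: 3, maxTotal: 5 }
console.log(rolloutBounds(10, '25%', '25%')); // { minAvailable: 8, maxTotal: 13 }
```

With the configuration above (4 replicas, both values set to 1), at least 3 pods keep serving and at most 5 exist at any moment.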
Graceful Shutdown
Ensure your application handles SIGTERM properly:
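// Node.js example.
const server = app.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');

  // Stop accepting new connections; the callback runs once in-flight requests finish.
  server.close(() => {
    console.log('HTTP server closed');
    // Close database connections.
    db.close();
    process.exit(0);
  });

  // Force close after 15 seconds. The preStop sleep in the Kubernetes configuration
  // also counts against terminationGracePeriodSeconds, so this timer must fire
  // before the remaining grace period runs out.
  setTimeout(() => {
    console.error('Forced shutdown');
    process.exit(1);
  }, 15000).unref();
});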
# Kubernetes configuration.
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            # Give the load balancer time to remove the pod from rotation.
            command: ["/bin/sh", "-c", "sleep 10"]
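The preStop hook runs before SIGTERM is delivered, and its runtime counts against terminationGracePeriodSeconds. The sketch below works out how much time is left for the application itself. The helper shutdownBudget is hypothetical and exists only to make the arithmetic explicit.

```javascript
// Hypothetical helper: how long the app has between SIGTERM and SIGKILL.
function shutdownBudget({ gracePeriodSeconds, preStopSeconds, forceExitSeconds }) {
  // The preStop hook runs first and uses up part of the grace period.
  const afterSigterm = gracePeriodSeconds - preStopSeconds;
  return {
    afterSigterm,
    // The app's own forced-exit timer only helps if it fires before SIGKILL.
    forceExitFiresInTime: forceExitSeconds < afterSigterm,
  };
}

console.log(shutdownBudget({ gracePeriodSeconds: 30, preStopSeconds: 10, forceExitSeconds: 30 }));
// { afterSigterm: 20, forceExitFiresInTime: false }
console.log(shutdownBudget({ gracePeriodSeconds: 30, preStopSeconds: 10, forceExitSeconds: 15 }));
// { afterSigterm: 20, forceExitFiresInTime: true }
```

With a 30-second grace period and a 10-second preStop sleep, the application has about 20 seconds after SIGTERM, so any in-process forced-exit timer should be shorter than that.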
Strategy 2: Blue-Green Deployment
How It Works
Run two identical environments. Route all traffic to one (blue), deploy to the other (green), then switch.
          ┌──────────────────┐
          │  Load Balancer   │
          └────────┬─────────┘
                   │
       ┌───────────┼───────────┐
       │           │           │
┌──────▼──────┐    │    ┌──────▼──────┐
│    Blue     │    │    │    Green    │
│   (v1.0)    │◀───┘    │   (v1.1)    │
│   ACTIVE    │         │   STANDBY   │
└─────────────┘         └─────────────┘
Kubernetes Implementation
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: myapp:v1.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: app
          image: myapp:v1.1
---
# service.yaml - switch by changing selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Change to 'green' to switch
  ports:
    - port: 80
      targetPort: 8080
Switch Script
#!/bin/bash
# Stop on the first failed command so a failed patch is never reported as a success.
set -euo pipefail

CURRENT=$(kubectl get service myapp -o jsonpath='{.spec.selector.version}')

if [ "$CURRENT" = "blue" ]; then
  NEW="green"
else
  NEW="blue"
fi

echo "Switching from $CURRENT to $NEW"
kubectl patch service myapp -p "{\"spec\":{\"selector\":{\"version\":\"$NEW\"}}}"
echo "Traffic now routing to $NEW"
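If the switch is driven from Node.js tooling instead of a shell script, the same logic can be sketched as two small functions. The names nextColor and buildSelectorPatch are hypothetical. Building the patch with JSON.stringify avoids the hand-escaped quotes in the shell version.

```javascript
// Hypothetical helpers mirroring the shell script's logic.

// Pick the color that is not currently live (anything other than "blue" switches to blue).
function nextColor(current) {
  return current === 'blue' ? 'green' : 'blue';
}

// Build the JSON body that would be passed to `kubectl patch service myapp -p ...`.
function buildSelectorPatch(color) {
  return JSON.stringify({ spec: { selector: { version: color } } });
}

const target = nextColor('blue');
console.log(target); // green
console.log(buildSelectorPatch(target)); // {"spec":{"selector":{"version":"green"}}}
```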
Strategy 3: Canary Deployment
How It Works
Deploy a new version to a small subset of users first, then gradually increase its share of traffic.
Phase 1: [v1][v1][v1][v1][v1][v1][v1][v1][v1][v2] ← 10% canary
Phase 2: [v1][v1][v1][v1][v1][v2][v2][v2][v2][v2] ← 50% canary
Phase 3: [v2][v2][v2][v2][v2][v2][v2][v2][v2][v2] ← 100% promoted
Using Nginx Ingress
# Main deployment (stable).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9
  selector:            # Required for apps/v1; must match the template labels.
    matchLabels:
      app: myapp
      version: stable
  template:
    metadata:
      labels:
        app: myapp
        version: stable
    spec:
      containers:
        - name: app
          image: myapp:v1.0
---
# Canary deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp
        version: canary
    spec:
      containers:
        - name: app
          image: myapp:v1.1
---
# Stable Ingress.
# Both Ingresses assume Services named myapp-stable and myapp-canary exist
# and select the matching pods (not shown here).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-stable
                port:
                  number: 80
---
# Canary Ingress with weight.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary
                port:
                  number: 80
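The canary-weight annotation asks the ingress controller to send roughly that percentage of requests to the canary. How the controller chooses is its own business; purely as a conceptual sketch, one common way to make a percentage split stable per user is to hash an ID into 100 buckets. The functions bucketFor and isCanaryUser and the user IDs are assumptions for illustration, not anything NGINX exposes.

```javascript
const { createHash } = require('crypto');

// Hypothetical helper: map a user ID to a stable bucket in [0, 100).
function bucketFor(userId) {
  const digest = createHash('sha256').update(String(userId)).digest();
  return digest.readUInt32BE(0) % 100;
}

// A user is in the canary group when their bucket falls below the weight.
function isCanaryUser(userId, canaryWeight) {
  return bucketFor(userId) < canaryWeight;
}

const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const inCanary = users.filter((id) => isCanaryUser(id, 10)).length;
console.log(`${inCanary} of ${users.length} users routed to canary`);
```

Because the bucket depends only on the ID, a user who lands in the canary at 10% stays there when the weight is raised to 50%, which keeps their experience consistent during the rollout.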
Canary with Istio
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: myapp
            subset: canary
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 90
        - destination:
            host: myapp
            subset: canary
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary
Safe Database Migrations
Principles
- Backward Compatible: Old code must work with the new schema
- Forward Compatible: New code must work with the old schema during rollout
- Small, Incremental Changes: Never big-bang migrations
Pattern: Expand-Contract Migration
Phase 1: Expand (Add new column)
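-- Migration 1: Add new column, keep old.
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
// Code v2: Write to both; read the new column, falling back to the old ones.
// Assumes a node-postgres-style result ({ rows }) and a userId variable for the row being handled.
const { rows } = await db.query(
  'SELECT first_name, last_name, full_name FROM users WHERE id = $1',
  [userId]
);
const user = rows[0];
user.displayName = user.full_name || `${user.first_name} ${user.last_name}`;
// On save, write to both. The WHERE clause keeps the update to a single row.
await db.query(
  'UPDATE users SET first_name = $1, last_name = $2, full_name = $3 WHERE id = $4',
  [firstName, lastName, fullName, userId]
);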
Phase 2: Migrate Data
-- Migration 2: Backfill data.
UPDATE users SET full_name = CONCAT(first_name, ' ', last_name)
WHERE full_name IS NULL;
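On a large table, a single UPDATE over every row can hold locks for a long time. One common variation is to backfill in batches. The sketch below assumes PostgreSQL, an id primary key on users, and an injected query function that resolves to an object with a rowCount field (the shape node-postgres returns). All of these are assumptions for illustration, and the fake query at the end stands in for a real database.

```javascript
// Sketch only: `query` is an injected async function assumed to resolve to
// { rowCount }. The `id` column is assumed to exist.
const BACKFILL_SQL = `
  UPDATE users
  SET full_name = CONCAT(first_name, ' ', last_name)
  WHERE id IN (
    SELECT id FROM users WHERE full_name IS NULL LIMIT $1
  )`;

// Repeat the batched UPDATE until no rows are left to backfill.
async function backfillInBatches(query, batchSize = 1000) {
  let total = 0;
  for (;;) {
    const { rowCount } = await query(BACKFILL_SQL, [batchSize]);
    if (rowCount === 0) return total; // nothing left to backfill
    total += rowCount;
  }
}

// Usage with a fake `query` that pretends 2500 rows need backfilling.
let remaining = 2500;
const fakeQuery = async (_sql, [limit]) => {
  const updated = Math.min(limit, remaining);
  remaining -= updated;
  return { rowCount: updated };
};

backfillInBatches(fakeQuery, 1000).then((total) => {
  console.log(`backfilled ${total} rows`); // backfilled 2500 rows
});
```

The loop terminates because PostgreSQL's CONCAT ignores NULL arguments and so never writes NULL back; on a database where CONCAT can return NULL, the WHERE condition would need adjusting.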
Phase 3: Contract (Remove old columns)
// Code v3: Read/write only new column.
const user = await db.query('SELECT full_name FROM users');
-- Migration 3: Remove old columns (after all pods on v3).
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;
Migration Job in Kubernetes
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration-v1-2
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: wait-for-db
          image: busybox
          command: ['sh', '-c', 'until nc -z postgres 5432; do sleep 1; done']
      containers:
        - name: migrate
          image: myapp:v1.2
          command: ["npm", "run", "db:migrate"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
Rollback Procedures
Kubernetes Rollback
# View deployment history.
kubectl rollout history deployment/myapp
# Roll back to previous version.
kubectl rollout undo deployment/myapp
# Roll back to a specific revision.
kubectl rollout undo deployment/myapp --to-revision=2
# Check rollback status.
kubectl rollout status deployment/myapp
Blue-Green Instant Rollback
# Simply switch the service selector back.
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
Database Rollback
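-- Always have a rollback migration ready.
-- down-migration.sql
ALTER TABLE users ADD COLUMN first_name VARCHAR(255);
ALTER TABLE users ADD COLUMN last_name VARCHAR(255);
-- Lossy: SPLIT_PART(..., 2) keeps only the second word, so 'Mary Ann Smith'
-- comes back with last_name = 'Ann'. Prefer rolling back before the old columns are dropped.
UPDATE users SET
  first_name = SPLIT_PART(full_name, ' ', 1),
  last_name = SPLIT_PART(full_name, ' ', 2);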
Health Checks Best Practices
Implement Three Endpoints
// /health - Liveness: Is the process alive?
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

// /ready - Readiness: Can the service handle requests?
app.get('/ready', async (req, res) => {
  try {
    await db.query('SELECT 1');
    await redis.ping();
    res.status(200).json({ status: 'ready' });
  } catch (error) {
    res.status(503).json({ status: 'not ready', error: error.message });
  }
});

// /startup - Startup: Has the service finished initializing?
// Set isStarted to true once initialization has completed.
let isStarted = false;
app.get('/startup', (req, res) => {
  if (isStarted) {
    res.status(200).json({ status: 'started' });
  } else {
    res.status(503).json({ status: 'starting' });
  }
});
Kubernetes Probes Configuration
spec:
  containers:
    - name: app
      startupProbe:
        httpGet:
          path: /startup
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 5
        failureThreshold: 3
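To read these numbers as time budgets, multiply periodSeconds by failureThreshold. The sketch below does that for the three probes above. The helper probeBudgetSeconds is hypothetical, and the result is approximate because it ignores timeoutSeconds and scheduling jitter.

```javascript
// Hypothetical helper: rough worst-case seconds of failures a probe tolerates
// before Kubernetes acts on it.
function probeBudgetSeconds({ initialDelaySeconds = 0, periodSeconds, failureThreshold }) {
  return initialDelaySeconds + periodSeconds * failureThreshold;
}

const startup = probeBudgetSeconds({ periodSeconds: 10, failureThreshold: 30 });
const liveness = probeBudgetSeconds({ periodSeconds: 10, failureThreshold: 3 });
const readiness = probeBudgetSeconds({ periodSeconds: 5, failureThreshold: 3 });

console.log(`startup: up to ${startup}s to finish initializing`); // 300s
console.log(`liveness: restarted after about ${liveness}s of failures`); // 30s
console.log(`readiness: removed from rotation after about ${readiness}s of failures`); // 15s
```

The startup probe gives a slow-booting application up to about five minutes before the liveness probe takes over, which is why the liveness probe itself can use initialDelaySeconds: 0.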
The Senior Mindset: It's About State, Not Code
"Maintenance Mode" is not acceptable for senior-level systems. The real challenge isn't swapping code—it's managing state.
Why Database Migrations Are the Hard Part
Rule: Database changes must be backward compatible with both old and new code running simultaneously.
Scenario: Renaming a Column
- Naive: ALTER TABLE users RENAME COLUMN name TO full_name;
- Result: The old code (still running during the deploy) queries name, but the DB now has full_name. Crash.
Senior Pattern: Expand and Contract
1. Deploy 1 (Expand): Add the full_name column. Code writes to both name and full_name and reads from name.
2. Backfill: Run a script to copy name to full_name for old rows.
3. Deploy 2 (Switch): Code reads from full_name and still writes to both.
4. Deploy 3 (Contract): Code writes only to full_name. Remove the name column once no running pod still references it.
This pattern takes 3 deployments instead of 1, but at every step the running code and the schema stay compatible, so no request hits a missing column.
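The compatibility rule can be checked mechanically: a code version works against a schema only if every column it touches exists. The sketch below models the rename scenario that way. The version and schema labels are made up for illustration; the column names come from the scenario.

```javascript
// Columns each code version reads or writes.
const codeVersions = {
  original: ['name'],
  deploy1: ['name', 'full_name'], // writes both, reads name
  deploy2: ['name', 'full_name'], // reads full_name, writes both
  deploy3: ['full_name'], // uses full_name only
};

// Columns present in the table at each schema stage.
const schemas = {
  initial: ['name'],
  renamed: ['full_name'], // the naive RENAME COLUMN
  expanded: ['name', 'full_name'],
  contracted: ['full_name'],
};

// Code works against a schema when every column it touches exists.
function isCompatible(code, schema) {
  return codeVersions[code].every((col) => schemas[schema].includes(col));
}

console.log(isCompatible('original', 'renamed')); // false
console.log(isCompatible('original', 'expanded')); // true
console.log(isCompatible('deploy3', 'contracted')); // true
console.log(isCompatible('deploy2', 'contracted')); // false
```

The last line shows why the contract step has to wait: dropping the old column while any Deploy 2 pod is still running breaks that pod in the same way the naive rename breaks the original code.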
Choosing the Right Strategy
| Scenario | Recommended Strategy |
|---|---|
| Standard app update, cost-conscious | Rolling Update |
| Critical path, need instant rollback | Blue-Green |
| High-risk change, need gradual rollout | Canary |
| A/B experiment with user segmentation | Canary with header-based routing |
Pro Tips:
- Rolling Update: You MUST have a readinessProbe. Without one, K8s sends traffic to the new pod before the app has loaded, causing 502s.
- Blue-Green: Double the cost (need 2x resources), but instant switch and instant rollback.
- Canary: The safest option. Deploy to 5% of users, monitor metrics, then increase to 20%, 50%, 100%.
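The "monitor metrics, then increase" step in the canary tip can be written down as a small promotion policy. The sketch below is one possible version: the step sizes come from the tip, while the function name nextCanaryWeight, the metric names, and the 1% error-rate tolerance are assumptions chosen for illustration.

```javascript
// Hypothetical promotion policy for a canary rollout.
const STEPS = [5, 20, 50, 100];

function nextCanaryWeight(currentWeight, { canaryErrorRate, stableErrorRate }, tolerance = 0.01) {
  // Roll back to 0% if the canary is noticeably worse than stable.
  if (canaryErrorRate > stableErrorRate + tolerance) return 0;
  // Otherwise move to the next larger step, or stay at 100.
  const next = STEPS.find((step) => step > currentWeight);
  return next === undefined ? 100 : next;
}

console.log(nextCanaryWeight(5, { canaryErrorRate: 0.002, stableErrorRate: 0.001 })); // 20
console.log(nextCanaryWeight(20, { canaryErrorRate: 0.05, stableErrorRate: 0.001 })); // 0
console.log(nextCanaryWeight(100, { canaryErrorRate: 0.001, stableErrorRate: 0.001 })); // 100
```

A real policy would usually look at more than one signal (latency, saturation, business metrics) and require a minimum observation window before each step.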
Istio VirtualService for Canary
When using Istio, you can do weighted routing:
# Istio VirtualService
route:
  - destination:
      host: my-service
      subset: v1
    weight: 90
  - destination:
      host: my-service
      subset: v2
    weight: 10
Zero-Downtime Checklist
- [ ] Application handles SIGTERM gracefully
- [ ] PreStop hook gives time for load balancer update
- [ ] Readiness probe verifies all dependencies
- [ ] Rolling update strategy configured with appropriate values
- [ ] Database migrations are backward compatible
- [ ] Expand-Contract pattern used for schema changes
- [ ] Feature flags for new functionality
- [ ] Monitoring alerts for deployment health
- [ ] Runbook for rollback procedures
- [ ] Load testing performed before major releases
- [ ] Deployment during low-traffic windows (if applicable)
The Bottom Line
Zero-downtime is mostly about database compatibility. If your schema changes break the previous version of the code, no amount of Kubernetes magic will save you. Always expand first, then contract.