Every Kubernetes migration I’ve been part of starts the same way in the kickoff: “We need zero downtime.” What follows is usually a journey through blue-green deployments, feature flags, and more than one all-nighter.
Here are the patterns that actually work — not the ones that just sound good in blog posts.
The Problem with “Big Bang” Migrations
The classic mistake is trying to migrate everything at once. A date is set, all teams work toward it, and then on a weekend someone flips the switch. It almost always ends in chaos.
What I recommend instead: the Strangler Fig Pattern.
You build the new Kubernetes infrastructure in parallel while the old one keeps running. A reverse proxy (nginx or an API gateway) decides which traffic goes where. Services are migrated one at a time: each one is moved, tested under real traffic, and only then switched over permanently.
# nginx upstream config during migration
upstream backend {
    # 90% to old, 10% to new
    server old-monolith:8080 weight=9;
    server new-k8s-service:8080 weight=1;
}
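The weights are relative, not percentages: each server receives weight / sum(weights) of the requests under nginx's default weighted round-robin. A minimal Python sketch of that arithmetic follows. The helper names are my own, not part of nginx, and it assumes both upstreams are healthy.

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of requests per upstream server.

    Weighted round-robin distributes requests in proportion to each
    server's weight, so the share is weight / sum(weights).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one server needs a positive weight")
    return {server: weight / total for server, weight in weights.items()}


def weights_for_new_share(percent_new: int) -> dict[str, int]:
    """Hypothetical helper: weights that send percent_new % to the new service.

    Restricted to 1..99; at 100 % you would remove the old server from the
    upstream block instead of giving it a weight.
    """
    if not 0 < percent_new < 100:
        raise ValueError("percent_new must be between 1 and 99")
    return {
        "old-monolith:8080": 100 - percent_new,
        "new-k8s-service:8080": percent_new,
    }


if __name__ == "__main__":
    # The config above: weight=9 and weight=1.
    print(traffic_shares({"old-monolith:8080": 9, "new-k8s-service:8080": 1}))
    # Next ramp step, for example a quarter of the traffic to the new service.
    print(weights_for_new_share(25))
```

Raising the new service's share then becomes a one-line config change per step, which is easy to review and easy to roll back.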
Blue-Green Is Not Always the Answer
Blue-green deployments get sold as a cure-all. The problem: you need double the infrastructure, and with stateful services (databases, sessions, caches) things get complicated fast.
For most migrations, Rolling Updates with proper Health Checks are the better choice:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # No outage during the update
      maxSurge: 1        # Max 1 extra pod at a time
  template:
    spec:
      containers:
        - name: app
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
The key: with maxUnavailable: 0, Kubernetes never takes an old pod out of service until a replacement has passed its readiness probe, and maxSurge: 1 gives it room to start that replacement.
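To see why this keeps capacity constant, here is a small Python simulation of the pod accounting. It is a simplified model, not the real Deployment controller: it assumes every new pod passes its readiness probe immediately, and the function name is my own.

```python
def simulate_rolling_update(replicas: int, max_surge: int = 1,
                            max_unavailable: int = 0) -> list[tuple[str, int, int]]:
    """Return (action, old_ready, new_ready) after each rollout step.

    Simplified model: new pods are assumed to become ready as soon as
    they are started.
    """
    if replicas < 1 or max_surge < 0 or max_unavailable < 0:
        raise ValueError("replicas must be >= 1 and the limits non-negative")
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")

    old, new = replicas, 0
    history = [("start", old, new)]
    while old > 0 or new < replicas:
        # Surge: start new pods without exceeding replicas + maxSurge in total.
        startable = min(replicas + max_surge - (old + new), replicas - new)
        if startable > 0:
            new += startable
            history.append(("new pods ready", old, new))
        # Scale down: remove old pods while at least
        # replicas - maxUnavailable pods stay ready.
        removable = min(old, old + new - (replicas - max_unavailable))
        if removable > 0:
            old -= removable
            history.append(("old pods terminated", old, new))
    return history


if __name__ == "__main__":
    for action, old, new in simulate_rolling_update(replicas=3):
        print(f"{action:<20} old={old} new={new} ready={old + new}")
```

With three replicas and the settings above, the ready count in this model never drops below 3 and the total never exceeds 4. That is the trade-off compared to blue-green: one extra pod instead of a second full environment.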
The Underestimated Problem: Database Migrations
Kubernetes deployments are easy. Database migrations with zero downtime are hard.
The rule I apply consistently: every migration must work with both the old and the new code version. That means:
- Additive changes first: add new columns before removing old ones
- Write backwards-compatible queries for as long as both versions can run simultaneously
- Cleanup in a separate deployment: only remove old columns/tables once the new code is fully deployed and stable
-- Migration Phase 1: Additive (safe, deploy)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT false;
-- Migration Phase 2: Backfill (background job; batch large tables to keep locks short)
UPDATE users SET email_verified = true WHERE created_at < '2024-01-01';
-- Migration Phase 3: Cleanup (weeks later, after the new code has stabilized)
-- ALTER TABLE users DROP COLUMN old_verified_field;
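On a large table, running the Phase 2 backfill as a single UPDATE can hold locks for a long time. A background job can do the same work in small batches instead. The sketch below uses Python's built-in sqlite3 module so it runs anywhere. The `id` primary key, the batch size, and the demo data are assumptions for illustration, and a production job against PostgreSQL or MySQL would need that database's driver and SQL dialect.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory users table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
    rows = [("2023-06-01",)] * 5 + [("2024-03-01",)] * 2
    conn.executemany("INSERT INTO users (created_at) VALUES (?)", rows)
    # Phase 1: additive change, existing rows get the default.
    conn.execute("ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT 0")
    conn.commit()
    return conn


def backfill_in_batches(conn: sqlite3.Connection,
                        cutoff: str = "2024-01-01",
                        batch_size: int = 1000) -> int:
    """Phase 2: mark users created before `cutoff` as verified, batch by batch.

    Returns the total number of updated rows.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users
               SET email_verified = 1
             WHERE id IN (SELECT id
                            FROM users
                           WHERE created_at < ? AND email_verified = 0
                           ORDER BY id
                           LIMIT ?)
            """,
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            return total  # nothing left to backfill
        total += cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    print("rows updated:", backfill_in_batches(db, batch_size=2))
```

Because each batch only selects rows that are still unverified, the job is safe to stop and restart, which matters when it runs alongside live traffic from both code versions.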
Conclusion
Zero-downtime migrations are not magic, but they require discipline in planning. The key points:
- Strangler Fig instead of Big Bang
- Rolling Updates with proper Health Checks for most services
- Blue-Green only when truly necessary (and you accept the cost)
- Database migrations always in multiple backwards-compatible phases
Follow this consistently and you’ll sleep much better on deployment day.