Kubernetes Migration Without Downtime: What Actually Works

Every Kubernetes migration I’ve been part of starts the same way in the kickoff: “We need zero downtime.” What follows is usually a journey through blue-green deployments, feature flags, and more than one all-nighter.

Here are the patterns that actually work — not the ones that just sound good in blog posts.

The Problem with “Big Bang” Migrations

The classic mistake is trying to migrate everything at once. A date is set, all teams work toward it, and then on a weekend someone flips the switch. It almost always ends in chaos.

What I recommend instead: the Strangler Fig Pattern.

You build the new Kubernetes infrastructure in parallel while the old one keeps running. A reverse proxy (nginx or an API gateway) decides which traffic goes where. Services are migrated one at a time: each one is moved, tested under real traffic, and only then permanently switched over.

# nginx upstream config during migration
upstream backend {
  # 90% to old, 10% to new
  server old-monolith:8080 weight=9;
  server new-k8s-service:8080 weight=1;
}
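
Editing weights by hand at every step of the ramp is error-prone, so it can help to generate the upstream block from a single number. Below is a minimal Python sketch, assuming the same two hostnames as in the config above. The function name render_upstream and the ramp steps are my own illustration, not something nginx ships. Keep in mind that weights split requests, not users, so a client can bounce between old and new unless you add some form of stickiness.

```python
def render_upstream(new_percent: int,
                    old_server: str = "old-monolith:8080",
                    new_server: str = "new-k8s-service:8080") -> str:
    """Render an nginx upstream block that sends roughly new_percent% of
    requests to the new service and the rest to the old one."""
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")

    old_weight = 100 - new_percent
    lines = ["upstream backend {"]
    # A side that should receive no traffic is left out entirely
    # instead of being written with weight=0.
    if old_weight > 0:
        lines.append(f"  server {old_server} weight={old_weight};")
    if new_percent > 0:
        lines.append(f"  server {new_server} weight={new_percent};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical ramp schedule: stay on each step until error rates
# and latency on the new service look normal.
RAMP_STEPS = [1, 10, 25, 50, 100]
```

Calling render_upstream(10) produces the same 9:1 split as the hand-written block, just expressed as weights 90 and 10. Writing the result to the nginx config and reloading nginx is left out on purpose, since that part depends on how your proxy is deployed.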

Blue-Green Is Not Always the Answer

Blue-green deployments get sold as a cure-all. The problem: you need double the infrastructure, and with stateful services (databases, sessions, caches) things get complicated fast.

For most migrations, rolling updates with proper health checks are the better choice:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # No outage during the update
      maxSurge: 1            # Max 1 extra pod at a time
  template:
    spec:
      containers:
        - name: app
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

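The key: with maxUnavailable: 0, Kubernetes never takes an old pod out of service until a replacement has started and passed its readiness probe. That only helps if /health/ready truly reflects whether the app can serve traffic.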
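
The manifest above assumes the app exposes /health/ready and /health/live and that the two mean different things. Here is a rough sketch of that split using only the Python standard library. The ready flag, the handler class, and start_health_server are hypothetical names for illustration; a real service would hook this into its own web framework.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this once startup work (config, DB connections, cache warm-up) is done.
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            # Liveness: the process is up and can answer HTTP at all.
            self._reply(200, b"alive")
        elif self.path == "/health/ready":
            # Readiness: report 200 only once the app can serve real traffic.
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"starting")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the access log


def start_health_server(host="0.0.0.0", port=8080):
    """Serve the health endpoints in a background thread; returns the server."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The app would call start_health_server() early and ready.set() only after its dependencies are up. Until then the readiness probe gets a 503, so the new pod receives no traffic and, with maxUnavailable: 0, the old pod stays in service.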

The Underestimated Problem: Database Migrations

Kubernetes deployments are easy. Database migrations with zero downtime are hard.

The rule I apply consistently: every migration must work with both the old and the new code version. That means:

Additive changes first: add new columns before removing old ones
Backwards-compatible queries: every query must keep working for as long as both versions can run simultaneously
Cleanup in a separate deployment: only remove old columns/tables once the new code is fully deployed and stable

-- Migration Phase 1: Additive (safe to deploy while the old code is still running)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT false;

-- Migration Phase 2: Backfill (background job; on large tables, update in batches to keep locks short)
UPDATE users SET email_verified = true WHERE created_at < '2024-01-01';

-- Migration Phase 3: Cleanup (weeks later, after code stabilized)
-- ALTER TABLE users DROP COLUMN old_verified_field;
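
On a large table I would not run the Phase 2 backfill as one statement, because a single big UPDATE can hold locks for a long time. Here is a sketch of a batched version, using Python's built-in sqlite3 module purely so the example is self-contained. The id primary key, the batch size, and the function name are assumptions; the query would need adapting to your database's dialect.

```python
import sqlite3


def backfill_email_verified(conn, cutoff="2024-01-01", batch_size=1000):
    """Backfill email_verified in small batches.

    Returns the total number of rows updated. Assumes a users table with
    an id primary key plus the created_at and email_verified columns.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users SET email_verified = 1
            WHERE id IN (
                SELECT id FROM users
                WHERE created_at < ? AND email_verified = 0
                LIMIT ?
            )
            """,
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            return total  # nothing left to backfill
        total += cur.rowcount
```

Because each batch only picks rows that are still unverified, the job can be stopped and restarted without redoing work. In production you would probably also pause briefly between batches so regular traffic is not starved.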

Conclusion

Zero-downtime migrations are not magic, but they require discipline in planning. The key points:

Strangler Fig instead of Big Bang
Rolling Updates with proper Health Checks for most services
Blue-Green only when truly necessary (and you accept the cost)
Database migrations always in multiple backwards-compatible phases

Follow this consistently and you’ll sleep much better on deployment day.