Updating Kubernetes Deployments

Kynan Rilee
Jan 25, 2018

Kubernetes Deployments are meant to be long-lived resources that evolve over time. They can change in several ways, including:

  • Change the Pod template used to populate the Deployment.

To apply these changes, Kubernetes Deployments support two update strategies. The first is Recreate, which kills the Deployment’s existing Pods before creating new Pods from the new template. The second is Rolling Update, which incrementally replaces old Pods with new ones until all the old Pods are gone.
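As a rough illustration of where this choice lives, here is a Python sketch that builds the `spec.strategy` fragment of a Deployment manifest as a plain dict. The field names `type`, `rollingUpdate`, `maxSurge`, and `maxUnavailable` and the type values `Recreate` and `RollingUpdate` come from the Deployment API; the helper `strategy_fragment` is hypothetical and only assembles the dict.

```python
import json


def strategy_fragment(strategy_type, max_surge=None, max_unavailable=None):
    """Build the spec.strategy section of a Deployment manifest (hypothetical helper)."""
    if strategy_type == "Recreate":
        # Recreate has no tuning knobs: old Pods are killed before new ones start.
        if max_surge is not None or max_unavailable is not None:
            raise ValueError("rollingUpdate settings only apply to RollingUpdate")
        return {"type": "Recreate"}
    if strategy_type == "RollingUpdate":
        rolling = {}
        # Fields left out here fall back to the server-side defaults.
        if max_surge is not None:
            rolling["maxSurge"] = max_surge
        if max_unavailable is not None:
            rolling["maxUnavailable"] = max_unavailable
        fragment = {"type": "RollingUpdate"}
        if rolling:
            fragment["rollingUpdate"] = rolling
        return fragment
    raise ValueError(f"unknown strategy type: {strategy_type}")


print(json.dumps(strategy_fragment("Recreate")))
# {"type": "Recreate"}
print(json.dumps(strategy_fragment("RollingUpdate", max_surge=1, max_unavailable=0)))
# {"type": "RollingUpdate", "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}}
```

Both knobs accept either an integer or a percentage string such as "25%", which is why the sketch passes the values through untouched.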

Rolling updates happen in waves: control how they ebb and flow.

Rolling Update

Rolling update is an interesting strategy because it can be configured to fit a variety of scenarios. For example, consider the degenerate case — a Deployment with a single replica:

Upgrading from Pod Template 1.0 to Pod Template 2.0
Pods before upgrade:
[ Pod 1.0 ]

A rolling update can follow two courses of action. If we really don’t want to run more than one Pod at a time, we can kill the old Pod before creating its replacement:

Step 0 (before upgrade):
[ Pod 1.0 ]
Step 1:
[ /* no Pods */ ]
Step 2:
[ Pod 2.0 ]

In this situation, our primary concern was avoiding the overhead of over-provisioning Pods during the update process. We decided there could be zero extra Pods at any time. The configuration field for limiting overhead is called maxSurge. (More on that later.)

What if instead of worrying about overhead, we want to be sure we always have at least 1 running Pod? We’ll create the new Pod before killing the old one:

Step 0 (before upgrade):
[ Pod 1.0 ]
Step 1:
[ Pod 1.0, Pod 2.0 ]
Step 2:
[ Pod 2.0 ]

Here, our primary concern was ensuring that our Deployment’s work capacity never dropped below a certain level during the update process. The acceptable loss of capacity was 0% — we can never have less than 1 running Pod. The configuration field for limiting decreases in capacity is maxUnavailable. (See below.)
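Both courses of action can be reproduced with a small simulation. This is a minimal sketch, not the Deployment controller’s actual algorithm: it assumes every new Pod is Ready the moment it’s created, takes maxSurge and maxUnavailable as absolute Pod counts, and does one thing per step (create or kill). The function name `simulate_rolling_update` is made up for this example.

```python
def simulate_rolling_update(replicas, max_surge, max_unavailable):
    """Return (new Pods, old Pods) after each step of a simplified rolling update.

    Simplifying assumptions: new Pods are Ready as soon as they are created,
    and each step either creates new Pods or kills old ones, never both.
    max_surge and max_unavailable are absolute Pod counts.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    new, old = 0, replicas
    steps = [(new, old)]
    while new < replicas or old > 0:
        # Prefer creating new Pods while the total stays within replicas + max_surge.
        room = min(replicas + max_surge - (new + old), replicas - new)
        if room > 0:
            new += room
        else:
            # Otherwise kill old Pods, keeping at least replicas - max_unavailable running.
            killable = min(old, (new + old) - (replicas - max_unavailable))
            if killable <= 0:
                raise RuntimeError("rollout is stuck")
            old -= killable
        steps.append((new, old))
    return steps


# Kill-first course: no extra Pods allowed, one Pod may be unavailable.
print(simulate_rolling_update(1, max_surge=0, max_unavailable=1))
# [(0, 1), (0, 0), (1, 0)]

# Create-first course: one extra Pod allowed, none may be unavailable.
print(simulate_rolling_update(1, max_surge=1, max_unavailable=0))
# [(0, 1), (1, 1), (1, 0)]
```

The pairs line up with the step listings above: (0, 0) is the empty Step 1 of the kill-first course, and (1, 1) is the [ Pod 1.0, Pod 2.0 ] step of the create-first course.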

Max Surge

We saw what maxSurge does in an extreme case: maxSurge = 0, replicas = 1.
What else can it do? Here are some maxSurge values and what they mean:

maxSurge = 1, replicas = 2
Step 0: [Pod 1.0, Pod 1.0]
Step 1: [Pod 2.0, Pod 1.0, Pod 1.0]
Step 2: [Pod 2.0, Pod 1.0]
Step 3: [Pod 2.0, Pod 2.0, Pod 1.0]
Step 4: [Pod 2.0, Pod 2.0]

During the rolling update, there can be up to 2 + 1 = 3 Pods at once. Only one new Pod is created at a time. After the first new Pod is Ready, an old Pod is killed and the second new Pod is created. When the second new Pod is Ready, the last old Pod is killed, completing the process.
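To check that arithmetic against the trace, this sketch records the maxSurge = 1, replicas = 2 steps as (new Pods, old Pods) counts and computes the peak number of Pods and the largest batch of new Pods. The variable names are illustrative only.

```python
# The maxSurge = 1, replicas = 2 trace, as (new Pods, old Pods) per step.
replicas, max_surge = 2, 1
trace = [(0, 2), (1, 2), (1, 1), (2, 1), (2, 0)]

# Peak number of Pods running at any step.
peak = max(new + old for new, old in trace)
# Largest number of new Pods added between consecutive steps.
largest_batch = max(after[0] - before[0] for before, after in zip(trace, trace[1:]))

assert peak <= replicas + max_surge
print(peak, largest_batch)  # 3 1
```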

maxSurge = 100%, replicas = 5
Step 0: [Pod 1.0 x5]
Step 1: [Pod 2.0 x5, Pod 1.0 x5]
...
Step n: [Pod 2.0 x5]

There can be up to 100% * 5 extra Pods at one time. All 5 new Pods will be created at the same time. As the new Pods become Ready, the old Pods are killed off.
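A percentage still has to become a whole number of Pods. The Kubernetes documentation states that a maxSurge percentage is converted to an absolute number by rounding up. Here is a minimal sketch of that conversion; `resolve_max_surge` is a hypothetical name, and plain integers are treated as absolute counts.

```python
def resolve_max_surge(value, replicas):
    """Convert a maxSurge value (int or "N%" string) to an absolute Pod count.

    Percentages are rounded up, following the documented Kubernetes behavior.
    """
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    # Ceiling division in integer arithmetic avoids floating-point surprises.
    return -(-replicas * percent // 100)


print(resolve_max_surge("100%", 5))  # 5
print(resolve_max_surge("25%", 2))   # 1 (0.5 rounds up)
```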

Max Unavailable

maxUnavailable is the mirror of maxSurge. Here are some example values and what they mean:

maxUnavailable = 1, replicas = 2
Step 0: [Pod 1.0, Pod 1.0]
Step 1: [Pod 1.0]
Step 2: [Pod 2.0, Pod 1.0]
Step 3: [Pod 2.0]
Step 4: [Pod 2.0, Pod 2.0]

Whereas maxSurge determines how many new Pods to create, maxUnavailable determines how many old Pods to kill. In this case, we can only kill 1 old Pod at a time. This ensures the capacity is always at least 2 - 1 = 1 Pod.
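The mirror-image check works on the maxUnavailable = 1, replicas = 2 trace: instead of the peak, look at the lowest number of running Pods and the largest batch of old Pods killed at once. This sketch assumes every listed Pod is Ready, so capacity is just the Pod count; the variable names are illustrative.

```python
# The maxUnavailable = 1, replicas = 2 trace, as (new Pods, old Pods) per step.
replicas, max_unavailable = 2, 1
trace = [(0, 2), (0, 1), (1, 1), (1, 0), (2, 0)]

# Lowest capacity at any step, assuming every listed Pod is Ready.
low_water = min(new + old for new, old in trace)
# Largest number of old Pods killed between consecutive steps.
largest_kill = max(before[1] - after[1] for before, after in zip(trace, trace[1:]))

assert low_water >= replicas - max_unavailable
print(low_water, largest_kill)  # 1 1
```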

Like maxSurge, maxUnavailable also supports percentage values:

maxUnavailable = 25%, replicas = 8
Step 0: [Pod 1.0 x8]
Step 1: [Pod 1.0 x6]
Step 2: [Pod 2.0 x2, Pod 1.0 x6]
...
Step n+1: [Pod 2.0 x8]

There must always be at least (100% - 25%) * 8 = 6 running Pods, so this rolling update starts by killing 25% * 8 = 2 old Pods. As the new Pods become Ready, more old Pods can be killed and more new Pods can be created.
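For maxUnavailable, the Kubernetes documentation says the percentage is converted to an absolute number by rounding down, which keeps the capacity guarantee on the safe side. A minimal sketch with a hypothetical `resolve_max_unavailable` helper:

```python
def resolve_max_unavailable(value, replicas):
    """Convert a maxUnavailable value (int or "N%" string) to an absolute Pod count.

    Percentages are rounded down, following the documented Kubernetes behavior.
    """
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    return replicas * percent // 100


replicas = 8
max_unavailable = resolve_max_unavailable("25%", replicas)
min_available = replicas - max_unavailable
print(max_unavailable, min_available)     # 2 6
print(resolve_max_unavailable("25%", 2))  # 0 (0.5 rounds down)
```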

Combining Max Surge and Max Unavailable

So far, my examples have focused on either maxSurge or maxUnavailable, but these two configuration fields can be used together. In fact, the default configuration is this:

maxSurge: 25%
maxUnavailable: 25%

There can be up to 25% overhead and 25% loss of capacity during the default rolling update. When a rolling update begins, it can immediately create 25% of the new Pods and kill 25% of the old Pods.
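Because the two fields round in opposite directions (up for maxSurge, down for maxUnavailable, per the Kubernetes documentation), the default 25% doesn’t always mean a quarter of the Pods. This sketch tabulates the resolved defaults for a few replica counts; `resolve_defaults` is a hypothetical helper.

```python
def resolve_defaults(replicas, percent=25):
    """Resolve the default maxSurge / maxUnavailable percentages into Pod counts."""
    max_surge = -(-replicas * percent // 100)    # rounded up
    max_unavailable = replicas * percent // 100  # rounded down
    return max_surge, max_unavailable


for replicas in (1, 2, 4, 8, 10):
    surge, unavailable = resolve_defaults(replicas)
    print(f"replicas={replicas}: up to {replicas + surge} Pods, "
          f"at least {replicas - unavailable} available")
```

With 1 or 2 replicas the defaults resolve to maxUnavailable = 0, so capacity never drops and the update proceeds purely by surging; with 10 replicas, maxSurge resolves to 3 while maxUnavailable resolves to 2.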

What happens if you set both fields to 0?

maxSurge: 0
maxUnavailable: 0

The rolling update can’t create any new Pods because no overhead is allowed. It also can’t kill any old Pods because no loss of capacity is allowed. The rolling update cannot proceed. It’s an invalid configuration.

Conclusion

If you’re using the Rolling Update strategy for Kubernetes Deployments, it’s important to tune maxSurge and maxUnavailable. On the one hand, the rolling update process waits for newly created Pods to become Ready, so larger values of maxSurge or maxUnavailable mean larger batches and less waiting. On the other hand, lower values mean less impact on resource consumption and work capacity. Choose the values that work best for your application.
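To put rough numbers on the waiting side of that trade-off, here is a sketch under a simplified model I’m assuming: maxUnavailable = 0, and each batch of up to maxSurge new Pods must become Ready before the next batch starts. The helper `readiness_waves` is hypothetical, and real rollout time also depends on how long each Pod takes to become Ready.

```python
def readiness_waves(replicas, max_surge):
    """Rough count of batches that must become Ready, assuming maxUnavailable = 0.

    Simplified model: each batch adds at most max_surge new Pods and waits for
    them to become Ready before the matching old Pods are killed.
    """
    if max_surge < 1:
        raise ValueError("with maxUnavailable = 0, maxSurge must be at least 1")
    return -(-replicas // max_surge)  # ceiling division


for surge in (1, 2, 5, 10):
    print(f"maxSurge={surge}: {readiness_waves(10, surge)} readiness waits")
```

For 10 replicas this gives 10, 5, 2, and 1 waits, while the peak number of Pods grows from 11 to 20: fewer waits cost more overhead.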

Koki

A complete platform for running applications on Kubernetes