KUBERNETES HOW-TO

Zero-Downtime Deployment in Kubernetes

How to Implement Rolling Deployment Easily

Giovanni Dejan
Published in Dev Genius
7 min read · May 28, 2022


Deployment illustration by SpaceX on Unsplash

Aiming for zero downtime migrations should be the norm, not the exception at most places.

- Gergely Orosz

Zero-downtime deployment is a dream for many engineering teams. The “magic” of delivering value without announcing a “maintenance period” to your users is a game-changer in this era of hyper-growth.

That said, zero-downtime deployment isn’t always easy to achieve. Fear not, though, because Kubernetes comes to the rescue. In this post, I will guide you through creating a zero-downtime deployment in a Kubernetes cluster using the rolling update method.

NOTE: while we are going to use the rolling method, it’s not the only way. You can read more about deployment strategies here.

NOTE: I’m not going to explain Kubernetes itself in this post. You can read a basic explanation of Kubernetes objects here, or check out Kubernetes tutorials on YouTube.

Create the App and Image

The endpoint will be a very simple one that only returns the current version.
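Here’s a minimal sketch of such an app (the file name, response payload, and version string are assumptions; the 15-second sleep is explained below):

    # main.py: a minimal FastAPI app that only reports its version
    import time

    from fastapi import FastAPI

    APP_VERSION = "1.0.0"  # assumption: bump this before building the 2nd image

    app = FastAPI()

    @app.on_event("startup")
    def simulate_slow_start() -> None:
        # Pretend we're connecting to a database and other external dependencies.
        time.sleep(15)

    @app.get("/version")
    def get_version() -> dict:
        return {"version": APP_VERSION}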

NOTE: if we want to test the rolling deployment, we need to replace the response before we build the 2nd image.

As for the 15 seconds of sleep, I want to simulate the delay of starting a new pod in Kubernetes: those fifteen seconds stand in for a real server connecting to its database and other external dependencies. The rolling update should work even without the sleep.

For the Docker image, I just use the Dockerfile from the FastAPI website. Here’s my image file:
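Something along the lines of the Dockerfile from the FastAPI docs; the paths and module name here are assumptions:

    FROM python:3.9

    WORKDIR /code

    # Install dependencies first so Docker can cache this layer
    COPY ./requirements.txt /code/requirements.txt
    RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

    # Copy the application code (assumes the app lives in ./app/main.py)
    COPY ./app /code/app

    CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]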

Now that we have a simple app and a Dockerfile, the next thing I want to do is test the rolling strategy.

Set Up the Cluster

Although I’m using Google Cloud Platform (GCP), any cloud provider with a managed Kubernetes offering will work. I chose GCP because I still have a free plan.

If you want to use a US-based region, you can read this guide on how to provision a Kubernetes cluster with Google Kubernetes Engine. However, I’m not sure it works in other regions: in my experience, a service-account-related error occurs in the Jakarta region (but not in US-based regions). So we have to provision the GKE cluster ourselves, without the module from the guide above.

Networking Configuration

In order to provision the Kubernetes cluster by ourselves, we need to create a VPC and a subnet first. This is the configuration that I use:
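A sketch of that networking layer; the resource names, region, and CIDR ranges below are assumptions:

    # VPC with manually managed subnets
    resource "google_compute_network" "vpc" {
      name                    = "rolling-demo-vpc"
      auto_create_subnetworks = false
    }

    resource "google_compute_subnetwork" "subnet" {
      name          = "rolling-demo-subnet"
      region        = "asia-southeast2" # Jakarta; pick your own region
      network       = google_compute_network.vpc.id
      ip_cidr_range = "10.10.0.0/24"

      # Secondary ranges referenced later by the cluster's ip_allocation_policy
      secondary_ip_range {
        range_name    = "pods"
        ip_cidr_range = "10.20.0.0/16"
      }

      secondary_ip_range {
        range_name    = "services"
        ip_cidr_range = "10.30.0.0/20"
      }
    }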

NOTE: Original code can be found here.

GKE Cluster Provisioning

For GKE cluster provisioning, we use the standard google_container_cluster resource. However, based on a tip from HashiCorp Learn, we need to “abandon” the default node pool and provision our own node pool(s). In fact, the code below is a modification of the GitHub repo mentioned in that article.
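A sketch of what that could look like; the names, region, machine type, and node count are assumptions:

    resource "google_container_cluster" "primary" {
      name     = "rolling-demo-cluster"
      location = "asia-southeast2"

      # "Abandon" the default node pool; we manage our own pool below.
      remove_default_node_pool = true
      initial_node_count       = 1

      # The VPC and subnet we created manually
      network    = google_compute_network.vpc.name
      subnetwork = google_compute_subnetwork.subnet.name

      # Without this block, every terraform apply wants to recreate the cluster.
      ip_allocation_policy {
        cluster_secondary_range_name  = "pods"
        services_secondary_range_name = "services"
      }
    }

    resource "google_container_node_pool" "primary_nodes" {
      name       = "rolling-demo-node-pool"
      location   = "asia-southeast2"
      cluster    = google_container_cluster.primary.name
      node_count = 3

      node_config {
        machine_type = "e2-medium"
        oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
      }
    }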

Now you can see (in the code above) why we needed to create our own subnets “manually”: they are used by the GKE cluster.

NOTE: One crucial difference between the HashiCorp Learn article and my code is that I add ip_allocation_policy to the google_container_cluster resource. Without that configuration, every time we apply the Terraform configuration we are “forced” to recreate the cluster, which is not what we want.

Create Deployment and Service

In order to test our app, we need to create a Deployment and a Service, so that we can access the endpoint through the internet.

This is the Kubernetes object configuration that I use to create the Deployment and Service:
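A sketch of those two objects in Terraform; the image name, labels, ports, and probe settings are assumptions:

    resource "kubernetes_deployment" "app" {
      metadata {
        name = "version-app"
      }

      spec {
        replicas = 3

        selector {
          match_labels = {
            app = "version-app"
          }
        }

        template {
          metadata {
            labels = {
              app = "version-app"
            }
          }

          spec {
            container {
              name  = "version-app"
              image = "docker.io/<your-account>/version-app:1.0.0"

              port {
                container_port = 80
              }

              # Liveness: unhealthy pods are killed and recreated.
              liveness_probe {
                http_get {
                  path = "/version"
                  port = 80
                }
                initial_delay_seconds = 20 # the app sleeps 15 s at startup
                period_seconds        = 10
              }

              # Readiness: new pods only receive traffic (and the rollout
              # only proceeds) once this probe succeeds.
              readiness_probe {
                http_get {
                  path = "/version"
                  port = 80
                }
                initial_delay_seconds = 20
                period_seconds        = 5
              }
            }
          }
        }
      }
    }

    resource "kubernetes_service" "app" {
      metadata {
        name = "version-app"
      }

      spec {
        selector = {
          app = "version-app"
        }

        type = "LoadBalancer"

        port {
          port        = 80
          target_port = 80
        }
      }
    }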

NOTE: I have yet to add the rolling configuration. That will be done in the next section.

As you can see, I create one Deployment and then expose it with one LoadBalancer service. This allows us to access the container from the internet.

In my configuration file, I add a health check for my pods. The reason is that I want Kubernetes to know what the health indicator is, so that if any pod isn’t “healthy”, Kubernetes will automatically delete it and create a new one.

Also, if you notice, I create these Kubernetes objects with Terraform instead of YAML. The reason is that the rolling deployment can then be triggered automatically by the terraform apply command.

One reason I didn’t use Google Container Registry (GCR) in the Kubernetes object configuration (and in the CI/CD pipeline you will see later) is that Kubernetes requires extra effort to pull images from GCR. You can see the guide on using GCR images in Kubernetes here.

Configure Rolling

Now, onto the rolling strategy.

By default, a Kubernetes Deployment created with Terraform already uses a rolling update. But if you’re not sure, or you want to customize the behavior, you can configure the rolling strategy explicitly.

Adding rolling update configuration.
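In the Terraform resource, that boils down to a strategy block inside the Deployment’s spec; the numbers here are an assumption:

    # Inside kubernetes_deployment -> spec
    strategy {
      type = "RollingUpdate"

      rolling_update {
        max_surge       = "1"
        max_unavailable = "1"
      }
    }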

NOTE: the final source code can be found here.

Now you may wonder what max_surge and max_unavailable mean. Let me briefly explain. In a rolling deployment, one or more additional pods are created during the rollout, and once the new pods are considered healthy, the old pods are deleted. max_surge is the maximum number of extra pods that can be created above the desired replica count during the rollout, while max_unavailable is the maximum number of pods that can be unavailable while old pods are terminated.

How to Trigger Rolling Deployment

In the previous section, I said that the reason for writing the Kubernetes configuration in Terraform is practicality (using terraform apply to trigger the rolling deployment). But how does it work?

According to the Kubernetes Patterns book, there are three options to trigger an update of a Deployment object:

  1. Replace the whole Deployment with the new version with kubectl replace.
  2. Patch the deployment using kubectl patch.
  3. Change the pod’s image with kubectl set image (see the example below).
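For reference, option 3 done by hand could look like this (the deployment, container, and image names are assumptions):

    kubectl set image deployment/version-app version-app=<your-account>/version-app:2.0.0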

My guess is that Terraform uses either the 2nd or the 3rd option.

Configure CI/CD

GitHub Actions is a CI/CD service offered by GitHub that is free for open-source repositories. In this post, I use GitHub Actions to execute the terraform apply command.

NOTE: I will not discuss GitHub Actions in depth in this post. You can learn about it from the GitHub website here.

NOTE: the source code can be found here.
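Here is a rough sketch of the workflow described below; the action versions, secret name, and paths are assumptions:

    # .github/workflows/deploy.yml
    name: Deploy

    on:
      push:
        branches: [main]

    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          # "Clone" the repository
          - uses: actions/checkout@v3

          # Validate the Python code
          - uses: actions/setup-python@v4
            with:
              python-version: "3.9"
          - name: Lint
            run: |
              pip install mypy pycodestyle
              mypy app
              pycodestyle app

          # Set up gcloud so Terraform can access GCP
          - uses: google-github-actions/auth@v1
            with:
              credentials_json: ${{ secrets.GCP_CREDENTIALS }}
          - uses: google-github-actions/setup-gcloud@v1

          # Apply the Terraform configuration, which triggers the rollout
          - uses: hashicorp/setup-terraform@v2
          - name: Terraform apply
            run: |
              terraform init
              terraform apply -auto-approve
            working-directory: terraform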

The logic is simple. We clone the repository (although I don’t understand why the action uses the term checkout), then validate the code with mypy and pycodestyle. Next, we set up the gcloud CLI, which Terraform needs in order to access and modify GCP resources. Finally, we run terraform apply, which triggers the rolling deployment.

Show Time

After provisioning the cluster and the Kubernetes objects, it’s time to test the app. Let’s test the existing app first by accessing the endpoint, either through a browser, a terminal (cURL or other CLI tools), or a GUI HTTP client (e.g. Postman).

In order to access the endpoint, follow these steps:

  1. Run kubectl get services. You will find the IP address in the EXTERNAL-IP column.
  2. Access the endpoint using a URL of the form ${external ip}/version, either in the browser or with cURL as shown below. After waiting around 15 seconds, you will get a response with the current version.
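With cURL, for example (the IP is a placeholder):

    # Replace the IP with the EXTERNAL-IP from kubectl get services
    curl http://203.0.113.10/version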

Now, let’s update the code and push it to GitHub to see if the rolling deployment works. This is my change:

Changing the version returned by the endpoint.
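In terms of the sketch above, the change is just the version string (the new value is an assumption):

    # main.py
    APP_VERSION = "2.0.0"  # was "1.0.0"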

If you watch the pods using watch "kubectl get pods --field-selector=status.phase=Running", you will see something similar to these screenshots:

(1) The first old pod is terminating.
(2) One new pod is already running, while the 2nd old pod is terminating.
(3) The 2nd and 3rd old pods are terminating, while the 2nd new pod is running.
(4) The old pods are terminated and replaced with new pods.

Now, to test whether the deployment was successful, access the same endpoint. You will get a different output. Here’s my updated output:

The endpoint is updated!

Well, it works! Wonderful, right? You can see the full code here.

NOTE: I realize I should have tested the endpoint during the rolling process, since staying available during the rollout is the whole benefit of a rolling update. However, I think monitoring the pods is enough to prove that the rolling update was a success. In practice it was quite hard to do both, since I needed to screenshot the rolling process while also hitting the endpoint, so I chose to take the screenshots instead.

One Tip for Production

One tip from me, if you want to do this in production, is to separate the Terraform code that provisions the Kubernetes cluster from the Kubernetes object configuration (the Deployment and Service). The reason is that we want to limit access to the cluster-level configuration for security purposes; developers then interact directly with the Kubernetes objects only.

I hope this guide helped you understand how to set up a rolling deployment in Kubernetes. See you later!
