Kubernetes: Nginx and Zero Downtime in Production
When running an application in Kubernetes, it is important that the ingress controllers can run with zero downtime, especially in production. Otherwise, users experience degraded service whenever the controllers restart or redeploy, or when cluster nodes are rolled.
This article will explain how to set up an Nginx ingress controller in Kubernetes, with zero downtime.
The Problem
The official Nginx Ingress controller, by default, interrupts service when its pods are restarted or redeployed, which can drop connections whenever the ingress pods are terminated for any reason. In production environments especially, this is unacceptable: we need an Nginx ingress controller that is always available to serve traffic.
Kubernetes and SIGTERM
When terminating a pod, Kubernetes sends a SIGTERM signal to the main process and waits a predetermined amount of time for the pod to exit (the default is 30 seconds). After that time has passed, Kubernetes sends SIGKILL, which terminates the process immediately.
Documentation on pod termination can be found here.
Kubernetes expects pods to handle SIGTERM with a graceful shutdown. In reality, not all pods honor this expectation.
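To make the expectation concrete, here is a minimal sketch of a process that honors SIGTERM with a graceful shutdown; the cleanup body is a placeholder, and sending ourselves the signal stands in for the kubelet:

```shell
#!/bin/sh
# Sketch: a main process that honors SIGTERM with a graceful shutdown.
# The cleanup body is a placeholder for real work (draining connections, etc.).
cleanup() {
  echo "draining connections..."
  exit 0
}
trap cleanup TERM

echo "running (pid $$)"
# Simulate the kubelet: send ourselves SIGTERM.
kill -TERM $$
# Stay alive long enough for the trap to fire.
sleep 5
```

A process without such a handler is killed mid-request, which is exactly the problem we are about to see with Nginx.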
Nginx Signals: SIGTERM vs SIGQUIT
Nginx handles signals a little differently than Kubernetes expects. According to the Nginx documentation on signal handling, it responds to TERM and QUIT as follows:
Nginx Signals
+-----------+--------------------+
| signal | response |
+-----------+--------------------+
| TERM, INT | fast shutdown |
| QUIT | graceful shutdown |
+-----------+--------------------+
So, when Kubernetes sends a SIGTERM to the nginx-ingress-controller pod, Nginx enacts a fast shutdown. Any requests the controller is processing at that moment are interrupted, resulting in dropped connections; this can cascade into a rise in HTTP 50x responses during pod termination.
Instead, we want to send Nginx a SIGQUIT signal, which prompts a graceful shutdown: Nginx stops accepting new connections and waits for existing connections to finish before exiting.
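Outside of Kubernetes, you can trigger the same graceful shutdown by hand. A hedged sketch, assuming the default pid file location for your nginx build:

```shell
# Send the nginx master process a SIGQUIT directly.
# The pid file path is the common default; adjust for your build.
kill -QUIT "$(cat /var/run/nginx.pid)"

# Equivalent: ask the nginx binary to signal the running master.
nginx -s quit
```

Either form delivers the same QUIT signal; `-s quit` is simply a convenience that reads the pid file for you.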
The Solution: Lifecycle PreStop Hook
Kubernetes offers a preStop lifecycle hook in the pod spec. During termination, Kubernetes calls preStop after marking the pod Terminating but before sending the pod the SIGTERM signal.
The preStop hook allows the pod to run an arbitrary command before the termination signal arrives. In this case, we want to preemptively send Nginx the SIGQUIT signal so that the controller gracefully terminates connections.
More info on preStop can be found here.
How to Send Nginx SIGQUIT
The nginx-ingress-controller image, quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1, includes a command for sending Nginx termination signals. Running the following script in this image:
/usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit
while pgrep -x nginx; do
sleep 1
done
will send Nginx a SIGQUIT signal and wait for the process to terminate.
Note that this script by itself can run indefinitely if the nginx process never quits. We will bound the total shutdown time later with terminationGracePeriodSeconds.
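If you would rather have the hook itself give up instead of relying solely on Kubernetes' grace period, here is a sketch of the same loop with an explicit timeout; the 300-second cap is an arbitrary choice:

```shell
# Same quit-and-wait logic, but bail out after a fixed number of seconds
# (300 here is arbitrary; tune it to your traffic).
/usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit
waited=0
while pgrep -x nginx >/dev/null; do
  sleep 1
  waited=$((waited + 1))
  if [ "$waited" -ge 300 ]; then
    echo "timed out waiting for nginx to exit" >&2
    break
  fi
done
```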
Defining the preStop hook
We can put the above script into a one-line command and add it to the lifecycle section of the pod spec, as follows:
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5; /usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit; while pgrep -x nginx; do sleep 1; done"]
Note the sleep 5 command before the actual script. This waits out any Kubernetes-related race conditions before initiating a graceful shutdown; a likely explanation is that it gives the pod's removal from Service endpoints time to propagate, but as of this writing the exact cause is unclear. During testing, nginx would still drop connections unless this sleep was in place.
Defining terminationGracePeriodSeconds
A preStop hook will certainly give us the QUIT signal needed to shut down nginx gracefully. However, by default Kubernetes allows only 30 seconds for the entire termination process before sending SIGKILL. This means both the preStop hook and Kubernetes' SIGTERM handling must complete within those 30 seconds, or Nginx will still suffer an ungraceful shutdown and dropped connections.
To mitigate this, we can increase the grace period for Kubernetes to terminate the way we want. This is defined in the Pod spec, as follows:
spec:
terminationGracePeriodSeconds: 600
Note that if 30 seconds is enough time for your Nginx ingress setup, you may skip this part.
More information on termination grace period and hook handler execution can be found here.
Deploying the Fix with Zero Downtime
Solving the shutdown behavior does not by itself give us a zero-downtime deploy: if we update the nginx Deployment, the currently running pods still carry the old configuration and will drop connections as they terminate. The solution is to run TWO load balancers and TWO nginx ingress controllers and switch traffic between them: direct traffic to a temporary load balancer, update the original while no traffic flows through it, then direct traffic back to the original ingress controller.
Step 1: Create a Second Ingress Controller
We deploy nginx-ingress-controller via helm:
helm upgrade --install nginx-ingress stable/nginx-ingress --namespace ingress -f nginx/values.yaml
This will create an nginx-ingress-controller in the ingress namespace, with additional configuration defined in a values.yaml file.
We will need to install a second controller, which we can also do via helm (here in a new namespace called ingress-temp):
helm upgrade --install nginx-ingress-temp stable/nginx-ingress --namespace ingress-temp -f nginx/values.yaml
NOTE: If you are running Helm 3, you will need to first create the namespace via kubectl:
kubectl create namespace ingress-temp
Now that we have two ingress controllers, it is time to direct traffic to the temporary controller so that we can upgrade the original without interruptions in traffic.
Step 2: Redirect Traffic to Temporary Controller
Both Nginx controllers are available to direct traffic to services, but your DNS provider is configured to talk to only one of the load balancers. Change the DNS records for your sites to point at the load balancer created by nginx-ingress-temp. How to do this varies by DNS provider.
Monitor Traffic Flow
It’s useful to watch the traffic logs from both controllers during this changeover. We can do this with kubectl in two separate terminal windows.
Before flipping the DNS switch, follow the traffic logs from the original controller (in the ingress namespace):
kubectl logs -lcomponent=controller -ningress -f
You should see all traffic flowing through this controller. This is the one we will take offline and update with a preStop hook.
In a separate window, follow traffic logs from the temporary controller created in Step 1:
kubectl logs -lcomponent=controller -ningress-temp -f
Other than nginx starting itself, you should see no traffic logs from this controller.
Keep these windows open. We will monitor the change in traffic flow as we switch the DNS.
Change DNS to Temporary Controller
Now it is time to shift traffic to the temporarily created load balancer. Kubernetes automatically creates and configures a load balancer when helm creates a new LoadBalancer service.
More about Kubernetes automatic creation of Load Balancers can be found here.
To obtain the External IP of the newly created load balancer, run the following kubectl command:
kubectl get service -ningress-temp
Look for the service of type LoadBalancer and note its EXTERNAL-IP. On AWS, for example, this may look like:
EXTERNAL-IP
xxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxx.us-west-1.elb.amazonaws.com
Update your DNS provider to point your site at this EXTERNAL-IP. Depending on your settings, this change may take a while to propagate.
Monitor Change in Traffic Flow
Look back at your terminal windows monitoring traffic flow through the Nginx controllers. Notice that the logs for the original controller are becoming less frequent, and the temporary controller is picking up traffic.
Wait for traffic to fully drain from the original controller. Once this is completed, move on to Step 3.
Step 3: Redeploy Original Controller with PreStop Update
Once traffic has been drained from the original nginx-ingress-controller, it is safe to update the original nginx-ingress-controller deployment with the preStop
hook.
Add the following configuration to your nginx values.yaml:
controller:
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5; /usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf -s quit; while pgrep -x nginx; do sleep 1; done"]
terminationGracePeriodSeconds: 600
Note again that terminationGracePeriodSeconds is OPTIONAL, and the number of seconds is configurable based on your controller's needs.
Redeploy the original controller with helm:
helm upgrade --install nginx-ingress stable/nginx-ingress --namespace ingress --version 1.6.16 -f nginx/values.yaml
Ensure the controller is deployed with the preStop hook applied:
kubectl get deployment nginx-ingress-controller -ningress -oyaml
The original controller is ready to accept traffic again.
Step 4: Direct Traffic Back to Original Controller
To obtain the External IP of the original load balancer, run the following kubectl command (note the ingress namespace this time):
kubectl get service -ningress
Update your DNS to direct back to the original Load Balancer. Once again these changes may take a while to propagate, depending on your settings.
Monitor Traffic Flow Back to Original Controller
Go back to the terminal windows in which you are monitoring traffic for the two controllers. If the kubectl logs command has timed out, you may need to invoke it again.
If the above steps have gone according to plan, you should see traffic dropping in the temporary controller and picking back up in the original controller.
Once again, once traffic has been fully drained, proceed to the final step.
Step 5: Remove Extra Infrastructure
Now that traffic is flowing through the original controller, we no longer need the temporary one.
Use helm to delete the temporary controller. With Helm 2:
helm delete --purge nginx-ingress-temp
With Helm 3, use helm uninstall nginx-ingress-temp --namespace ingress-temp instead, and also delete the temporary namespace via kubectl:
kubectl delete namespace ingress-temp
Note that you may want to wait a week or so before removing the extra controller, depending on how long your apps cache DNS and keep connections alive.
Step 6: Celebrate!
Congratulations, you have deployed a zero-downtime nginx ingress controller! Now is the time to pat yourself on the back for a job well done.
Future Nginx Deployments
After following the above steps, you should have the ability to deploy future changes normally, with a simple helm update
command. Nginx will handle traffic gracefully, and new controllers will come up without any interruptions in traffic.
However, if you are making risky changes to your ingress controller or prefer an abundance of caution when making any production-level changes, this is a great method for updating your infrastructure. This method allows for easy switching between a known good controller and one with untested changes, such as switching between versions or sources. Simply switch the DNS to the changed controller, monitor logs for unwanted behavior, and switch back quickly if breaking changes are detected.
Lindsay Landry, DevOps Engineer at Codecademy