“Gracefully” Implementing Graceful Shutdowns

Jainal Gosaliya
7 min read · Apr 10, 2024


This is a follow-up to my previous article on implementing an end-to-end graceful shutdown system. There, I explained the concepts behind how a system with graceful shutdown works and described the components of graceful shutdowns in Kubernetes. If you want to take a look at that article, you can read it below.

However, this article stands on its own: if you already know what a graceful shutdown is and are just looking for ways to implement one, this blog is all you need. We will dive straight into the code and implementation and skip the boring theory behind it.

"Talk is cheap. Show me the code." 
- Linus Torvalds

In a hurry 🏃♀️ and just want to grab the code? 🔽

Overview

What are we trying to achieve?

Imagine a backend system where users can submit tasks, such as orders, AI jobs, or any long-running work, through a REST API. Whenever a task is created, it is enqueued to a queue and picked up by a worker when one is free. Once a user’s task is completed, they are notified through a push notification over WebSockets, provided they are online and still connected to the backend.

This seems like a reasonable model for many services and use cases out there that run such an implementation in production.

Now, imagine you find a critical P1 vulnerability, or you simply want to ship some awesome new features and deploy an update.

The question is always how to deploy this in such a way that users’ submitted tasks are not lost and they are notified before the new version goes live, while also ensuring zero downtime.

Now that you have an idea of the problem at hand, let’s see what the implementation looks like.


Tech Stack

Programming Language — Python

Backend Framework — FastAPI web server

Deployment — Kubernetes

However, I believe the language and framework are not important here; what matters is the way the system is built and how the shutdown is handled.

Architecture

As seen in the image above, we have a FastAPI server deployed in a Kubernetes pod, with min replicas set to 1 for simplicity. It has a REST API to submit tasks, a WebSocket interface for sending task status events to clients, and a signal listener that listens for the pod termination signals sent by Kubernetes.

Through the REST API, a task is created and enqueued to FastAPI’s built-in background tasks queue, and is later processed by the worker. Once the task is processed, the client is notified via WebSocket.

The custom-implemented signal listener listens for the SIGTERM signal sent by the Kubernetes controller when a pod termination happens. This is the watcher that steers the entire graceful shutdown.

Implementing

First, let’s check out the custom signal handler. When you google FastAPI OS signal or event listener, it takes you to this doc: Lifespan Events. Our goal is to intercept the SIGTERM signal sent to our FastAPI server. If you use lifespan events, the server is stopped first and only then is the task bound to the event executed. This is bad for us, as it would terminate the WebSocket client connections.

So, in order to just intercept the signal, we can register a handler for SIGTERM that is triggered whenever that signal is received.
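The handler itself is embedded as a gist in the original post; a minimal sketch of the idea might look like the following. The class and method names (`SignalHandler`, `register_signal_handler`, `handle_exit`, `wait_for_tasks`) are assumptions based on how the article refers to them:

```python
import signal
import threading

class SignalHandler:
    """Intercepts SIGTERM/SIGINT instead of letting them kill the server."""

    def __init__(self):
        self.received_signal = False

    def register_signal_handler(self) -> None:
        # Replace the default handlers so the process is not terminated
        # immediately; handle_exit decides when shutdown actually happens.
        signal.signal(signal.SIGTERM, self.handle_exit)
        signal.signal(signal.SIGINT, self.handle_exit)

    def handle_exit(self, signum, frame) -> None:
        self.received_signal = True
        # Drain the queue in a separate thread so the event loop keeps
        # serving the still-connected WebSocket clients meanwhile.
        threading.Thread(target=self.wait_for_tasks, daemon=True).start()

    def wait_for_tasks(self) -> None:
        # Placeholder: block until the background queue is empty, then
        # close the WebSocket connections and stop the server.
        ...

signal_handler = SignalHandler()
```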

As seen in the code above, we register the signal handler for SIGTERM and SIGINT. In handle_exit, we wait until all the remaining tasks in the background queue are complete, and only then do we terminate our WebSocket connections and initiate the server shutdown.

This signal handler is registered on server startup through the @app.on_event("startup") decorator:

@app.on_event("startup")
def chain_signals():
    signal_handler.register_signal_handler()

Believe me, this is most of what you need! Sure, there are a few other small pieces, like the Kubernetes config, but this is the main part.

Read Flag and the Health API

We have a readFlag, a boolean value that is set to true by default. When the signal handler intercepts the termination signal, the readFlag is set to false.

When anyone calls the health check endpoint, status code 200 or 503 is returned based on the readFlag value.

In Kubernetes, the readinessProbe calls the health check endpoint defined in its path and checks for a 200 status code. Once the application returns a non-200 response, Kubernetes considers the probe failed depending on the failureThreshold setting in the probe configuration: it waits for n failures in a row before marking the probe as failed.

Once the probe is considered failed, no new connections are routed to this pod.

This is important because we want existing clients to stay connected until their published tasks are finished, but we also don’t want new clients to connect to this terminating pod and post new tasks.

Also, in your application logic, always check the flag value before accepting a task or a connection, to avoid any race condition between Kubernetes pod termination and task creation. As you can see in the snippets below, create_task and connect only proceed if the readFlag is set to true.

Kubernetes Config

This is just a normal-looking Kubernetes manifest file. The only important parts here are the readinessProbe, as mentioned before, and terminationGracePeriodSeconds, which is the maximum number of seconds to wait for a graceful shutdown before the pod is killed forcefully.
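The manifest itself is embedded as a gist; the relevant fragment of the pod spec might look like the following. The container name, port, probe timings, and grace period are illustrative values, not the article’s:

```yaml
spec:
  # Max time Kubernetes waits after SIGTERM before sending SIGKILL.
  terminationGracePeriodSeconds: 300
  containers:
    - name: fastapi-app
      ports:
        - containerPort: 8000
      readinessProbe:
        httpGet:
          path: /health
          port: 8000
        periodSeconds: 5      # probe every 5 seconds
        failureThreshold: 2   # unready after 2 consecutive non-200s
```

With these values, traffic stops flowing to the pod roughly 10 seconds after the health check starts returning 503, while the pod itself gets up to 5 minutes to drain its queue.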

Watching it all in action

So, now that we understand the bits and pieces of the system, it’s time to look at the big picture and watch the entire system handle a graceful shutdown.

Demo 1

As you can see in the image above, we first publish some tasks to the service. Through another API endpoint, we can check the number of tasks in the queue.

As soon as we run kubectl delete pod {pod_name}, SIGTERM is sent to the FastAPI application, which prevents instant termination and waits for all tasks to be processed and delivered to the WebSocket clients.

Demo 2

In the demo above, we can see that two clients are connected to the pod and some tasks are enqueued. As soon as we terminate the pod and try to connect a new client, it gets connected to a new pod rather than the terminating one, since the read flag causes non-200 status codes on the health check route. But our initial clients stay connected until all their responses are delivered.

Conclusion

This summarizes what we want to achieve with a graceful shutdown in a Kubernetes environment. By implementing a custom signal handler in our FastAPI backend and leveraging Kubernetes’ readiness probes and terminationGracePeriodSeconds, we’ve ensured that:

1. User-submitted tasks are not lost during deployment or pod termination.

2. Existing clients remain connected until their tasks are completed, preventing new connections to a terminating pod.

3. The system provides feedback to clients about the status of their tasks through WebSocket notifications.

4. Race conditions between Kubernetes pod termination and task creation are mitigated using a readFlag mechanism.

5. The application gracefully shuts down by waiting for all pending tasks to be completed before terminating.

This approach ensures high availability, maintains user experience, and minimizes downtime during deployments or system updates. Remember, the key is not just in knowing what a graceful shutdown is, but in effectively implementing it to handle real-world scenarios seamlessly.

Ready to level up 🚀 your backend engineering game? Subscribe to Scale Bites, your ultimate guide to mastering the art of scalability. Dive into concise, actionable insights delivered straight to your inbox, tackling the toughest challenges in system design. Whether you’re a seasoned pro 👨💻 or just starting out 🎓, Scale Bites has something for everyone. Join the community 👫 on LinkedIn and embark on a journey to build robust, future-proof systems. Don’t miss out — subscribe now and scale new heights, one bite at a time.

#GracefulShutdown #SystemDesign #Microservices #DevOps #DistributedSystems #BackendEngineering #Kubernetes #Containerization #Scalability #TechBlog #SoftwareEngineering #SystemReliability #Coding #Programming #DataIntegrity #TechTips #ITInfrastructure #SoftwareDevelopment #CloudComputing #ContainerOrchestration #ContinuousIntegration #ScaleBites #Newsletter
