Beyond “Hello World”: Modern, Asynchronous Python in Kubernetes
Deploying Scalable, Production-Ready Web-Services in Python 3 on Kubernetes
Python has undergone something of an evolution in the past few years. From Python 3.4 to 3.7, we have seen the introduction of asyncio, the formalization of the async/await keywords, and renewed investment in asyncio performance. Writing asynchronous code in Python has never been easier, more performant, or more efficient.
In addition to the improvements to the stdlib, Python’s Open-Source community has entered something of a Renaissance as well. It has embraced the potential of async/await, and the flexibility of the asyncio library has proven a huge boon. asyncio’s extensible API readily encourages alternative Event Loop implementations, and we now have libraries like uvloop, an asyncio-compatible implementation of the Event Loop using libuv under the hood. Additionally, when it comes to web frameworks, there couldn’t be more options to choose from, and there are decisive benchmarks out there pitting all of them head-to-head.
However, when it comes to building a new application, there is decidedly little chatter about what these benchmarks mean for you in the context of how your application will be deployed. Will it be deployed via a cloud-based Virtual Machine? Directly to a server? What about Kubernetes or Docker Swarm?
At Xandr, we’ve gone all-in on Kubernetes. If you’re reading this post, odds are you’re in a similar boat. With that in mind, this post investigates 3 different deployment configurations of an otherwise identical application in an attempt to determine the ideal deployment configuration for your asynchronous web service.
This post assumes:
- You’ve already made the (wise) decision to use an async framework for your web service
- You are looking at Kubernetes for deploying your service.
For the purposes of this post, I chose the aiohttp framework for its maturity and stability, but the general rules provided here should be applicable to any open-source framework on the market today.
It’s all about Scaling
When we talk about scaling, we generally refer to one of two major approaches:
- Horizontal Scaling — scaling across machines and/or environments
- Vertical Scaling — scaling up the resources of a given machine
What We’re Testing
More traditional deployments require a mix of vertical and horizontal scaling, with an emphasis on vertical — by way of maximizing the use of available CPU cores on your machine. For Python web-services, that usually means running your application behind Gunicorn or another similar solution in production. I agree that for these environments, this is definitely the appropriate strategy.
When your application is deployed on Kubernetes, it runs in the foreground on small Docker containers scheduled in Pods with a fraction of a CPU and minimal memory. Kubernetes takes advantage of horizontal scaling. Rather than ramp up the number of threads or worker processes on a single Pod, you scale the number of Pods to meet demand.
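The “scale Pods, not workers” approach described above is typically expressed with a HorizontalPodAutoscaler. A minimal sketch follows; the Deployment name, replica bounds, and CPU threshold are all assumptions, not values from this post:

```yaml
# Sketch of a HorizontalPodAutoscaler that scales out Pods on demand
# instead of adding workers inside a single Pod. Names and thresholds
# are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-aiohttp-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-aiohttp-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # add Pods when average CPU passes 80%
```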
If done properly, developing and deploying with Docker can provide us with a very powerful guarantee:
- The application run-time you develop and debug with is the application run-time you deploy.
With this in mind, I set out to answer the following question:
- In the context of Kubernetes, is the addition of a run-time dependency for deployment worth the additional overhead and/or risk?
Application Implementation & Design
I implemented a simple RESTful API supporting GET/PUT/POST/DELETE using the aiohttp framework.
Additionally, I installed the following libraries to improve overall performance: uvloop, aiodns, and cchardet. aiodns and cchardet are used automatically by aiohttp if they're available, so they require no code changes. uvloop can be invoked by running uvloop.install() at the top of your app.py (or under your if __name__ == "__main__" block if you prefer). Be sure you're not creating a global for your loop before doing this (or ever, really 😄)!
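Putting those pieces together, a minimal app.py might look like the sketch below. The handler names and the /items routes are illustrative assumptions, not the actual application from this post:

```python
# app.py — minimal sketch of an aiohttp app that installs uvloop first.
# Handlers and routes are illustrative assumptions.
from aiohttp import web

try:
    import uvloop
    uvloop.install()  # swap in the libuv-based event loop before any loop exists
except ImportError:
    pass  # fall back to the stdlib asyncio event loop

async def get_item(request: web.Request) -> web.Response:
    return web.json_response({"id": request.match_info["id"]})

async def create_item(request: web.Request) -> web.Response:
    payload = await request.json()
    return web.json_response(payload, status=201)

async def update_item(request: web.Request) -> web.Response:
    return web.json_response({"updated": request.match_info["id"]})

async def delete_item(request: web.Request) -> web.Response:
    return web.Response(status=204)

def make_app() -> web.Application:
    app = web.Application()
    app.add_routes([
        web.get("/items/{id}", get_item),
        web.post("/items", create_item),
        web.put("/items/{id}", update_item),
        web.delete("/items/{id}", delete_item),
    ])
    return app

if __name__ == "__main__":
    # Runs the server in the foreground, as a Docker container expects.
    web.run_app(make_app(), port=8080)
```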
Now that we’ve got our application, it’s time to figure out how to run it in production. For the purpose of this post, I set up two application entry-points:
- Directly, by invoking the application module itself
- Via Gunicorn, by calling gunicorn --config=guniconfig app_wsgi:app
- Gunicorn was configured to use a single worker.
- Gunicorn was also configured with a max worker lifetime of 1000 requests, to combat the well-documented memory leak issues that can occur with long-lived workers.
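The Gunicorn settings described above can be captured in a config file along these lines. The bind address is an assumption, and the worker class shown is aiohttp's own Gunicorn worker (which also uses uvloop):

```python
# guniconfig.py — a sketch of the Gunicorn configuration described above.
# The bind address is an assumption.
bind = "0.0.0.0:8080"
workers = 1  # a single worker; Kubernetes scales Pods horizontally instead
worker_class = "aiohttp.GunicornUVLoopWebWorker"  # aiohttp worker on uvloop
max_requests = 1000        # recycle workers to combat slow memory leaks
max_requests_jitter = 50   # stagger restarts so workers don't all recycle at once
```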
The application itself is built on a Docker image using a multi-stage build with a Python/Alpine-Linux base image to ensure the image is as small as possible.
It should be noted that aiohttp’s documentation mentions that running an aiohttp server behind Gunicorn will result in slower performance.
Both applications were deployed using ankh behind an Nginx Ingress, with identical Service definitions, and the following resource profiles:
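As a rough per-Pod sketch of such a resource profile (the values here are assumptions, back-calculated from the 7-CPU and 5Gi aggregate limits discussed in the next section, assuming 10 replicas):

```yaml
# Sketch of a per-Pod resource profile; exact values are assumptions.
resources:
  requests:
    cpu: "350m"
    memory: "256Mi"
  limits:
    cpu: "700m"      # x10 replicas ≈ the 7-CPU aggregate limit
    memory: "512Mi"  # x10 replicas ≈ the 5Gi aggregate limit
```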
By The Numbers
With my applications deployed and bugs squashed, it’s now time to get a feel for how these two services will run.
All benchmarks below were run using hey, set to 200 concurrent connections hammering our servers for 30s. There was no rate-limiting implemented, as our goal was to determine deployment performance under high-stress and full resource utilization.
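The hey invocations looked roughly like the following; the service URL and request body are placeholders:

```shell
# 200 concurrent connections for 30s against a placeholder service URL.
hey -c 200 -z 30s http://my-service.example.com/items/1
hey -c 200 -z 30s -m POST -T application/json \
    -d '{"name":"widget"}' http://my-service.example.com/items
```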
We set the following SLAs for our servers:
1. GET: 99.9% under 100 ms
2. POST: 99.9% under 150 ms
3. PUT: 99.9% under 200 ms
For the bare aiohttp deployment, the replica set ran at ~1.15Gi Memory and <.01 CPU overall (~115Mi Memory and ~0 CPU per Pod). While under load, the CPU limit of 7 was 90–100% utilized (around 90% for the GET test, 100% for the PUT), but memory usage never grew beyond 1.5Gi, well under our 5Gi limit.
The Gunicorn deployment consistently used about 30% more memory and CPU utilization was marginally higher, about 95%-105%*.
*Kubernetes enforces CPU limits by throttling, not by killing your container as it does with memory limits. This means that you may see occasional spikes slightly above your configured limit. I found this article helpful in understanding this mechanism.
All-in-all, the performance of the two deployments is nearly identical, and the slight service degradation introduced with Gunicorn isn’t necessarily a deal-breaker, depending upon the SLAs your particular application must meet. However, if Gunicorn is, in fact, hampering the performance and reliability of your application in this deployment architecture, should it be used at all?
With all this data under my belt, I decided to see if I could test a more “standard” Gunicorn-style deployment in order to take advantage of Gunicorn’s ability to scale vertically, following the age-old rule-of-thumb mentioned in the Gunicorn documentation.
I landed on the following resource profile for the Gunicorn deployment:
With 11 workers per Pod, this gives a total limit of 10 CPU, 6Gi Memory, and 22 workers for the replica set.
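The 11-worker figure follows the (2 × cores) + 1 rule of thumb from the Gunicorn documentation; assuming 5 CPUs per Pod across the 2 Pods, the arithmetic works out as:

```python
# Gunicorn's documented rule of thumb for worker count: (2 * cores) + 1.
# cpus_per_pod = 5 is inferred from the 10-CPU limit split across 2 Pods.
def recommended_workers(cores: int) -> int:
    return 2 * cores + 1

cpus_per_pod = 5
pods = 2
workers_per_pod = recommended_workers(cpus_per_pod)  # 2 * 5 + 1 = 11
total_workers = workers_per_pod * pods               # 11 * 2 = 22
print(workers_per_pod, total_workers)
```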
Here are the charts we saw above, with this deployment in the mix…
With a total of 22 workers over 2 Pods in the Replica Set, this deployment maxed out its 10 CPU limit and consistently ran at ~3.5Gi memory. That’s ~43% more CPU and 2⅓x more memory.
Not only that, this deployment couldn’t even touch the previous two in terms of performance and reliability, and was far outside our SLAs for all operations. One could argue that scaling up each Pod or scaling out the Replica Set would improve this, and they’d be correct. However, at this point we’re already using significantly more resources to achieve a sub-par result, and scaling up or out to match the performance of the alternative deployments goes against the core mindset of Kubernetes deployments: small, lightweight containers which can scale out on-demand.
While no two applications are the same, I believe the data above shows the fallacy of choosing a deployment strategy based upon historical solutions. While Gunicorn didn’t necessarily hamper the performance of our application when deployed correctly, its usage came at the cost of:
- An additional dependency that changes the run-time of your application in production vs your run-time in development.
- Yet another layer to learn and debug — and to ensure your co-workers are familiar with as well.
- At least ~43% more CPU and 2⅓x more Memory if not configured properly, and about ~20% more Memory if done correctly.
My recommendation (if you haven’t guessed it already) is to forgo this production dependency altogether. Deploying a web service on Kubernetes behind Gunicorn provides no additional benefit in regard to performance or stability, and comes at the cost of greater resource needs.