Beyond “Hello World”: Modern, Asynchronous Python in Kubernetes

Deploying Scalable, Production-Ready Web-Services in Python 3 on Kubernetes

Sean Stewart
Jul 22, 2019 · 7 min read

Assumptions

This post assumes:

  1. You are looking at Kubernetes for deploying your service.

It’s all about Scaling

When we talk about scaling, we generally refer to one of two major approaches:

  1. Vertical Scaling — scaling up on the resources of a given machine.

What We’re Testing

More traditional deployments require a mix of vertical and horizontal scaling, with an emphasis on vertical, by way of maximizing the use of available CPU cores on your machine. For Python web-services, that usually means running your application under Gunicorn or a similar process manager in production. I agree that for these environments, this is the appropriate strategy.

Application Implementation & Design

I implemented a simple, RESTful API supporting GET/PUT/POST/DELETE using the following libraries:

  1. Database: PostgreSQL
  2. DB Client: asyncpg
  1. cchardet (via aiohttp[fast])
  2. uvloop

Application Runtime

Now that we’ve got our application, it’s time to figure out how to run it in production. For the purpose of this post, I set up two application entry-points:

  1. via Gunicorn, by calling gunicorn --config=guniconfig app_wsgi:app
  • Gunicorn was also configured with a max worker lifetime of 1000 requests, to combat the well-documented memory leak issues that can occur with long-lived workers.
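
As a rough illustration of the Gunicorn entry-point, a config module along these lines would express the 1000-request worker lifetime. This is a minimal sketch, not the configuration used for the benchmarks: the bind address, worker count, jitter value, and the choice of aiohttp's uvloop worker class are all assumptions on my part.

```python
# Sketch of a Gunicorn config module (Gunicorn config files are plain Python).
# Only max_requests = 1000 comes from the post; everything else is assumed.

bind = "0.0.0.0:8080"  # assumption: address and port are not given in the post

workers = 2  # assumption: the post does not state the worker count

# aiohttp ships Gunicorn worker classes; the uvloop variant is assumed here
# because uvloop is among the libraries listed for the application.
worker_class = "aiohttp.GunicornUVLoopWebWorker"

# Recycle each worker after it has served 1000 requests.
max_requests = 1000

# assumption: a little jitter so workers do not all restart at the same time.
max_requests_jitter = 50
```

Gunicorn reads these module-level names as settings, so the file needs no functions or classes.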

Application Deployment

Both applications were deployed using ankh behind an Nginx Ingress, with identical Service definitions, and the following resource profiles:

replicas: 10
limits:
  cpu: 1
  memory: 512Mi
requests:
  cpu: .1
  memory: 256Mi
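
The per-pod values above multiply out to the totals for the whole replica set. The helper below is only a sketch of that arithmetic; the function name and the choice of Mi and Gi units are mine.

```python
def replica_set_totals(replicas, cpu_per_pod, memory_mi_per_pod):
    """Return (total CPU, total memory in Gi) for a replica set."""
    total_cpu = replicas * cpu_per_pod
    total_memory_gi = replicas * memory_mi_per_pod / 1024
    return total_cpu, total_memory_gi


# Limits: 10 pods at 1 CPU and 512Mi each.
limit_cpu, limit_memory_gi = replica_set_totals(10, 1, 512)

# Requests: 10 pods at 0.1 CPU and 256Mi each.
request_cpu, request_memory_gi = replica_set_totals(10, 0.1, 256)

print(f"limits:   {limit_cpu} CPU, {limit_memory_gi} Gi")
print(f"requests: {request_cpu:.1f} CPU, {request_memory_gi} Gi")
```

So the replica set as a whole is capped at 10 CPU and 5Gi of memory, and is guaranteed 1 CPU and 2.5Gi.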

By The Numbers

Application Performance

All benchmarks below were run using hey, set to 200 concurrent connections hammering our servers for 30s. There was no rate-limiting implemented, as our goal was to determine deployment performance under high-stress and full resource utilization.
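
For reference, a run like the one described can be expressed as a hey invocation with `-c` for concurrency and `-z` for duration, leaving out `-q` so no rate limit applies. The sketch below only builds the argument list; the URL, the JSON body, and the helper name are hypothetical, and the exact flags used for the benchmarks are not given in this post.

```python
import shlex


def hey_command(url, method="GET", concurrency=200, duration="30s", body=None):
    """Build the argument list for a hey run (sketch; URL and body are placeholders)."""
    args = ["hey", "-z", duration, "-c", str(concurrency), "-m", method]
    if body is not None:
        # hey sends a request body with -d and sets the content type with -T.
        args += ["-T", "application/json", "-d", body]
    args.append(url)
    return args


# Hypothetical endpoint; pass the list to subprocess.run() to execute it.
print(shlex.join(hey_command("http://service.example/items")))
print(shlex.join(hey_command("http://service.example/items", "POST", body='{"name": "x"}')))
```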

https://plot.ly/~seandstewart/6
Requests Per Second — Head to Head
https://plot.ly/~seandstewart/8
Response Time Distribution within 99.9% — GET — 1ms Buckets
https://plot.ly/~seandstewart/16
Response Time Distribution within 99.9% — POST — 1ms Buckets
https://plot.ly/~seandstewart/14
Response Time Distribution within 99.9% — PUT — 1ms Buckets
https://plot.ly/~seandstewart/22
Head-to-Head Distribution, All Quantiles. Click through to play around!
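
The distribution charts group response times into 1 ms buckets and drop the tail beyond 99.9%. The exact processing behind the charts isn't shown here, so the following is one plausible way to do it, assuming a plain list of latencies in seconds and a nearest-rank cutoff.

```python
import math
from collections import Counter


def bucket_latencies(latencies_s, quantile=0.999, bucket_ms=1):
    """Count responses per millisecond bucket, ignoring the tail beyond `quantile`."""
    ordered = sorted(latencies_s)
    if not ordered:
        return {}
    # Nearest-rank cutoff: keep the fastest ceil(quantile * n) samples.
    keep = max(1, math.ceil(quantile * len(ordered)))
    counts = Counter()
    for latency in ordered[:keep]:
        # Floor each latency to the lower edge of its bucket, in milliseconds.
        bucket = int(latency * 1000 // bucket_ms) * bucket_ms
        counts[bucket] += 1
    return dict(sorted(counts.items()))


# Toy data: four fast responses and one 2.5 s outlier, cut at the 80th percentile.
sample = [0.0012, 0.0018, 0.0031, 0.0039, 2.5]
print(bucket_latencies(sample, quantile=0.8))
```

Cutting the tail keeps a handful of very slow responses from stretching the x-axis and flattening the rest of the histogram.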

Resource Utilization

For the bare aiohttp deployment, the replica set ran at ~1.15Gi Memory and <.01 CPU overall (~115Mi Memory and ~0 CPU per pod). While under load, the CPU limit of 10 was utilized between 90–100% (around 90% for the GET test, 100% for the PUT), but memory usage never grew beyond 1.5Gi, well under our 5Gi limit.

Initial Assessment

All in all, the performance of the two deployments is nearly identical, and the slight service degradation introduced with Gunicorn isn’t necessarily a deal-breaker, depending upon the SLAs your particular application must meet. However, if Gunicorn is, in fact, hampering the performance and reliability of your application in this deployment architecture, should it be used at all?

Additional Benchmarks

With all this data under my belt, I decided to see if I could test a more “standard” Gunicorn-style deployment in order to take advantage of Gunicorn’s ability to scale vertically, following the age-old rule-of-thumb mentioned in the Gunicorn documentation.

replicas: 2
limits:
  cpu: 5
  memory: 3Gi
requests:
  cpu: 5
  memory: 2Gi
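
The rule of thumb in the Gunicorn documentation is (2 x number of cores) + 1 workers. The sketch below applies it to the 5-CPU pods above. This post doesn't state the worker count actually configured, so treat the result as what the rule suggests rather than what was run.

```python
def suggested_workers(num_cores):
    """Gunicorn's documented rule of thumb: (2 x num_cores) + 1."""
    return (2 * num_cores) + 1


cpu_per_pod = 5
replicas = 2

per_pod = suggested_workers(cpu_per_pod)
print(f"{per_pod} workers per pod, {per_pod * replicas} across the deployment")
# prints "11 workers per pod, 22 across the deployment"
```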

Application Performance

Here are the charts we saw above, with this deployment in the mix…

https://plot.ly/~seandstewart/33
Response Time Distributions within 99.9% — GET — 1 ms Buckets
https://plot.ly/~seandstewart/31
Response Time Distributions within 99.9% — POST — 1 ms Buckets
https://plot.ly/~seandstewart/29
Response Time Distributions within 99.9% — PUT — 1 ms Buckets
https://plot.ly/~seandstewart/27
Head-to-Head Distribution, All Quantiles. Click through to play around!

Final Assessment

While no two applications are the same, I believe that the data above shows the fallacy of assuming a deployment strategy based upon historical solutions. While Gunicorn didn’t necessarily hamper the performance of our application if deployed correctly, its usage came at the cost of:

  1. Yet another layer to learn and debug — and to ensure your co-workers are familiar with as well.
  2. Roughly 43% more CPU and 2⅓x the Memory if not configured properly, and about 20% more Memory if done correctly.

Xandr-Tech

Our latest thoughts, challenges, triumphs, try-again’s…

Thanks to Ahmed Abdalla and Shreyas Prasad

Sean Stewart

Written by

New York based Software Engineer and Fiction writer. https://seandstewart.io

