Long computations over REST HTTP in Python

Let’s say we’re using gunicorn with Flask in our Python project. Consider the following problem…

We have a microservice that exposes a REST API for processing images. One endpoint is responsible for image resizing, and a single call can take more than a minute. How do we design and implement a solution that works and keeps the system robust?

By default gunicorn spawns synchronous workers, each bound to its own process. An HTTP request is queued by gunicorn until one of the workers picks it up, performs the computation and returns the result. The gunicorn documentation recommends spawning 2–4 workers per core on the server.

Where’s the catch?

Most of your requests are going to time out. Here’s why.

Standard approach

Here’s a short excerpt from our app. The code below has a function called resize_image which represents the expensive computation (for the sake of the argument it is simulated with a sleep call). On top of that I’ve written a simple decorator function, measure_processing_time, which measures the time in seconds between the start and the end of the request function.

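def resize_image(seconds=3):
    time.sleep(seconds)  # stands in for the expensive computation

@measure_processing_time
def hello_world():
    resize_image()
    return 'Task successfully run'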
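The decorator itself is not shown in the excerpt. A minimal sketch of what it could look like follows, assuming it simply wraps the handler and prints the elapsed time; the `last_elapsed` attribute and the `seconds` parameter on `hello_world` are additions for illustration and not part of the original app.

```python
import functools
import time


def measure_processing_time(func):
    """Print how many seconds the wrapped function took (hypothetical reconstruction)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            wrapper.last_elapsed = elapsed  # illustrative: lets callers inspect the timing
            print(f"{func.__name__} took {elapsed:.2f} s")
    wrapper.last_elapsed = None
    return wrapper


def resize_image(seconds=3):
    """Stand-in for the expensive computation: it only sleeps."""
    time.sleep(seconds)


@measure_processing_time
def hello_world(seconds=3):
    resize_image(seconds)
    return 'Task successfully run'
```

In the real app the decorated function would additionally be registered as a Flask route; that wiring is left out here so the sketch runs with the standard library alone.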

Let’s run it with gunicorn.

The results are pretty disturbing.

Results for default worker

A quick look at the timestamps at which the requests were processed (in the brackets on the left-hand side) shows that processing of the last request ended 20 seconds after the first request finished. Although the computation itself takes only 5 seconds, the last user had to wait 25 seconds to see the result.
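The queueing effect can be reproduced on paper. The sketch below assumes that all requests arrive at the same moment and that each synchronous worker handles exactly one request at a time; the function name `completion_times` and the request and worker counts are illustrative, chosen because five 5-second requests on a single worker match the 25-second wait described above.

```python
import heapq


def completion_times(n_requests, n_workers, seconds_per_request):
    """Return when each request finishes if all arrive at t=0 and
    every synchronous worker handles one request at a time."""
    free_at = [0.0] * n_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    finished = []
    for _ in range(n_requests):
        start = heapq.heappop(free_at)    # earliest available worker
        end = start + seconds_per_request
        finished.append(end)
        heapq.heappush(free_at, end)      # worker is busy until `end`
    return finished


print(completion_times(5, 1, 5))  # [5.0, 10.0, 15.0, 20.0, 25.0]
print(completion_times(5, 4, 5))  # [5.0, 5.0, 5.0, 5.0, 10.0]
```

Adding workers shortens the queue, but as soon as there are more simultaneous requests than workers, somebody waits for a multiple of the computation time.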

But I can always increase the number of workers!

Okay, so let’s say your server has 16 cores and 64 GB of RAM. If you were to host this on Amazon EC2 you would pay $553/mo for up to the 64 workers gunicorn would recommend in this situation. That roughly translates to 64 concurrent computations; for anything above that the client has to wait. Moreover, it leaves you exposed to DoS attacks, which is why it is always recommended to put a buffering proxy (like nginx or any other load balancer) in front of gunicorn when using its default, synchronous workers.

But using AWS Elastic Load Balancing and AWS Auto Scaling I can easily scale my number of servers!

My answer to that is: are you willing to pay $6000 per month for 700 concurrent requests? If that’s the case, stop reading this article and have fun on Amazon! Otherwise, keep reading to learn how to better utilize precious system resources.
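For reference, the monthly figure follows from simple arithmetic on the numbers already quoted. This is only a back-of-the-envelope sketch: it assumes one worker per concurrent request and the upper end of the 2–4 workers per core recommendation, and it ignores load balancer and traffic costs.

```python
import math

CORES = 16
WORKERS_PER_CORE = 4        # upper end of the 2-4 per core recommendation
PRICE_PER_SERVER = 553      # USD per month, figure quoted in the text
TARGET_CONCURRENCY = 700    # concurrent requests we want to serve

workers_per_server = CORES * WORKERS_PER_CORE                  # 64
servers = math.ceil(TARGET_CONCURRENCY / workers_per_server)   # 11
monthly_cost = servers * PRICE_PER_SERVER                      # 6083

print(workers_per_server, servers, monthly_cost)
```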

Asynchronous approach

We could’ve foreseen the problem if we had read the gunicorn documentation carefully.

The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth. Generally this means that your application shouldn’t do anything that takes an undefined amount of time. An example of something that takes an undefined amount of time is a request to the internet. At some point the external network will fail in such a way that clients will pile up on your servers. So, in this sense, any web application which makes outgoing requests to APIs will benefit from an asynchronous worker.

Let’s change the worker type to one based on greenlets (via gevent) and run our tests again.
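The exact command used for the test is not shown. One way to switch the worker type is a gunicorn configuration file, which is plain Python. The values below are illustrative assumptions rather than the settings used for the results in this article.

```python
# gunicorn.conf.py: a sketch, all values are illustrative assumptions
bind = "127.0.0.1:8000"
workers = 4
worker_class = "gevent"     # requires the gevent package to be installed
worker_connections = 1000   # max simultaneous clients per worker (async workers only)
timeout = 30                # seconds of silence before a worker is restarted
```

Assuming the Flask application object lives in a module named `app` (a hypothetical name), this would be started with `gunicorn -c gunicorn.conf.py app:app`.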

Results for async worker

This is definitely better: we received all responses almost at once. But just imagine what would happen if we had 1 000 000 000 concurrent requests. Keep in mind that our real computation runs in Python; we are not waiting on I/O here. Do we really benefit from this solution?

The answer is no. When we have a CPU-bound workload, thread-based approaches (such as greenlets) are not going to help, because there is nobody to delegate the work to. Whatever handles the client also handles the computation, and that is fundamentally wrong.
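The difference is easy to demonstrate. The sketch below uses `asyncio` from the standard library as a stand-in for greenlets, since both rely on cooperative scheduling: a task only gives way to others when it explicitly waits. It is not gevent code, and the function names are made up for illustration. It also hints at why the sleep-based test looked so good: a sleep yields control under cooperative scheduling, while a real computation does not.

```python
import asyncio
import time


async def cpu_bound(seconds):
    """Busy-loop in pure Python; never yields to the event loop."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


async def io_bound(seconds):
    """Waits cooperatively, so other tasks can run in the meantime."""
    await asyncio.sleep(seconds)


async def run_three(task, seconds):
    """Run three copies of `task` concurrently and return the wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(task(seconds) for _ in range(3)))
    return time.perf_counter() - start


cpu_elapsed = asyncio.run(run_three(cpu_bound, 0.1))
io_elapsed = asyncio.run(run_three(io_bound, 0.1))
print(f"CPU-bound: {cpu_elapsed:.2f} s, I/O-bound: {io_elapsed:.2f} s")
```

The three waiting tasks overlap and finish in roughly the time of one, while the three computing tasks run back to back and take about three times as long.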

It’s all about design

Here’s the solution that I consider the best. It is clear, simple and scales really well. Let’s consider the following REST API.


Task — entity responsible for management of long, CPU-bound computation.


  • Id — String in UUID v4
  • Status — String, possible values: CREATED, RUNNING, FAILED, DELETED, DONE
  • JobDefinition — String, possible values: IMAGE_RESIZE
  • ResultOfJob — String in UUID v4
  • Error — String
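As a rough illustration, the entity above could be modelled with a dataclass and two enums. The snake_case attribute names and the choice of defaults are assumptions; only the field list and the allowed values come from the description.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(str, Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    FAILED = "FAILED"
    DELETED = "DELETED"
    DONE = "DONE"


class JobDefinition(str, Enum):
    IMAGE_RESIZE = "IMAGE_RESIZE"


@dataclass
class Task:
    job_definition: JobDefinition
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: Status = Status.CREATED
    result_of_job: Optional[str] = None  # UUID v4 of the result, set when DONE
    error: Optional[str] = None          # error message, set when FAILED


task = Task(JobDefinition("IMAGE_RESIZE"))
print(task.id, task.status.value)
```

Using enums means an unknown job definition or status fails loudly with a `ValueError` at construction time instead of being stored as an arbitrary string.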


CreateTask — creates and defines task entity


  • JobDefinition — String, possible values: IMAGE_RESIZE

Success Return values:

  • 201, Id of a task created

Error Return values:

  • 400, WrongJobDefinitionValue

RunTask — runs specified task, according to task definition


  • Id — Id of the Task to run, String in UUID v4

Success Return values:

Error Return values:

  • 404, TaskDoesNotExist
  • 400, WrongTaskIdFormat

CheckTaskStatus — checks task status


  • Id — Id of the Task to check, String in UUID v4

Success Return values:

Error Return values:

  • 404, TaskDoesNotExist
  • 400, WrongTaskIdFormat

DeleteTask — deletes task


  • Id — Id of the Task to delete, String in UUID v4

Success Return values:

  • 200, Status property of a Task

Error Return values:

  • 404, TaskDoesNotExist
  • 400, WrongTaskIdFormat
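The return values listed above can be sketched as framework-free handlers that return a `(status code, body)` pair. Several things here are assumptions: the in-memory `tasks` dictionary, the helper names, the soft delete, and the 200 response of `check_task_status`, whose success value is not listed above. RunTask is left out for the same reason.

```python
import uuid

JOB_DEFINITIONS = {"IMAGE_RESIZE"}
tasks = {}  # in-memory stand-in for the task store, keyed by task Id


def _is_uuid4(value):
    try:
        return uuid.UUID(str(value)).version == 4
    except ValueError:
        return False


def _lookup(task_id):
    """Return (error_response, task); exactly one of the two is None."""
    if not _is_uuid4(task_id):
        return (400, "WrongTaskIdFormat"), None
    task = tasks.get(task_id)
    if task is None:
        return (404, "TaskDoesNotExist"), None
    return None, task


def create_task(job_definition):
    if job_definition not in JOB_DEFINITIONS:
        return 400, "WrongJobDefinitionValue"
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"Id": task_id, "Status": "CREATED",
                      "JobDefinition": job_definition,
                      "ResultOfJob": None, "Error": None}
    return 201, task_id


def check_task_status(task_id):
    error, task = _lookup(task_id)
    return error or (200, task["Status"])  # 200 is an assumption


def delete_task(task_id):
    error, task = _lookup(task_id)
    if error:
        return error
    task["Status"] = "DELETED"  # assumed soft delete: the record is kept
    return 200, task["Status"]
```

In a Flask app each of these functions would sit behind a route and the pair would be turned into an HTTP response; keeping the logic separate from the framework makes it easy to test.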

Workflow with the new design

The client creates a task with the CreateTask method and a job definition of value IMAGE_RESIZE. The task is created with the default Status value CREATED, and the client receives the Id of the corresponding task. The client then calls RunTask with the Id it received. The server uses an external queue to enqueue the task and lets a worker consume it. After consuming the message from the queue, the worker changes the task’s status to RUNNING with the UpdateTask method. After performing the job, the worker updates the task’s status again, to either DONE or FAILED. Let’s assume everything went well. The client can check the status by calling the CheckTaskStatus method. Once it sees the value DONE, it can fetch the task with the GetTask method. The task contains a ResultOfJob field, so the client can retrieve the result of the long computation. After verifying the job and its result, the client can delete the task with the DeleteTask method.
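The whole round trip can be simulated in a single process. In the sketch below `queue.Queue` stands in for the external queue and a background thread stands in for the worker; both are simplifications, and every function name as well as the in-memory `tasks` dictionary is an assumption made for illustration.

```python
import queue
import threading
import time
import uuid

tasks = {}                  # in-memory stand-in for the task store
task_queue = queue.Queue()  # stand-in for the external queue


def resize_image(seconds=0.1):
    time.sleep(seconds)       # placeholder for the real computation
    return str(uuid.uuid4())  # Id of the (hypothetical) stored result


def create_task(job_definition):
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"Status": "CREATED", "JobDefinition": job_definition,
                      "ResultOfJob": None, "Error": None}
    return task_id


def run_task(task_id):
    task_queue.put(task_id)  # returns immediately; the worker does the work


def worker():
    while True:
        task_id = task_queue.get()
        if task_id is None:  # sentinel: shut the worker down
            break
        task = tasks[task_id]
        task["Status"] = "RUNNING"
        try:
            task["ResultOfJob"] = resize_image()
            task["Status"] = "DONE"
        except Exception as exc:
            task["Error"] = str(exc)
            task["Status"] = "FAILED"


def wait_for(task_id, poll_interval=0.02, timeout=5.0):
    """Client side: poll the status until the task finishes or we give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = tasks[task_id]["Status"]
        if status in ("DONE", "FAILED"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(task_id)


thread = threading.Thread(target=worker, daemon=True)
thread.start()
task_id = create_task("IMAGE_RESIZE")
run_task(task_id)
final_status = wait_for(task_id)
task_queue.put(None)
thread.join()
print(final_status, tasks[task_id]["ResultOfJob"])
```

The point of the design is visible in `run_task`: the request handler only enqueues a message and returns, so it never blocks for the duration of the computation. In a real deployment the worker would run in separate processes or on separate machines, since a thread inside the web process would still compete with it for the CPU.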

And this is basically it. We have a simple, clean entity that stays close to the business logic and remains available for a future audit if anything goes wrong with the computation.

Solution in Python

Here’s our domain task model with dao.

Here’s our REST API.

And here’s our worker code. I chose Celery as the task queue, but you can use any queue or broker you want.


All in all, I firmly believe that this solution is the best way to tackle the problem of long computations, especially at the dawn of the microservices era. I hope you enjoyed the article; don’t forget to share it if you liked it!

Originally published at blog.goc.agency on May 12, 2018.

Written by Grzegorz Olechwierowicz

I’m a software engineer and I do like my profession somehow. Opinions expressed in my articles are mine, not those of my employers.
