The interesting case of Flask, Gevent, Context-Switching and a bunch of other buzzwords

Eyal Brave
skai engineering blog
7 min read · Mar 1, 2019

Troubleshooting a multi-threaded Python microservice is always a tricky business, but something that any self-respecting Python developer can and should be able to handle.

Troubleshooting a multi-threaded Python microservice with a third-party library monkey-patching your code? Well, that’s a whole different ball game.

Our story begins

I was approached by an engineer from another team. She said that her team was developing a new Python microservice, and that they were experiencing some odd behavior from it: the service became completely unresponsive after a very small (and random) number of requests.

Before we dive into the nitty-gritty, this is a good place to mention that here at Kenshoo our Python microservices are usually a Flask web server run by Gunicorn.

Chapter I — Know where you stand

The app had only one route, which did one very simple thing: it submitted a task, called split, to a thread pool using arguments from the request body, and then immediately returned a 200 OK response.
The thread pool was a ThreadPoolExecutor from the concurrent.futures library, configured with 5 workers.
The task submitted to this pool (the split task) was also quite simple: it first downloads some data from S3 (i.e. does some I/O), and then performs between 10 and 60 minutes of intensive, uninterrupted CPU work.
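
To make this concrete, here is a minimal sketch of the setup described above (a sketch only; download_from_s3, crunch and the route name are hypothetical stand-ins for the real code):

```python
from concurrent.futures import ThreadPoolExecutor
from flask import Flask, request

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=5)  # the pool in question

def download_from_s3(dataset_id):
    # I/O phase: fetch the input data (stubbed for the sketch)
    return b"..."

def crunch(data):
    # CPU phase: 10-60 minutes of uninterrupted number crunching (stubbed)
    pass

def split(dataset_id):
    data = download_from_s3(dataset_id)  # a little I/O first
    crunch(data)                         # then pure CPU work

@app.route("/split", methods=["POST"])
def submit_split():
    # Submit the task and return immediately with 200 OK
    executor.submit(split, request.json["dataset_id"])
    return "OK", 200
```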

At this point, given my experience with Python threads, I could already tell that there’s no point in setting the number of workers in the pool to any number greater than 2. Why? Because of the GIL.

Chapter II — The GIL (AMAZING doc regarding the GIL — here)

Python has a GIL — Global Interpreter Lock. Basically, it's there to prevent any real parallelism by making sure that at any given moment, only one thread is executing Python bytecode inside a Python process.

The GIL does a context-switch between threads in 2 scenarios:

  1. When a thread is doing an I/O operation
  2. Periodically, to give other threads a chance to run: in Python 2 this happened every 100 “ticks” (a “tick” very loosely represents an interpreter instruction); since Python 3.2 it is time-based, every 5 milliseconds by default (see sys.getswitchinterval())

This means that even if we have a thread that is very CPU-intensive, it won't completely block the work of other threads; every few milliseconds the interpreter will switch and let other threads run.
All of the above is of course a very high-level description, but you get the concept.
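
A quick way to see both the switch interval and the GIL's effect on CPU-bound threads (a minimal sketch; exact timings will vary by machine):

```python
import sys
import threading
import time

print(sys.getswitchinterval())  # 0.005 (5 ms) by default on Python 3

def busy():
    # Pure CPU work: holds the GIL except when preemptively switched out
    n = 0
    for _ in range(10_000_000):
        n += 1

start = time.perf_counter()
threads = [threading.Thread(target=busy) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Takes roughly as long as running busy() twice serially: the two threads
# interleave on the GIL instead of running in parallel.
print(f"two threads took {time.perf_counter() - start:.2f}s")
```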

If we look at our use case, we can clearly see that there’s no point in setting more than 2 workers in the pool, since multiple workers will just block each other and we’ll get no benefit. Why 2 and not 1? Because we do have a little I/O at the beginning, so 1 thread can use the CPU while the other is doing I/O.

Chapter III — We need to go deeper

We saw that the thread pool configuration might be a bit inefficient, but even in the worst case the app should definitely remain responsive, since the GIL should allow our Flask worker some CPU time to handle requests. And since our logic is really short and simple (submit a task and return a response), that should work just fine.

To make things clearer, let’s see how we expected the app to behave given 5 consecutive requests:

Before any requests arrive, we have one process, our Gunicorn worker, which contains only one thread running our Flask application (main.py), and is ready to receive requests.

After handling the first request, we have one thread running our app (main.py) and another thread running the split task of the first request (started by the ThreadPoolExecutor). At this point we also already have 4 requests waiting in the Gunicorn queue.

So now what?

The thread running the split task is not doing any I/O, but every few milliseconds the GIL switches to another thread, allowing the main.py thread to run and handle the next request (which is already waiting in the queue), so the worker keeps alternating between the main.py thread and the split threads.

And so on… since the request itself is very short, the short time that the main.py thread gets from the GIL is enough to handle the request and return a response. Obviously, there won’t be any real benefit from the threads when looking at the split tasks (due to the GIL), but our app should still be responsive.

Chapter IV — The missing link

So, what are we missing? Why is our app unresponsive when it clearly shouldn't be?

The flow described above is correct when working with the NATIVE worker (Gunicorn's default sync worker class). But we were working with the GEVENT worker. I had an inkling that the issue was caused by the Gevent worker; that was the missing link! But how did I know? Well, first of all you need to understand what the Gevent library is.

The Gevent library uses coroutines (implemented with greenlets) to provide a type of micro-thread.
Coroutines are, in my opinion, one of the most advanced topics in Python, and you're more than welcome to go over this GREAT document if you want to learn more about them.

Basically, we can look at a coroutine as a “micro-thread” inside our thread. We pass control to the coroutine, which in turn runs its code until it is ordered (using yield) to give control back to the thread. But it's critical to understand that since a coroutine isn't actually a separate thread, the GIL DOESN'T context-switch between coroutines, or between a coroutine and its host thread. A “context-switch” (again, not a real context-switch, since it's all happening in the same thread) happens only when explicitly requested (using yield).
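
The mechanics are easy to see with plain generators used as cooperative micro-threads (a toy sketch, no Gevent involved):

```python
# Two "micro-threads" sharing one real thread. Control moves between them
# only at explicit yield points; if one never yields, the others starve.
def worker(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # hand control back to the scheduler

def run(tasks):
    # Naive round-robin scheduler: resume each task until its next yield
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
        except StopIteration:
            continue
        tasks.append(task)

run([worker("A", 2), worker("B", 2)])
# Output: A: step 0, B: step 0, A: step 1, B: step 1
```

If one worker entered a long CPU loop without yielding, run() would never get control back. Keep that in mind.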

Chapter V — The chapter with the monkey

A Gevent Gunicorn worker is an async worker. Each request is handled by a different coroutine, so multiple requests can be handled by the same worker.
When using Gunicorn with a Gevent type of worker, Gevent monkey-patches our code to make it more async-friendly. Among other things, it patches the Python threading library and replaces it with gevent.threading, which effectively turns threads into coroutines.
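
You can see the patching directly (a quick sketch, assuming gevent is installed):

```python
from gevent import monkey
monkey.patch_all()  # this is what the Gevent Gunicorn worker does at startup

import socket
import time

# The stdlib names now point at gevent's cooperative implementations:
print(socket.socket)  # gevent's socket class, not the stdlib one
print(time.sleep)     # gevent's sleep, which yields to other greenlets
```
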
So, when our ThreadPoolExecutor starts a new thread running a split task, it is actually starting A COROUTINE. That’s it! We found our root cause!

So, with our combined knowledge of the GIL, coroutines, and Gevent, we can now describe the ACTUAL flow in our app, and soon enough we'll find our problem.

Before the first request, things look the same as before: one Gunicorn worker process with a single thread running main.py.

And now things get interesting. When a request arrives, the worker starts a new coroutine to handle the request, and then, when we submit a new split task to the ThreadPoolExecutor, another coroutine is created to run the task.
So after the first request, our worker is running a single thread hosting two coroutines: one handling requests in main.py, and one running the split task.

Now, while our split task is downloading data from S3, i.e. doing I/O, Flask gets some CPU time and can handle a few more requests, each submitting another split task (hence the random number of successful requests). But once the first active task's I/O phase is done, it does nothing else that Gevent patched to cause a context-switch, so it effectively “locks” our thread and never gives control back to main.py, and no new requests can be handled.
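
Here is a minimal reproduction of this failure mode, outside of Flask (assuming gevent is installed; heartbeat stands in for main.py and cpu_hog for the split task):

```python
from gevent import monkey
monkey.patch_all()

import gevent

def heartbeat():
    # Stands in for main.py: should keep responding forever
    while True:
        print("still alive")
        gevent.sleep(1)  # yields to other greenlets

def cpu_hog():
    # Stands in for the split task's CPU phase: never calls anything
    # patched by Gevent, so it never gives control back
    while True:
        pass

gevent.spawn(heartbeat)
hog = gevent.spawn(cpu_hog)
hog.join()  # "still alive" prints once, then the hog starves everything
```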

By the way, the monkey patch also patches some blocking calls, like I/O calls or sleep(), so that they force a “context-switch”. This way, these types of calls keep their original behavior from the world of real threads: the caller blocks, but everything else keeps running.

So basically, one way we could have solved our issue is to call sleep(0) at regular intervals inside the CPU-bound loop (as mentioned above, Gevent patches sleep, so calling it forces a context-switch), allowing Flask to handle some requests, as the sketch below shows. We didn't do that, of course, for the simplest reason: it's very bad practice.
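
For illustration only, the rejected workaround would have looked something like this (data_chunks and heavy_computation are hypothetical stand-ins for the real CPU work):

```python
import time

def crunch(data):
    # CPU phase, now cooperative: yield every so often so other coroutines
    # (including the one running main.py) get a chance to run
    for i, chunk in enumerate(data_chunks(data)):  # hypothetical helper
        heavy_computation(chunk)                   # hypothetical CPU step
        if i % 1_000 == 0:
            time.sleep(0)  # patched by Gevent: forces a context-switch
```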

Final Chapter — Everything that has a beginning has an end

When you need to choose a concurrency solution for your Python microservice, you should remember that there isn’t one solution that will fit all use-cases. You need to understand the type of workload you expect from your app, and choose the most suitable solution for this use case. For example, if your work type is made out of a lot of relatively short I/O operations, an async solution (such as Gevent) can be extremely efficient.

Another thing: adding a third-party library that patches your code is something that should always be handled with care, and you should always keep it in mind when you add new features that might “suffer” from this patching.
In our use case, we changed Gunicorn to work with native (sync) workers, and instantly everything worked like a charm.
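
For completeness, the change amounts to Gunicorn's worker_class setting, shown here in a config file (gunicorn.conf.py is Gunicorn's default config file name; sync is its default worker, so the line can also simply be removed):

```python
# gunicorn.conf.py

# Before (the problematic setup):
# worker_class = "gevent"

# After (Gunicorn's default synchronous worker):
worker_class = "sync"
```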
