Falcon vs. Flask — Which one to pick to create a scalable deep learning REST API

A comparison of two Python web frameworks, Flask and Falcon, for serving a machine learning REST API

At idealo.de, we sometimes have critical services where speed matters. For example, we have millions of hotel images that need an “attractiveness” assessment. Scoring each image is very time-consuming and can take weeks if not done efficiently. It is therefore critical to optimize every step of the prediction process, from the model size to the REST service itself. With regards to the prediction service, our team has in the past relied heavily on Flask for our deep learning prediction services. Recently, we experimented with Falcon because we had heard and read that it was much faster than Flask (also check out py-frameworks-bench). In this article, we share our experiences with Falcon and whether it is a viable alternative for creating a scalable machine learning REST API.

What is Falcon?

Falcon, like Flask, is a lightweight microframework written in Python for building web APIs and app backends. Unlike Flask, however, Falcon focuses on REST APIs and is not suited to serving HTML pages at all. It has a clean design that embraces HTTP and the REST architectural style.

Simple code example for Falcon:

More information on a Flask vs. Falcon comparison can be found on StackShare. The Falcon documentation is also a good place to start.

Deep Learning REST API

For Flask, there is already a plethora of articles on creating a prediction service for machine learning models; for example, the tutorial “Building a simple Keras + deep learning REST API” by Adrian Rosebrock is excellent. For Falcon, far fewer such tutorials exist. Nevertheless, the idea behind deploying a machine learning model as an API is the same for both frameworks. Generally, you train a machine learning model and then save it (e.g. in scikit-learn you can pickle your model). Afterwards, you wrap the model in a REST API together with some data preprocessing logic. In our case, we trained a simple CNN on the MNIST dataset with Keras and TensorFlow as the backend. We then stored the model in a single HDF5 file containing the model architecture, its weights, and some other information such as the training configuration and the state of the optimizer.

Afterwards, we used the model to create the prediction service. Here is the code:

Basically, we created an endpoint to which we POST a JSON payload containing a base64-encoded image; the image is then converted into a format our Keras model can consume, and the output is the prediction. As you can see, the model is not initialized inside the on_post function, because reloading the model into memory on every request would be very inefficient. If you want to understand this better, Adrian covers it in more detail in his article as well (see his section on not reloading the Keras model on every request). Instead, we initialize the model outside the Falcon resource (resources are simply all the things in the API that can be accessed by a URL) and pass it in as an argument.

Then, to run the Falcon API, we use Gunicorn as the WSGI HTTP server (unlike Flask, Falcon does not ship with a built-in development server) and NGINX as the reverse proxy. The full code can be found on our GitHub repo.
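Assuming the WSGI callable is exposed as `app` in a module named `main.py` (hypothetical names), a Gunicorn invocation matching our setup of two gevent workers would look roughly like this:

```shell
pip install gunicorn gevent   # gevent provides the async worker class
gunicorn main:app --workers 2 --worker-class gevent --bind 0.0.0.0:8000
```

NGINX then sits in front of Gunicorn and proxies client traffic to port 8000.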

Load-testing

Now it’s time to load-test our prediction service. We used Locust for this. It is written in Python, open source, and very simple to use. If you haven’t tried it yet, it’s worth checking out.

We deployed our application on OpenShift with two gevent workers. We also scaled up to three containers, also called pods, to give the service some resilience.

Here are the results with 200 simulated users and a hatch rate of one user spawned per second:

From the table (all times in milliseconds), we can see that we ran around 10k requests against both Flask and Falcon. The minimum response time is the same for both frameworks at 19 ms. Although Flask does a little better on the average and maximum response times, we can conclude that both perform quite similarly.

Summary

From our test we conclude that Falcon is not necessarily faster than Flask. Of course, more testing could be done, e.g. with more requests and more simulated users. At the very least we can say that Falcon and Flask perform similarly, i.e. Falcon is a viable alternative when all we need is an API. Its code design is very clean and well suited to REST APIs (when developing with Falcon, you think in terms of resources and state transitions mapped to HTTP methods).

If you found this article useful, give me a high five 👏🏻 so others can find it too, and share it with your friends. Follow me here on Medium (Dat Tran) or on Twitter (@datitran) to stay up-to-date with my work. Thanks for reading!