Squeezing the milliseconds: How to make serverless platforms blazing fast!

In the previous article, Uncovering the magic: How serverless platforms really work!, we saw that serverless platforms like OpenWhisk hide a huge amount of complexity to deliver on the promises made to users.

Serverless platforms need to be fast. Period. After all, they are marketed as an alternative approach to hosting microservices, APIs and friends. We want our API calls to feel as fast as they did on classical servers, but without the fuss of managing them.

Ensuring that all of the components in OpenWhisk run blazingly fast is an ongoing task. Today, let’s have a look at some of the optimizations we’ve made, starting with the Invoker.

Recap: The critical path

When talking about performance, we usually talk about optimizing the so-called critical path. That path determines the latency a user experiences when invoking an action in a blocking fashion (the HTTP request stays open until you receive the result of the action).

Critical path through the OpenWhisk system.

As a recap, the chart above outlines the critical path through the OpenWhisk system as a whole.

  1. Authentication information is fetched from the database.
  2. The action is fetched from the database.
  3. A message is sent to the Invoker via Kafka.
  4. The Invoker fetches the action from the database (putting the action into the Kafka message itself would be inefficient, given that OpenWhisk allows actions of up to 50 megabytes in size).
  5. The Invoker runs the action.
  6. The Invoker posts the result of the invocation to Kafka.

After that, the Controller can close the HTTP request with the result it just received from the Invoker via Kafka.

The system, of course, does a bit more. It also stores the results in the database, fetches logs et cetera. In this article though, we’ll only focus on end-to-end latency and thus everything that lies on the critical path.
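The controller-side half of that path can be sketched in a few lines. This is a hypothetical illustration of steps 1–3 and the blocking wait, with trivial stand-ins for the database and Kafka — the class and function names here are invented for the example, not OpenWhisk’s actual interfaces.

```python
class Db:
    """Stand-in for the document store (CouchDB in OpenWhisk's case)."""

    def get_auth(self, key):
        return {"subject": key}              # 1. fetch the authentication record

    def get_action(self, name):
        return {"name": name}                # 2. fetch the action document


class Kafka:
    """Stand-in broker: delivering the job message triggers the invoker."""

    def __init__(self, invoker):
        self.invoker = invoker

    def send(self, msg):
        # 3. job message to the Invoker; steps 4-6 happen on the invoker side
        self._result = self.invoker(msg)

    def await_result(self):
        return self._result                  # the Controller blocks until this arrives


def blocking_invoke(auth_key, action_name, db, kafka):
    db.get_auth(auth_key)                    # authenticate the request
    action = db.get_action(action_name)      # load the action's metadata
    kafka.send({"action": action["name"]})
    return kafka.await_result()              # close the HTTP request with this result


result = blocking_invoke("key", "hello", Db(), Kafka(lambda m: {"ok": m["action"]}))
print(result)  # → {'ok': 'hello'}
```

The point of the sketch is simply that everything inside `blocking_invoke` sits on the latency the caller observes.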

Step 5 above still looks like magic. “The Invoker runs the action.” That’s it? Let’s have a look!

The Invoker

The Invoker is arguably the heart of OpenWhisk. It’s responsible for making sure your code actually runs. It’s also the component which produces by far the most overhead in the system, latency-wise.

As the architecture chart indicates, the Invoker works by talking to Docker. We use Docker to containerize each action so we can provide multi-tenant execution in which different users do not impact each other. Containers give us a convenient mechanism to “blindly” run untrusted code while having the tools at hand to prevent that code from doing bad things.

Inside those containers, OpenWhisk uses a small HTTP server that provides two endpoints, /init and /run. Those endpoints inject the action code into the container and run it, respectively. /init takes the action’s code and does whatever is necessary to make that code a runnable entity. In Node.js the code is simply interpreted, but Swift, for example, even compiles the code. You get the idea. It’s also clear now: initializing a container can come at a cost! Once initialization has succeeded, /run is used to pass arguments to the action and execute it.
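To make the init/run split concrete, here is a minimal sketch of what such a container-side proxy does, assuming a Python-flavored runtime. `ActionProxy` and the `main` entry-point convention are illustrative choices for this example; the real proxies live inside the per-language runtime images.

```python
class ActionProxy:
    """Toy model of the container's HTTP server: init once, run many times."""

    def __init__(self):
        self._action = None

    def init(self, code):
        # /init: turn the user's source code into a runnable entity.
        # This is where the one-time cost (interpreting, compiling) is paid.
        scope = {}
        exec(compile(code, "<action>", "exec"), scope)
        self._action = scope["main"]

    def run(self, args):
        # /run: pass arguments to the initialized action and execute it.
        if self._action is None:
            raise RuntimeError("container was never initialized")
        return self._action(args)


proxy = ActionProxy()
proxy.init("def main(args):\n    return {'greeting': 'Hello ' + args['name']}")
print(proxy.run({"name": "OpenWhisk"}))  # → {'greeting': 'Hello OpenWhisk'}
```

Note how `init` happens exactly once per container while `run` can be called over and over — that asymmetry is what the optimizations below exploit.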

Container usage in the invoker.

As shown above, the critical path for a container involves the following steps:

  1. Starting the container via docker run. As we’re doing HTTP calls to that container, we also get the container’s IP address via docker inspect.
  2. Initializing the container with the action that was given by the user via /init.
  3. Running the action via /run.

If an invocation needs to go through all of those steps, we speak of a cold container.

A cold container. Moving sloooowly.

Squeezing the time out

Looking at the critical paths outlined above, here’s what we’re dealing with in total:

  • 2 docker commands: docker run and docker inspect. The former alone takes around 300 milliseconds to do its job.
  • 2 HTTP calls: /init and /run. The latency of initialization highly depends on the runtime used and the amount of code you want to run. The latency of running the code itself is determined by the task at hand.
  • 2 Kafka messages: the “job” message and the response message. They usually add less than 5 milliseconds of latency.
  • 3 database calls: authentication, getting the action in the Controller, getting the action in the Invoker. The latency here depends on where you host the database and, of course, on how large the entities are.

Bwoah! That’s a whole lot to do and not particularly fast. docker run in particular looks like something we want to work around to make the system respond as fast as possible. Let’s see how we can optimize the overhead away, step by step.
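The list above can be turned into a back-of-the-envelope tally. Be warned: only the ~300 milliseconds for docker run comes from the measurements mentioned above — every other number here is an invented placeholder, purely to show how the cold path adds up.

```python
# Rough cold-path breakdown. Only "docker run" reflects a figure from the
# article; all other values are made-up placeholders for illustration.
cold_path_ms = {
    "docker run": 300,
    "docker inspect": 20,            # placeholder
    "HTTP /init": 50,                # depends heavily on the runtime
    "HTTP /run": 10,                 # this one is your actual application latency
    "kafka job message": 5,          # "usually less than 5 ms"
    "kafka response message": 5,
    "db: authentication": 10,        # placeholder
    "db: action (Controller)": 10,   # placeholder
    "db: action (Invoker)": 10,      # placeholder
}

print(sum(cold_path_ms.values()))  # → 420
```

Even with generous placeholder numbers, docker run dominates — which is exactly why the following sections go after it first.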

Good old caching

To reduce the overhead of database calls, we use in-memory caching. That’s all. In a steady state, and if you bring enough load to the system, there won’t be a single database call left on your critical path. By using caches we take the database out of the game completely.

Container reuse

A burning hot warm container.

One of the most obvious mechanisms to reduce the overhead is to take the containerization system out of the game entirely. In this case, that means again: caching. Or, rather, container reuse. If a user fires the same action twice, and the first invocation has already finished, we can just use the same container again. Per the steps mentioned above, that spares us the docker run and the HTTP call to initialize the action. In OpenWhisk we call this a warm container. Warm containers are the best you can get in terms of latency and throughput. The more load you impose on the system, the more warm containers you will have.

Doing the math, for a warm invocation we completely avoid the docker commands needed to start a container as it already exists. The /init call vanishes as well. We’re left with 1 HTTP call (/run) and our 2 Kafka messages on the critical path. That’s as close to your application latency (determined by /run) as it gets.
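A pool that implements this reuse can be sketched as follows — `ContainerPool` and its method names are hypothetical, but the keying is the essential part: a warm container is only reusable for the same user and the same action.

```python
class ContainerPool:
    """Toy pool: warm containers are keyed by (user, action)."""

    def __init__(self):
        self._warm = {}
        self.cold_starts = 0

    def acquire(self, user, action):
        container = self._warm.pop((user, action), None)
        if container is not None:
            return container                     # warm: go straight to /run
        self.cold_starts += 1                    # cold: pay docker run + /init
        return "container-for-{}/{}".format(user, action)

    def release(self, user, action, container):
        self._warm[(user, action)] = container   # keep it around — it's warm now


pool = ContainerPool()
c = pool.acquire("alice", "hello")
pool.release("alice", "hello", c)
pool.acquire("alice", "hello")                   # reuses the warm container
print(pool.cold_starts)  # → 1
```

The real pool also has to evict idle containers and cap per-user concurrency, which this sketch leaves out.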

Container prewarming

Prewarm container. Not hot. But warm at least.

Warm containers do not solve the “cold-start” latency problem though: the very first invocation of an action still takes awfully long. To address this, OpenWhisk spawns so-called prewarmed containers. For example, let’s assume that the majority of all requests use Node.js-based actions. As a consequence, OpenWhisk can spawn some Node.js containers in advance, anticipating user load.

That reduces cold-start latency by quite a bit, as it takes the most expensive operation we have (docker run) off the critical path. That leaves us with 2 HTTP calls (/init and /run) plus the 2 Kafka messages on the critical path for an invocation using a prewarmed container. Not too bad, is it?
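Putting the three container states together, the selection order for an incoming invocation can be sketched like this (function and state names are illustrative): warm beats prewarmed beats cold.

```python
def choose_container(action, warm, prewarmed, kind):
    """Pick the cheapest available container for an invocation.

    warm:      dict mapping action name -> ready container (skip init entirely)
    prewarmed: dict mapping runtime kind -> stack of started-but-uninitialized
               containers (docker run already paid)
    kind:      the runtime the action needs, e.g. "nodejs"
    """
    if action in warm:
        return warm.pop(action), "warm"           # only /run on the path
    if prewarmed.get(kind):
        return prewarmed[kind].pop(), "prewarm"   # /init + /run, no docker run
    return "fresh-" + kind, "cold"                # full path: run, inspect, init, run


warm = {}
prewarmed = {"nodejs": ["stem-cell-1"]}
print(choose_container("hello", warm, prewarmed, "nodejs"))  # → ('stem-cell-1', 'prewarm')
```

After a prewarmed or cold container finishes, releasing it back as warm (as in the pool sketch earlier) is what makes subsequent invocations cheap.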

Summary

Extensive caching, both of database entities and of the containers themselves, dramatically reduces OpenWhisk’s overhead and thereby reduces end-user latency.

Reducing overhead by increasingly hot containers.

Serverless systems are all about reducing, as much as possible, the overhead that multi-tenancy imposes on the vendor. After all, vendors who host a serverless platform need to make sure their machines are as close to 100% utilized by user code as possible. Aggressive caching strategies and container pooling help OpenWhisk go a long way in this game, but the optimizations shown are only the tip of the iceberg. There are lots of smaller improvements in the system: for example, OpenWhisk uses runc instead of docker to control parts of the container lifecycle, and it talks directly to the filesystem to extract information about containers. That’s enough material for at least another article though. Stay tuned, it’ll pay off!


If you liked this article, give it one of the nice green hearts 💚, recommend it to your friends and follow OpenWhisk. We have a lot more stories to tell and always great stuff coming up. Stay tuned 🚀

Markus Thömmes is a leading contributor to the Apache OpenWhisk project. He loves serverless and ☁️ in general. Follow him here on Medium and on Twitter for more deep-dive information about serverless platforms.