Evaluation of Serverless Technologies at Jet

Khalid Hasanov
Jan 21, 2019 · 15 min read
Photo by Markus Spiske on Unsplash

Serverless functions have been around for a few years and represent a new paradigm in cloud-based software engineering. This blog post focuses on enterprise adoption of serverless functions.

We are encouraged to keep an eye on new technology trends and adopt their usage at Jet. However, the adoption of any new technology requires a rigorous evaluation process. Therefore, the first step in our serverless journey was to define our evaluation criteria for serverless function runtimes. We came up with three groups of evaluation criteria: feature requirements, performance, and benefits of serverless function runtimes.

Feature Evaluation Criteria for Serverless Function Runtimes

  1. Target Operating System: Our services run on both Linux and Windows machines. Ideally, therefore, we would like a serverless runtime that runs on both operating systems.
  2. Supported Languages: We have multiple teams using different tech stacks and programming languages. To get wide adoption of serverless within the company, it is important that our choice of serverless runtime supports all the languages used across different teams at Jet.
  3. Event Triggers: Support for various triggers such as HTTP, Kafka, Azure Cosmos DB, and Azure Blob Storage. HTTP triggers need little justification, as they are probably the simplest triggers and apply to the widest range of use cases. Kafka triggers were the other important ones for us to evaluate, as Kafka is our main streaming platform for asynchronous communication between microservices.
  4. Integration with Existing Infrastructure: All new microservices at Jet are deployed on Nomad and get transparent integration with Consul for service discovery, Vault for secrets management, Prometheus and Grafana for monitoring, and Splunk for log management. This means that if we can deploy our chosen serverless runtime on Nomad, we get all those integrations at almost zero cost.
  5. Complexity to Manage: A system requiring complex runtime dependencies could be difficult and costly to operate. We wanted to avoid any such serverless runtime unless there was a strong reason to choose it anyway.
  6. Onboarding and Developer Tooling: Serverless, as a new paradigm, already demands a different way of thinking from developers. Ideally, we would like to provide a serverless-oblivious deployment pipeline, so that if we adopt a serverless tech stack, developers can keep using existing tooling without having to think differently.

We identified several non-managed open-source serverless function runtimes to evaluate: OpenFaaS, OpenWhisk, Knative, Kubeless, Fission, Fn Project, and Nuclio. Our selection was based on several criteria: the GitHub activity of each project, documentation, flexibility to extend, ecosystem, and the number of users in the industry. In terms of managed serverless function offerings, Azure Functions was the only candidate to consider, as our infrastructure is already built on top of Microsoft Azure.

The predefined evaluation criteria were enough to filter out most of the serverless runtimes quickly, and as a result we narrowed our list down to Azure Functions v2.x and OpenFaaS. The next two matrices highlight the main features of the previously mentioned serverless runtimes and the reasons they were filtered out.

Figure: Main Features of FaaS Runtimes

Figure: The Reasons for the FaaS Runtimes not Considered

Performance Evaluation Criteria for Serverless Function Runtimes

If we imagine a hypothetical scenario of warming-up a serverless function with 10 requests per second (req/s), the function runtime would need to allocate the required number of resources (VMs, containers, etc.) to process 10 req/s. However, if the next time the function runtime receives a significantly greater number of requests, for instance, 100 req/s, the runtime would need to allocate more resources, even though it has some warmed-up resources already. A cold-start from zero to five instances is not the same as a cold-start from zero to fifty instances.

The experiments in this article demonstrate that cold-start time depends on how fast the runtime can spin up the required number of resources and how fast it can scale up. The scale-up time itself depends on how fast the Docker images can be pulled onto the nodes (if it is a Docker-based system), how fast new resources can be allocated, and how efficiently the decision to scale up is made.

A cold-start time should always be associated with a number of requests per second.

Evaluating the Benefits of Serverless Function Runtimes

Cost Saving

Cost Estimation

Let’s assume that our microservices have an average resource requirement of 0.5GHz of CPU and 1GB of memory. A CPU core in our case is approximately 2.4GHz, so 0.5GHz is about 20% of a single CPU core. This would add up to 500GHz of CPU and 1TB of total memory for 1000 microservices. All of this in turn would be equivalent to about 30 Azure VMs of the Standard_F16 series, which would cost a minimum of $8,000 in total per month. It is assumed that those VMs are deployed in the EastUS2 region and use the Ubuntu operating system with managed standard HDD disks on a 3-year reserved plan. If we decided to use Windows VMs for the entire fleet, the cost would go up to $25,000 per month.
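
The sizing step above can be sketched in a few lines of Python. The per-service figures come from the paragraph above; the VM shape (16 vCPUs and 32 GB of memory for a Standard_F16) is an assumption that the post does not state, and the function name is hypothetical.

```python
import math

# Figures taken from the estimate above.
SERVICES = 1000
CPU_GHZ_PER_SERVICE = 0.5
MEM_GB_PER_SERVICE = 1.0
CORE_GHZ = 2.4

# Assumption (not stated in the post): a Standard_F16 VM offers
# 16 vCPUs and 32 GB of memory.
VM_CORES = 16
VM_MEM_GB = 32


def vms_needed(services=SERVICES):
    """Return (vms_for_cpu, vms_for_memory, vms_required)."""
    total_ghz = services * CPU_GHZ_PER_SERVICE      # 500 GHz
    total_mem_gb = services * MEM_GB_PER_SERVICE    # 1000 GB, roughly 1 TB
    by_cpu = math.ceil(total_ghz / (VM_CORES * CORE_GHZ))
    by_mem = math.ceil(total_mem_gb / VM_MEM_GB)
    # The scarcer resource decides the size of the fleet.
    return by_cpu, by_mem, max(by_cpu, by_mem)


if __name__ == "__main__":
    cpu, mem, required = vms_needed()
    print(f"CPU-bound: {cpu} VMs, memory-bound: {mem} VMs, required: {required}")
```

Under these assumptions memory, not CPU, is the binding constraint, and the result of 32 VMs is consistent with the "about 30" quoted above.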

Now, let’s try a similar estimation for the case where all of these 1000 microservices are redesigned as serverless functions. We can assume that each of the 1000 functions runs twice daily for an average of 2 minutes, totalling 4 minutes in 24 hours per function. Our experiments in the next sections show that in order to get a 100% success rate from an Azure function app with 200 req/s over a 2-minute period, the Azure Functions runtime would need to allocate about 100 servers. Servers here seem to be Windows Containers according to the Azure Functions Runtime documentation. During the evaluation we observed that the average committed memory for each server was about 200MB. Taking these numbers into account, we can calculate the total monthly bill for running 1000 Azure functions by following the Azure Functions pricing guideline.

Azure Resource Consumption Billing Calculation

Resource Consumption (seconds) per Function per 2 Minutes:
  Executions: 24,000 executions
  Execution duration: 1 second
  Total resource consumption: 24,000 seconds

Resource Consumption in GB:
  200 MB * 100 servers / 1024 MB per GB ~ 20 GB

Total GB-s per 2 Minutes per Function:
  20 GB * 24,000 seconds = 480,000 GB-s

Total GB-s for 24 Hours (4 min) per Function: 960,000 GB-s
Total GB-s per Function in 30 Days: 28,800,000 GB-s

Billable Resource Consumption:
  Resource consumption: 28,800,000 GB-s
  Monthly free grant: - 400,000 GB-s
  Total monthly consumption per application: 28,400,000 GB-s

Monthly Resource Consumption Cost per Function:
  Billable resource consumption: 28,400,000 GB-s
  Resource consumption price: x $0.000016/GB-s
  Total cost per application: $454.40

Executions Billing Calculation

  Total monthly executions: 1,440,000 executions
  Monthly free executions: - 1,000,000 executions
  Monthly billable executions: 440,000 executions

Monthly Executions Cost:
  Monthly billable executions: 440,000 executions
  Price per million executions: $0.20
  Monthly execution cost: $0.088

Total Monthly Consumption Bill per Function: $454.488
Total Monthly Consumption Bill for 1000 Functions: $454,488

This is about 18 times the $25,000 monthly estimate for running the same workload as microservices on Windows VMs, and more than 50 times the $8,000 Linux estimate.
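
For readers who want to vary the inputs, the consumption-plan arithmetic can be reproduced with a short Python sketch. The prices and free grants are the ones quoted in the calculation; the function and parameter names are hypothetical, and the 20 GB memory figure is the rounded value used in the post rather than a measured one.

```python
# Prices and free grants as quoted in the calculation above.
PRICE_PER_GB_S = 0.000016
PRICE_PER_MILLION_EXECUTIONS = 0.20
FREE_GB_S = 400_000
FREE_EXECUTIONS = 1_000_000


def monthly_bill(req_per_s=200, run_seconds=120, runs_per_day=2, days=30,
                 memory_gb=20.0, exec_seconds=1):
    """Return (resource_cost, execution_cost) in USD for one function.

    memory_gb defaults to 20, i.e. 200 MB * 100 servers / 1024,
    rounded up from about 19.5 GB as in the post.
    """
    executions_per_run = req_per_s * run_seconds                  # 24,000
    gb_s_per_run = memory_gb * executions_per_run * exec_seconds  # 480,000
    gb_s_month = gb_s_per_run * runs_per_day * days               # 28,800,000
    billable_gb_s = max(gb_s_month - FREE_GB_S, 0)
    executions_month = executions_per_run * runs_per_day * days   # 1,440,000
    billable_executions = max(executions_month - FREE_EXECUTIONS, 0)
    resource_cost = billable_gb_s * PRICE_PER_GB_S
    execution_cost = billable_executions / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS
    return resource_cost, execution_cost


if __name__ == "__main__":
    resource, execution = monthly_bill()
    per_function = resource + execution
    print(f"per function: ${per_function:.3f}, for 1000 functions: ${per_function * 1000:,.0f}")
```

With the default arguments the sketch reproduces the per-function total from the calculation; changing `memory_gb` shows how strongly the result depends on how many servers are counted as billable.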

Serverless functions can easily result in higher cost instead of saving costs!

The situation is a bit different when using a non-managed serverless runtime, such as OpenFaaS, on an existing Nomad cluster (or Kubernetes). If the client nodes in a Nomad cluster are not 100% utilised, there is a high chance that the Nomad scheduler will find slots to run a function without the need for new client nodes to be auto-provisioned. Cost saving in this situation would be questionable if the company uses reserved virtual machine instances rather than a pay-as-you-go billing plan; whether you fully utilise a reserved VM instance or not, the cost would be the same.

Using a dedicated Nomad cluster for OpenFaaS on a pay-as-you-go subscription plan may make sense in certain situations. However, the cost estimation should then include the cost of running and maintaining a separate Nomad cluster and the serverless function runtime itself. For OpenFaaS this includes the instances of the OpenFaaS gateway, the Nomad plugin, the NATS server, faas-idler, and nats-queue-worker. In addition, we should be able to answer the following questions and include their cost in our estimation. Should we use our existing Prometheus cluster and Alertmanager for OpenFaaS? If yes, we should consider the added cost of storing OpenFaaS metrics in the existing Prometheus cluster. If not, we should take the cost of maintaining a new Prometheus cluster into account. How many instances of the Kafka controller should we run? Furthermore, we may have many other custom controllers, for Cosmos DB, Azure Blob Storage, and others.

Given these rough estimates, we don’t see cost saving as the main reason to adopt serverless functions.

Improving Developer Productivity

However, we do believe that serverless computing has the potential to improve developer productivity. This could be achieved by means of sophisticated controllers that abstract many of the IO challenges away from function developers. For instance, if we have a Kafka controller that takes care of message batching, retrying, and committing offsets, the function developer does not need to think much about the interaction with Kafka and can implement only the business logic. The same applies to other event sources, such as Cosmos DB, blob storage, and so on. Currently, no serverless runtime provides a wide range of production-ready controllers with those capabilities.
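
To make the division of labour concrete, here is a minimal, in-memory sketch of what such a controller does on behalf of a function. It is not the API of any existing runtime or Kafka client: `run_controller`, its parameters, and the list standing in for a topic partition are all hypothetical.

```python
from typing import Callable, List, Sequence


def run_controller(messages: Sequence[str],
                   handler: Callable[[List[str]], None],
                   batch_size: int = 3,
                   max_retries: int = 2) -> int:
    """Feed messages to handler in batches and return the committed offset.

    The offset only advances after a batch succeeds, so a batch that
    keeps failing is never skipped silently.
    """
    committed = 0  # offset of the next message to process
    while committed < len(messages):
        batch = list(messages[committed:committed + batch_size])
        for attempt in range(max_retries + 1):
            try:
                handler(batch)  # the developer's business logic
                break
            except Exception:
                if attempt == max_retries:
                    # Give up; the offset stays in front of the failed batch.
                    return committed
        committed += len(batch)  # "commit" only after success
    return committed


if __name__ == "__main__":
    seen = []
    offset = run_controller(["a", "b", "c", "d"],
                            lambda batch: seen.extend(m.upper() for m in batch))
    print(offset, seen)
```

The handler contains no batching, retry, or offset logic at all, which is the productivity gain described above. Because a batch is retried as a whole, this sketch gives at-least-once delivery, so a real handler would need to tolerate duplicates.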

Figure: Kafka Controllers in FaaS Runtimes

Don’t call it serverless yet, if you don’t have production ready controllers abstracting away IO interactions between different functions, between functions and external systems, and between functions and their triggers.


The simple function application used in this evaluation takes an input string and outputs its bcrypt hash. It was implemented in C# on .NET Core for Azure Functions on the consumption plan, and in Go for OpenFaaS using the golang-http template.

The Azure Functions runtime version was 2.0.12246.0. We built OpenFaaS from source, as we had to make some small changes to override its metrics endpoint so that we could integrate it with our existing infrastructure without further changes. The default OpenFaaS alerting rule was also tweaked to reflect our infrastructure. The OpenFaaS Nomad plugin was deployed on a Nomad cluster of 25 VMs running Ubuntu 16.04. The CPU and memory requirements of the Nomad OpenFaaS tasks were kept the same as in the Nomad plugin repo.

Experiments using Azure Functions

Figure: The First Invocation of the Azure Function

httpstat was used to generate the visual breakdowns of the HTTP requests.

Invoking the same function a second time right after the first invocation took about 85% less time. The decrease was mainly due to DNS caching and hitting the already warmed-up function. We are more interested in the server processing time here, which was about 91% less compared to that of the first invocation.

Figure: The Second Invocation of the Azure Function

In addition to single invocations, we performed load testing to investigate the behaviour of both serverless systems under load. The load testing was performed using the Vegeta HTTP load testing library. The Vegeta target and input file used in the experiments are public and can be used to reproduce the results. The success rates were obtained from Vegeta reports, and the number of instances for Azure Functions was obtained by continuously monitoring the Azure Live Metrics Stream. We repeated each load test until the success rate reached 100%.

Ideally, any kind of experimental study should be backed by a rigorous statistical analysis. We performed 30 separate cold-start load tests at a request rate of 100 req/s to calculate the confidence interval for the success rate. Because we have to wait about 25 minutes for an Azure function to fully cool down, performing similar experiments for the other request rates would take days. A 95% confidence interval for the 100 req/s experiments was calculated using the Student’s t-distribution from SciPy; it was [0.17%, 0.51%] with a mean success rate of 0.34%.
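
The interval calculation itself is short. The sketch below uses only the Python standard library, so the critical value is hard-coded from a t table instead of being looked up with SciPy; the function name is hypothetical, and the sample values in the usage line are purely illustrative, not the measured success rates.

```python
import math
import statistics


def t_confidence_interval(samples, t_critical):
    """Return (mean, low, high) for a two-sided t-based confidence interval."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    half_width = t_critical * sem
    return mean, mean - half_width, mean + half_width


# Two-sided 95% critical value for 29 degrees of freedom (30 runs),
# taken from a standard t table; scipy.stats.t.ppf(0.975, 29) returns
# the same value to this precision.
T_95_DF29 = 2.045

if __name__ == "__main__":
    # Illustrative numbers only, with 4 degrees of freedom (t = 2.776).
    print(t_confidence_interval([1, 2, 3, 4, 5], 2.776))
```

With 30 measured success rates in `samples`, calling `t_confidence_interval(samples, T_95_DF29)` would give an interval of the kind reported above.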

Figure: Azure Functions Load Testing

A sample Vegeta report for an Azure Functions experiment looked as follows:

Figure: Azure Functions Vegeta Report

Another interesting observation from Azure Functions was that the average number of requests per second on each allocated server was 2. According to Azure Application Insights metrics, the committed memory per server ranged between 159MB and 246MB, and the request durations were within a range of 2000–10000 ms.
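
This observation gives a rough way to relate a request rate to the number of servers that have to be started. The sketch below assumes that the observed 2 req/s per server holds at every load level, which the experiments do not guarantee; both function names are hypothetical.

```python
import math

# Observed in the experiments above: about 2 req/s per allocated server.
OBSERVED_REQ_PER_S_PER_SERVER = 2


def servers_for_rate(req_per_s, per_server=OBSERVED_REQ_PER_S_PER_SERVER):
    """Servers needed to sustain req_per_s, assuming linear scaling."""
    return math.ceil(req_per_s / per_server)


def extra_servers(warm_rate, new_rate):
    """Servers that still need a cold start when load jumps to new_rate."""
    return max(servers_for_rate(new_rate) - servers_for_rate(warm_rate), 0)


if __name__ == "__main__":
    print(servers_for_rate(200))   # matches the ~100 servers seen at 200 req/s
    print(extra_servers(10, 100))  # a function warmed at 10 req/s, hit with 100 req/s
```

Under this assumption, 200 req/s maps to the roughly 100 servers mentioned in the cost estimation, and a function warmed up at 10 req/s still has to cold-start 45 more servers to serve 100 req/s.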

Also, it seems that Azure Application Insights does not capture most of the failures. It has a blade for failed request metrics; however, when the success rate of load testing was 0.34% at 100 req/s on the first attempt, we saw no traces of failures within Azure Application Insights, only a few successes. The Vegeta reports showed various errors during the failures, including: “TLS handshake timeout”, “timeout awaiting response headers”, “no such host”, “write: no buffer space available”, “Too Many Requests”, and “Bad Gateway”.

Experiments using OpenFaaS on Nomad

Figure: The First Invocation of the OpenFaaS Function

The second invocation of the same function showed behaviour similar to what we observed with Azure Functions; the total time decreased by about 82% after the first request.

Figure: The Second Invocation of the OpenFaaS Function

Overall, the cold-start times of OpenFaaS and Azure Functions were very close to each other at request rates below 20 req/s; however, as we increased the number of requests per second, the success rate of Azure Functions on the first attempts dropped to zero. OpenFaaS performed much better on the first attempts. It took at most 2 attempts for OpenFaaS to fully warm up and reach a 100% success rate, whereas we had to run 3 attempts for Azure Functions to reach a 100% success rate. We believe the fundamental limitations of the Azure Web App sandbox are the cause of this poor performance of Azure Functions.

Figure: OpenFaaS Load Testing

We conducted the same kind of statistical analysis for the OpenFaaS function at a request rate of 100 req/s. The 95% confidence interval for the 100 req/s experiments was [93.82%, 97.31%] with a mean success rate of 95.56%.

A sample Vegeta report for an OpenFaaS experiment looked as follows:

Figure: OpenFaaS Vegeta Report

An astute reader may think that it is not a big deal if we reach a 100% success rate in 6 minutes (Azure Functions) instead of 3 minutes (OpenFaaS). However, if this happened as part of a real application, the application would need to retry only the failed requests, which were only a tiny fraction of the total number of requests with OpenFaaS. With Azure Functions, the application would have to retry almost all of the 24,000 requests (200 req/s over 2 minutes).
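
The difference in retry volume can be put into numbers using the mean first-attempt success rates measured at 100 req/s (0.34% for Azure Functions and 95.56% for OpenFaaS). The sketch assumes that those runs also lasted 2 minutes, which the post only states explicitly for the 200 req/s case; the function name is hypothetical.

```python
def requests_to_retry(req_per_s, duration_s, success_rate_percent):
    """Failed requests a client must retry after one cold-start run."""
    total = req_per_s * duration_s
    succeeded = round(total * success_rate_percent / 100)
    return total - succeeded


if __name__ == "__main__":
    # Mean first-attempt success rates at 100 req/s, from the experiments above.
    print("Azure Functions:", requests_to_retry(100, 120, 0.34))
    print("OpenFaaS:", requests_to_retry(100, 120, 95.56))
```

Under that assumption, a client would have to retry 11,959 of 12,000 requests against Azure Functions, compared with 533 against OpenFaaS.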


Our next takeaway is that serverless functions are not going to solve a general computing problem. They need to be evaluated on a case-by-case basis. Think about the integration with your existing infrastructure and with your legacy services.

Should I use OpenFaaS or Azure Functions?

  1. Is Nomad or Kubernetes a main part of your infrastructure?
  2. Do you have multiple teams using different languages?
  3. Are you a heavy user of Kafka and are looking for a Kafka controller?
  4. Are you ready to extend existing controllers if they don’t satisfy your needs?
  5. Should your function runtime be able to handle a high number of requests?
  6. Would you or your company like to have full control of your function runtime?

If none of these is a concern for you, then you may choose Azure Functions for the following reasons:

  1. Easier integration with Azure services
  2. A powerful workflow engine, i.e. Azure Durable Functions
  3. Better developer tooling, such as integration with IDEs, local development, and testing
  4. Support from Microsoft

Please check the OpenFaaS and Azure Functions docs to learn more and keep up to date, as serverless technologies evolve quite fast.

If you like the challenges of building complex & reliable systems and are interested in solving complex problems, check out our job openings.

The content and information in this blog post is the property of Jet, and cannot be copied without Jet’s express written consent. This content and information is provided for informational purposes on an “as is” basis at your sole risk. Jet makes no guarantee as to the accurateness, completeness or quality of the information, or its suitability to your specific purpose. Jet shall not be liable or responsible for any errors, omissions or inaccuracies in the information or your reliance on the information. You are solely responsible for verifying the information as being appropriate for your personal use.


2019-03-11: This blog post estimated the cost of running 1000 Azure Functions twice daily for an average of 2 minutes. That estimation assumed that the billable resource consumption should take the total number of allocated servers into account. However, according to the feedback we received, the billable resource consumption should be based only on the average resource usage of a function. If we redo the calculation without taking the number of servers into account, the resource consumption bill for a function would be $4.60 and the bill for executions would stay the same, i.e. $0.088. Therefore the total cost of a function per month would be about $4.688, and the total cost for 1000 Azure Functions would be about $4,688. However, this does not change the pricing conclusion if you already use reserved VM instances. Thanks to Mikhail Shilkov for the correction.
