Functions as a Service (FaaS) is fast gaining developer adoption. We have already talked about the maturity of FaaS and the noise around the term. In this post, we discuss performance benchmarks for three of the top four cloud services, using the results reported in an academic paper. The TL;DR version of the paper is that the FaaS offerings from Amazon Web Services, Microsoft and Google are still not mature enough, but AWS Lambda is far ahead of the other two. Please keep in mind that this is not our research; we have simply reproduced the salient points for quick reading.
Peeking behind Serverless
Researchers from Ohio State University, University of Wisconsin at Madison and Cornell Tech evaluated three of the top four serverless function platforms over two years. They analyzed AWS Lambda, Azure Functions and Google Functions in one of the largest studies of its kind, invoking 50,000 function instances across the three services in order to characterize their architectures, resource scheduling and performance. This post summarizes their findings for our readers. If you want more information, including the numerical results from their analysis, check out this paper.
The researchers used well-known Linux and system administration tools to probe the underlying infrastructure and infer how these services are architected. Keep in mind that the points below are indirect inferences from their measurements, not confirmed designs.
AWS Lambda:
- The functions are hosted on virtual machines
- Each virtual machine hosts only one tenant
- Different versions of a function are treated as distinct and executed in different function instances, except for a few outliers in the data
- While testing the service by invoking 50,000 function instances, they found 5 CPU configurations
Azure Functions:
- The functions are hosted in a container that holds the execution environments for individual functions
- According to the researchers, the underlying virtual machines were multi-tenant in 2017 but appear to be single-tenant in 2018
- While testing the service by invoking 50,000 function instances, they found that the host VM has 1, 2 or 4 vCPUs
Google Functions:
- There is very little information they could glean about the underlying infrastructure for Google Functions
- From their analysis, they found that Google Functions uses 4 different CPU configurations. However, we should keep in mind that Google Functions was in beta during their testing
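To give a feel for the kind of probing described above, here is a minimal sketch of how a function running on a Linux host might summarize the CPU configuration it sees. It is not the researchers' tooling: the helper names `summarize_cpuinfo`, `read_uptime_seconds` and `probe` are hypothetical, and the sketch assumes the standard `/proc/cpuinfo` and `/proc/uptime` files are readable from inside the function.

```python
from collections import Counter


def summarize_cpuinfo(text):
    """Return (vCPU count, Counter of CPU model names) from /proc/cpuinfo-style text."""
    vcpus = 0
    models = Counter()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "processor":
            vcpus += 1
        elif key == "model name":
            models[value.strip()] += 1
    return vcpus, models


def read_uptime_seconds(text):
    """The first field of /proc/uptime is the kernel uptime in seconds."""
    return float(text.split()[0])


def probe():
    """Hypothetical probe to call from inside a function handler (Linux only)."""
    with open("/proc/cpuinfo") as f:
        vcpus, models = summarize_cpuinfo(f.read())
    with open("/proc/uptime") as f:
        uptime = read_uptime_seconds(f.read())
    return {"vcpus": vcpus, "models": dict(models), "uptime_s": uptime}
```

Collecting the result of `probe()` from many instances and counting the distinct (vCPU count, model name) pairs is one plausible way to arrive at a list of CPU configurations like the ones reported above.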
The resource scheduling results describe how the underlying resources in these FaaS platforms are scheduled as functions are invoked. This includes instance coldstart latency, instance lifetime, scalability, etc.
- AWS is the best among the three services in supporting concurrent executions
- N concurrent invocations always produced N concurrently running function instances. The researchers tested up to N = 200
- AWS Lambda appears to treat instance placement as a bin-packing problem and tries to place a new function instance on an existing active VM to maximize VM memory utilization rates
- The median coldstart latency was 160 ms with 1000 instances
- A coldstart on a new VM was only 39 ms slower than a coldstart on an existing VM
- In AWS, the median instance lifetime across all settings was 6.2 hours, with the maximum being 8.3 hours. The host VMs in AWS usually live longer: the longest observed VM kernel uptime was 9.2 hours. When request frequency increases, instance lifetime tends to become shorter
- An instance could usually stay inactive for at most 27 minutes
- Microsoft is poor in terms of supporting concurrent executions
- Only 10 function instances could run concurrently for a single function
- Microsoft seems to have fixed the cross-tenant co-residency issue which occurred in their earlier tests
- The median coldstart latency in Azure was 3,640 ms, but Microsoft appears to be aware of this engineering issue and may address it in the future
- Azure had the highest network variation over time, ranging from about 1.5 seconds up to 16 seconds, but it is getting better with time
- Azure instances have much longer lifetimes than those of both AWS and Google
- They could not find a consistent maximum instance idle time
- Google also had poor support for concurrent executions
- Only about half of the expected number of instances, even for a low concurrency level (e.g., 10), could be launched at the same time, while the remainder of the requests were queued
- The median coldstart latency in Google ranged from 110 ms to 493 ms
- The latency variation is much lower than Azure's
- Google seems to launch new instances aggressively rather than reusing existing instances. This can increase the performance penalty from coldstarts
- The idle time of instances could be more than 120 minutes
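The bin-packing behavior attributed to AWS Lambda in the list above can be illustrated with a small best-fit sketch. This is only an illustration of the general technique, not AWS's actual scheduler: the `VM` class, the `place` function and the 4096 MB capacity are all assumptions made for the example.

```python
from dataclasses import dataclass, field

VM_CAPACITY_MB = 4096  # hypothetical capacity, not a figure from the paper


@dataclass
class VM:
    capacity_mb: int = VM_CAPACITY_MB
    instances: list = field(default_factory=list)  # memory size of each placed instance

    @property
    def free_mb(self):
        return self.capacity_mb - sum(self.instances)


def place(vms, instance_mb):
    """Best-fit placement: reuse the active VM with the least free memory that still fits.

    A new VM is started only when no active VM has room, which keeps
    memory utilization on the existing VMs as high as possible.
    """
    if instance_mb > VM_CAPACITY_MB:
        raise ValueError("instance does not fit on any VM")
    candidates = [vm for vm in vms if vm.free_mb >= instance_mb]
    if candidates:
        target = min(candidates, key=lambda vm: vm.free_mb)
    else:
        target = VM()
        vms.append(target)
    target.instances.append(instance_mb)
    return target


def utilization(vms):
    """Fraction of memory in use on each VM."""
    return [1 - vm.free_mb / vm.capacity_mb for vm in vms]
```

For example, placing instances of 1024, 1024, 2048, 512 and 1536 MB in that order fills the first VM completely before a second one is started. A scheduler that instead launches new instances aggressively would spread the same load over more hosts and pay more coldstart penalties.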
For performance isolation, they focussed on AWS and Azure because they could achieve co-residency in these two services, which allowed more refined measurements. They have also presented basic performance statistics for Google. Keep in mind that Google Functions was in beta during their tests.
- In AWS Lambda, CPU utilization is proportional to the memory allocated for the function, so functions with more memory get better performance
- AWS fails to provide proper performance isolation between coresident instances, and so contention can cause considerable performance degradation
- Azure has a relatively high variance in the CPU utilization rates
- Like AWS, Azure also fails to provide proper performance isolation between coresident instances
- In Google, as in AWS Lambda, CPU utilization is proportional to the memory allocated for the function, so functions with more memory get better performance
- In Google, both the measured I/O and network throughput increase as function memory increases
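As a rough illustration of what a CPU utilization measurement inside a function can look like, the sketch below runs a busy loop and compares the CPU time the process actually received with the wall-clock time that elapsed. This is an assumed, simplified approach for readers who want to experiment, not the method used in the paper; `cpu_share` is a hypothetical helper name.

```python
import time


def cpu_share(duration_s=0.2):
    """Estimate the share of one CPU this process receives while spinning.

    A value close to 1.0 means the busy loop had a full core to itself;
    a lower value suggests a capped CPU share or contention from neighbors.
    """
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    counter = 0
    while time.perf_counter() - start_wall < duration_s:
        counter += 1  # busy work to keep the CPU occupied
    wall = time.perf_counter() - start_wall
    cpu = time.process_time() - start_cpu
    return cpu / wall
```

Running the same probe at several memory settings, or while a coresident instance is also spinning, would be one way to observe the proportional CPU allocation and the isolation problems described above.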
In spite of all the buzz about serverless, FaaS providers are still evolving and lack the maturity needed for large-scale production deployments inside the enterprise. Both Azure Functions and Google Functions fall well short, in terms of both their feature sets and the benchmarks described in this academic paper. We expect some standardization of both features and performance, which will make FaaS suitable for large-scale enterprise adoption. We still stand by our earlier recommendations on how we rank FaaS providers and urge caution as you explore adopting Functions from Microsoft Azure and Google Cloud.
PS: We have reached out to Microsoft Analyst Relations for a response and are waiting for them to get back to us. We will update this post once we hear from them. We haven’t reached out to Google’s Analyst Relations because they do not respond to our requests for information needed for our research. If you are a Google employee wanting to respond to this, feel free to add your comments.
Originally posted in StackSense.io