Performance Testing Serverless GCP Cloud Functions
I recently started to work with GCP Cloud Functions (GCF) and wanted to see how their performance compares to other options.
During my brief experience, I started to see some terrible performance from GCF, and wanted to test it against a traditional micro-service deployment to see whether the fault was in my code or in GCF.
The code used for testing is very generic and very basic: take HTTP request data, do some very light data manipulation, and store the resulting object in a database.
The code base is very light and simple:
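The original snippet is shown as an image in the source, so here is a minimal sketch of what the core of such a handler could look like. Everything here is an illustrative assumption, not the actual test code: the names `transformRecord` and `saveRecord` are made up, and the in-memory `store` stands in for the real database.

```javascript
// Hypothetical sketch of the handler's core logic -- not the actual test code.
const store = []; // in-memory stand-in for the real database

function transformRecord(payload) {
  // the "very light data manipulation": normalize a field and add a timestamp
  return { ...payload, name: String(payload.name || '').trim(), createdAt: Date.now() };
}

function saveRecord(payload) {
  const record = transformRecord(payload);
  store.push(record); // the real service would write to the database here
  return record;
}
```

In the Express version, `saveRecord` would simply be called from inside an `app.post()` route.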
We have the same code base for GCF, just reshaped for it: instead of `app.post()` we export the handler as `exports.myfunction = (req, res) => {...}`.
The Kubernetes Express Server deployment:
Based on the deployment.yaml, we are giving our deployment a memory limit of 200 MiB and roughly 0.5 GCP CPU cores.
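The manifest itself isn't reproduced here, but a `resources` stanza matching those numbers would look roughly like this (the limit values come from the article; the structure is standard Kubernetes boilerplate):

```yaml
# Illustrative fragment of a deployment.yaml container spec
resources:
  limits:
    memory: "200Mi"  # 200 MiB memory limit
    cpu: "500m"      # ~0.5 CPU cores
```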
Running `hey` against this k8s deployment yields the following data:
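For reference, a `hey` invocation at 50 concurrency looks like the following. The URL is a placeholder, and the total request count of 1,000 is inferred from the GCF run described later, not stated for this run:

```shell
# 1000 total requests, 50 concurrent workers (flags per hey's standard CLI)
hey -n 1000 -c 50 http://<service-ip>/records
```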
Even though our code is not doing much logically, nor is it a large code base, I think it is a good litmus test for the pure throughput of a CRUD-like service.
50 concurrency (~180 requests/sec) is what I would consider a bit above average for most applications, even some “enterprise” applications. This translates to about 8 million requests per day (if you guesstimate a peak window of about 12 hours) and about 240 million per month. At 24-hour continuous load, it is about 480 million per month.
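Those back-of-envelope numbers can be verified with a few lines of arithmetic, using the ~180 requests/sec figure from the load test:

```javascript
// Verifying the back-of-envelope traffic numbers from the text.
const rps = 180; // ~requests/sec observed at 50 concurrency

const perDayPeak = rps * 3600 * 12;   // assume a ~12-hour daily peak window
const perMonthPeak = perDayPeak * 30;

const perDayFull = rps * 3600 * 24;   // 24-hour continuous load
const perMonthFull = perDayFull * 30;

console.log(perDayPeak);   // 7776000   (~8 million/day)
console.log(perMonthPeak); // 233280000 (~240 million/month)
console.log(perMonthFull); // 466560000 (~480 million/month)
```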
To put this into perspective: take acloud.guru, which famously runs its backend on AWS Lambda functions — only ~1.5 million requests per month. E-Trade is at about ~14 million/month, and Yelp at ~140 million/month. Obviously my “dumb” code is not doing nearly as much as Yelp's API service, but you can hopefully extrapolate the data to see how this would scale with a much more enterprise-like code base.
Cloud Function Deployment
Just like with the Kubernetes application, we run `hey` at 50 concurrency to get the following results:
But wait, that's not all: of those 1,000 requests, 8 failed due to timeout (we set a 60-second timeout on GCF).
The GCF function took 5x longer to complete the same number of requests, with almost 4x the average response time, not to mention it served almost 80% fewer requests per second.
WHAT IS GOING ON?!?
OK, maybe somehow there is some voodoo going on with my imports not playing nicely in a serverless world (I highly doubt it, but I'll give it the benefit of the doubt). So what about the simplest of REST functionality: a health-check function that does absolutely nothing but send a “200 OK” back?
The difference between GCF and the K8s application is a whopping ~500%.
OK, maybe the issue is GCF cold starts… but that still would not explain why it is this slow, considering that after the initial burst of container start-ups, those containers should be available to process further requests.
For testing's sake, I created the same function as an AWS Lambda function behind API Gateway:
To visualize the difference on each type of deployment (higher is worse):
You can see that with Lambda we have a good equilibrium of performance vs. cost. The slowness is explained by the initial cold starts, after which it quickly scales out as those containers are reused. Even though the average is a bit slower, it is not by much, and that is easily offset by the cost savings.
To be very blunt: when it comes to enterprise-level applications, Google Cloud Functions is in the little leagues, whereas AWS Lambda, and even serverFULL applications, are in the professional league. In my opinion, the cost benefit of running GCF does not come close to justifying the drastic performance hit. If you are thinking of going serverless, go with AWS Lambda or with Kubernetes-based serverless such as Fission or Kubeless.
Yes, all deployments are provisioned correctly. In fact, the GCF memory allocation is much higher (2 GB) than the AWS one (128 MB).