Interactive serverless — no free lunch

Cloud services constantly change and innovate, and we at Kainos do the same. Having delivered IaaS and PaaS solutions, it was high time to try out FaaS. I’d like to share some of the things we learned by building a simple subscription service using AWS Lambda.

We built the service with AWS Lambda and API Gateway. It needed to work without JavaScript, so a single-page app was out of the question. Instead, we decided to use a Lambda function fronted by API Gateway to generate page content, with DynamoDB as the backing store for our subscriptions.

This gave us a lot of insight into Lambda performance in interactive use, and that is the focus of this article. In particular, I will talk about what happens when the Lambda runtime spins up a new container to serve requests to our function, which is called a “cold start”.

What “good” performance means

It is always good to stick to data. Google’s RAIL model is the result of user-experience research that we can use as a baseline: 1000ms to deliver content.

Language matters

We knew language choice would matter; we were just not sure to what extent. The JVM, due to the size of its base distribution, has the biggest cold start overhead (will that change with modular Java 9?). Compared to Node.js, Python, or Go (via Node.js shims), the JVM is gluttonous; however, of all the supported platforms, it seems to be the fastest once warmed up. Barring container startup, Java looked fine.

API Gateway integration overhead

Currently, the only way to expose a Lambda function to the Internet is through API Gateway. Even when the function executed in under 10ms according to CloudWatch Logs, the TTFB was 150–200ms most of the time (with API Gateway deployed in eu-west-1 and accessed from the UK). There is not much you can do about this overhead; you either accept it or you don’t.

HTML rendering

To render HTML server-side (SPA fans, close your eyes), one usually uses a template engine, a fast one. Handlebars Java fit just fine: a good trade-off between speed and functionality (master templates). Templates are text that needs compiling, i.e. building an AST representation. Rendering the master template plus the homepage-specific partial took ~2s in a 512MB container and ~0.9–1s in a 1GB one. In terms of cost and user experience, that is a lot to pay for a barely acceptable result! Once the templates were compiled on the first request, subsequent requests rendered in ~30ms.
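The reason subsequent requests were fast is that anything kept in a static field survives across invocations while the container stays warm, so compilation is paid once per container. A minimal sketch of that pattern, with a hypothetical compileTemplate standing in for the real engine’s compile step (Handlebars itself is omitted to keep the example self-contained):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: pay template compilation once per container, not once per request.
// In Lambda, static state survives between invocations of a warm container.
public class TemplateCache {
    // Compiled templates, kept for the lifetime of the container.
    private static final Map<String, Function<String, String>> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for an expensive compile step (AST building etc.).
    static Function<String, String> compileTemplate(String source) {
        return name -> source.replace("{{name}}", name);
    }

    public static String render(String templateName, String source, String name) {
        // Compile on first use only; later requests hit the cached compiled form.
        return CACHE.computeIfAbsent(templateName, k -> compileTemplate(source)).apply(name);
    }
}
```

The same idea applies to anything expensive to build: construct it lazily, cache it statically, and only the cold start pays.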

If only we could precompile templates before deploying the function, performance would be just fine. However, this was not possible with the template engine we chose, nor with any other well-known one. Shall we wait for serverless-ready template engines? :-)

Post-Redirect-Get

For form submission, good practice is to follow the Post-Redirect-Get (PRG) pattern, which effectively almost doubles the time to deliver content to the user: one round trip for the POST, and a second for the redirected GET.
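For reference, a sketch of what the redirect half of PRG looks like as an API Gateway Lambda proxy response. The field names follow the proxy response shape (statusCode / headers / body); the /subscribed path is purely illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Post-Redirect-Get pattern: the handler answers a POST with a
// 303 See Other pointing at a GET endpoint, so a browser refresh re-issues the
// harmless GET rather than re-submitting the form POST.
public class PrgResponse {
    public static Map<String, Object> redirectAfterPost(String location) {
        Map<String, Object> response = new HashMap<>();
        response.put("statusCode", 303);        // "See Other": follow-up must be a GET
        Map<String, String> headers = new HashMap<>();
        headers.put("Location", location);      // where the browser should GET next
        response.put("headers", headers);
        response.put("body", "");               // no body needed on a redirect
        return response;
    }
}
```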

Secret management overhead

Imagine you have an API key that must be kept secret. The only efficient way to pass configuration to Lambda functions is environment variables. Unfortunately, they can be inspected by everybody who has access to your AWS environment. A good practice is to pass an already-encrypted key as a variable so that the function can decrypt it on the fly as necessary; KMS is the only sensible option at the moment. Since decryption involves a call to KMS per secret (there is no batch decryption), visitors to your application will incur that overhead every so often. Cache decrypted keys in the function container to reduce calls to KMS and the impact on user experience.
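A minimal sketch of that caching pattern, with kmsDecrypt as a hypothetical stand-in for the real KMS Decrypt call (the call counter exists only to demonstrate that each secret is decrypted once per container):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: decrypt each secret once per container and cache the plaintext in a
// static field, so the per-secret KMS round trip is not paid on every request.
public class SecretCache {
    private static final Map<String, String> PLAINTEXT = new ConcurrentHashMap<>();
    static int decryptCalls = 0;

    // Stand-in for the real KMS Decrypt call on the encrypted env variable.
    static String kmsDecrypt(String ciphertext) {
        decryptCalls++;                          // real code would call KMS here
        return ciphertext.replace("enc:", "");   // hypothetical "decryption"
    }

    public static String getSecret(String ciphertext) {
        // First request in a container decrypts; warm requests hit the cache.
        return PLAINTEXT.computeIfAbsent(ciphertext, SecretCache::kmsDecrypt);
    }
}
```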

Integration clients overhead

A meaningful serverless function talks to something. In a serverful world, integration clients can be initialized eagerly so that they are already warm when it comes to serving user requests. Here are examples of initialization overhead (1GB Lambda):

  • DynamoDB client initialization (without executing any DB operation): ~2s
  • AWS KMS client / Apache HttpClient: ~1.5s

It was especially interesting to understand where the overhead of the HTTP client was coming from. When we created the client twice, the overhead was only incurred on the first call (we made sure no connection was reused); every subsequent call took 300–400ms. Turning on DEBUG logging showed the same sequence in both cases: opening the socket, then TLS negotiation, then payload exchange. TLS negotiation took its toll of 250–300ms in both attempts. What differed was the time between consecutive log lines: on the first call the gaps were larger. We think this is due to class loading, which in Java is lazy. We could not change the implementation of the KMS or DynamoDB clients to verify our hypothesis, but we could modify the third-party API client, since we had rolled our own. Moving to the OkHttp client sped up the first call by 200–300ms. Using bare-bones HttpURLConnection took off another 500–600ms. The difference between those clients is the complexity of their implementation. This is not to say you should always go for HttpURLConnection: you may need the robustness that Apache HttpClient provides. But if you do not, pick the lightest client or…
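A bare-bones GET with HttpURLConnection needs nothing outside the JDK, which is exactly why it loads fewer classes on a cold start. A minimal sketch; the embedded com.sun.net.httpserver.HttpServer is there only to make the example self-contained, not part of the technique:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch: the lightest JDK-only HTTP client, as an alternative to Apache
// HttpClient or OkHttp when cold start cost matters more than client features.
public class MinimalHttpGet {
    public static String get(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {   // connects lazily here
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Spins up a throwaway local server on an ephemeral port and GETs from it.
    public static String demo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            return get("http://localhost:" + server.getAddress().getPort() + "/ping");
        } finally {
            server.stop(0);
        }
    }
}
```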

Mitigation #1: language

So many performance challenges with the JVM! Do they pertain to other runtimes, like Node.js? From our experiments, Node.js can help you keep response times relatively low even on cold starts. Node.js has very little overhead for integrations: where the JVM showed 2s, its JavaScript counterpart ran at 0.5s (with Unirest.io). Its small core and philosophy of tiny packages make it a winner for FaaS (by the criteria of cost and performance), and it is what we recommend for interactive serverless at the moment. At the very least, it does not immediately force you into a deep dive of the underlying function container lifecycle.

Mitigation #2: keep your container warm

Many of the issues we have gone through can be mitigated with warm containers, which essentially makes your serverless solution a bit more serverful. The caveat is that there is no SLA around warm containers. Our goal is to hide the effect of the cold start and first initialization from our users. This can be achieved to some extent by implementing a container keep-alive mechanism, which we plan to cover in a follow-up article soon, so stay tuned.
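As a preview, a minimal sketch of the keep-alive short-circuit: a scheduled event (for example, a CloudWatch Events rule) invokes the function periodically with a marker payload, and the handler returns immediately without doing real work. The "warmup" key is our own convention, not anything AWS defines:

```java
import java.util.Map;

// Sketch: a handler that recognises scheduled warm-up pings and skips the real
// work, so the container stays warm for actual users at minimal cost.
public class WarmupAwareHandler {
    public static String handle(Map<String, ?> event) {
        if (event != null && Boolean.TRUE.equals(event.get("warmup"))) {
            return "warmed";   // ping from the scheduler: nothing else to do
        }
        return doRealWork(event);
    }

    // Placeholder for the function's actual request handling.
    static String doRealWork(Map<String, ?> event) {
        return "handled";
    }
}
```

The early return matters: a warm-up ping that ran the full request path would pay for template rendering, KMS calls, and so on, defeating the purpose of a cheap keep-alive.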

Finally, is FaaS fit for interactive use cases?

TL;DR: it depends on your use case. At the moment, to get steady performance according to the baseline we set earlier, it seems you need to fight the technology a lot. Over time, technology from both FaaS providers and library authors will adapt to frequent cold starts. For very simple use cases where performance is not critical, serverless is very tempting and may well be applicable. For anything else, we recommend sticking to serverful.