Interactive serverless — no free lunch
Cloud services constantly change and innovate, and we at Kainos do the same. Having delivered IaaS and PaaS solutions, it was high time to try out FaaS. I’d like to share some of the things we learned by building a simple subscription service using AWS Lambda.
This gave us a lot of insight into Lambda performance in interactive use, and that is the focus of this article. In particular, I will talk about what happens during a “cold start”, when the Lambda runtime spins up a new container to serve requests to our function.
What “good” performance means
It is always good to stick to data. The RAIL model is the result of research that we can take as a baseline: content should be delivered within 1000ms.
This much we knew; we were just not sure to what extent. Due to the size of its base distribution, the JVM has the biggest cold start overhead (will that change with modular Java 9?). Compared to Node.js, Python, or Go (through Node.js shims), the JVM is gluttonous. However, out of all supported platforms, it seems to be the fastest once warmed up. Barring container startup, Java looked fine.
API Gateway integration overhead
Currently, the only way to expose Lambda to the Internet is through API Gateway. When the function executed in under 10ms according to CloudWatch Logs, the time to first byte (TTFB) was 150–200ms most of the time. API Gateway was deployed in the eu-west-1 region and accessed from the UK. There is not much you can do about this overhead other than accept it.
To render HTML server-side (SPA fans, close your eyes), one usually uses a template engine, ideally a quick one. Handlebars.java fit just fine: it was a good trade-off between speed and functionality (master templates). Templates are text that needs compiling into an AST representation. Rendering the master template and the homepage-specific partial for the first time took ~2s in a 512MB container and ~0.9–1s in a 1GB one. In terms of both cost and user experience, that is a lot to pay for a barely acceptable result! On subsequent requests, with the templates already compiled, rendering took ~30ms.
If only we could precompile templates before deploying the function, the performance would be just fine. However, that was not possible with the chosen template engine, or with any well-known template engine. Shall we wait for serverless-ready templates? :-)
For form submissions, good practice is to follow the Post/Redirect/Get (PRG) pattern, which almost doubles the time to deliver content to the user because each submission takes two round trips.
Secret management overhead
Imagine you have an API key that should be kept secret. The only efficient way to pass configuration to Lambda functions is through environment variables. Unfortunately, they can be inspected by everybody who has access to your AWS environment. A good practice is to pass keys in already encrypted form so that the function can decrypt them on the fly as necessary. KMS is the only sensible option at the moment. Since decrypting involves a call to KMS per secret (there is no batch decryption), visitors of your application will likely incur that overhead every so often. Cache your decrypted keys in the function container to reduce the number of calls to KMS and the impact on user experience.
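To illustrate the caching advice, here is a minimal sketch of a per-container secret cache. It is not the code from our service: the SecretCache class and the decryptor function are hypothetical names, and the decryptor stands in for whatever performs the real KMS decrypt call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Caches decrypted secrets for the lifetime of a function container.
 * Hypothetical sketch: the decryptor stands in for a real KMS decrypt call.
 */
final class SecretCache {
    // Maps the encrypted value (as stored in an environment variable) to its plaintext.
    private final Map<String, String> plaintextByCiphertext = new ConcurrentHashMap<>();
    private final Function<String, String> decryptor;

    SecretCache(Function<String, String> decryptor) {
        this.decryptor = decryptor;
    }

    /** Returns the plaintext, invoking the decryptor only the first time a ciphertext is seen. */
    String get(String ciphertext) {
        return plaintextByCiphertext.computeIfAbsent(ciphertext, decryptor);
    }
}
```

Keeping an instance in a static field of the handler class lets it survive across invocations served by the same container, so only the first request after a cold start pays for the decryption of each secret.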
Integration clients overhead
A meaningful serverless function talks to something. In a serverful world, integration clients can be initialized eagerly so that they are warm by the time they serve user requests. Here are examples of initialization overhead (1GB Lambda):
- DynamoDB client initialization (without executing any DB operation): ~2s
- AWS KMS client/Apache HTTP library client: ~1.5s
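Figures like these can be collected with a very small timing harness. The sketch below is an illustration rather than the exact code we used: InitTimer and timeRepeated are hypothetical names, and the factory is whatever creates the client under test.

```java
import java.util.function.Supplier;

/** Hypothetical helper that times repeated creation of an object, such as an API client. */
final class InitTimer {
    /** Runs the factory the given number of times and returns each elapsed time in milliseconds. */
    static long[] timeRepeated(Supplier<?> factory, int runs) {
        long[] elapsedMs = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            factory.get(); // create the object; the result is discarded
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsedMs;
    }
}
```

Calling it with two runs inside a freshly started container gives one number for the first creation and one for a repeat, which makes any one-off initialization cost visible.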
It was especially interesting to understand where the overhead of the HTTP client was coming from. When we created the client twice, the overhead only ever occurred on the first call; every subsequent call took 300–400ms. We made sure no connection was reused. Turning on DEBUG logging showed the same sequence in both cases: opening socket -> TLS negotiation -> payload exchange. TLS negotiation took its toll of 250–300ms in both attempts. What differed was the time between consecutive log lines: on the first creation the gaps were larger. We think this is due to class loading, which in Java is lazy. We could not change the implementation of the KMS or DynamoDB clients to verify our hypothesis, but we could modify the 3rd-party API client, as we had rolled our own. Moving to the OkHttp client sped up the first call by 200–300ms. Using bare-bones HttpURLConnection took off another 500–600ms. The difference between those clients is the complexity of their implementation. This is not to say you should always go for HttpURLConnection: you may need the robustness that the Apache HTTP client provides. But if you do not, pick the lightest or…
Mitigation #1: language
Mitigation #2: keep your container warm
Many of the issues we have gone through can be mitigated with warm containers, which essentially makes your serverless solution a bit more serverful. The caveat is that there is no SLA around warm containers. Our goal is to hide the effect of cold start and first initialization from our users. This can be achieved to some extent by implementing a container keep-alive mechanism, which we plan to cover in a follow-up article soon, so stay tuned.
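Until then, here is a rough sketch of the general idea, not our actual mechanism. It assumes that something external (for example a scheduled rule) invokes the function periodically with an event carrying a "warmup" flag. That flag is a made-up convention rather than an AWS-defined field, and WarmableHandler is a hypothetical name.

```java
import java.util.Map;

/**
 * Sketch of a handler that recognises a periodic warm-up ping.
 * The "warmup" key is a hypothetical convention, not an AWS-defined field.
 * Declare the class public in its own file when deploying it.
 */
final class WarmableHandler {
    // Static state survives across invocations served by the same container.
    private static boolean initialized = false;

    private static synchronized void initOnce() {
        if (!initialized) {
            // Expensive one-off work would go here: clients, templates, secrets.
            initialized = true;
        }
    }

    public String handleRequest(Map<String, Object> event) {
        initOnce();
        if (Boolean.TRUE.equals(event.get("warmup"))) {
            return "warm"; // ping only: skip the business logic
        }
        return "handled"; // placeholder for real request handling
    }
}
```

The ping makes the container that serves it pay the initialization cost ahead of real traffic. It does not guarantee that a later user request lands on that same container.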
Finally, is FaaS for interactive use cases?
TL;DR: it depends on your use case. At the moment, to get steady performance as defined earlier, it seems you need to fight the technology a lot. Over time, both FaaS providers and library writers will adapt their technology to frequent cold starts. For very simple use cases where performance is not critical, serverless is very tempting and may well be applicable. For anything else, we recommend sticking to serverful.