Scaling Expensive Processes with Lambda

Chris Cooney
Sainsbury’s Tech Engineering
8 min read · Mar 13, 2018

This week we stumbled into an interesting little problem with one of our Node applications. This microservice was responsible for converting a JSON object into an HTML email with a plaintext counterpart. For those of you who have “enjoyed” the “fun” of generating HTML that works with all the major mail clients, you may have run into a similar issue.

Many mail clients can’t be expected to work reliably with CSS; one in particular (ahem… Outlook) is a bit of a loose cannon. As such, you need to remove doubt. How do we do that? We inline all the CSS we can. For this, we used the fantastic juice library.
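
For a sense of what that step does, here is a minimal sketch of inlining with juice; the markup is a placeholder rather than anything from our actual templates:

```js
const juice = require('juice');

// Placeholder markup, not our real template: juice takes the rules from the
// <style> block and copies them onto matching elements as inline style="" attributes.
const html = `
  <style>.title { color: #f06c00; font-size: 24px; }</style>
  <h1 class="title">Your order is on its way</h1>
`;

const inlined = juice(html);
// Roughly: <h1 class="title" style="color: #f06c00; font-size: 24px;">Your order is on its way</h1>
console.log(inlined);
```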

Unfortunately, when we began pushing some load through this new creation, we found response times of 200–300ms. This, on its own, wasn’t too bad. Mixed with hundreds of concurrent requests, it spelled trouble. We identified a few key things that we needed to improve about our application:

  • Speeding up the time it takes to generate an email.
  • Better utilisation of the CPU.
  • Better handling of concurrent requests.

How we measured success

We had a variety of tools at our fingertips to provide insight into the behaviour of our system. We used our Elastic stack to visualise application logs, giving us insight into what was being processed slowly or not at all.

We were utilising CloudWatch metrics to give us the CPU, memory usage and network IO of both our application and database servers. These provided critical insights into where the bottlenecks were.

Later in our investigation, we moved over to Grafana, allowing us to visualise all of our CloudWatch metrics and application logs in one place and more easily spot when, for example, slowdowns in performance coincided with CPU spikes.

These tools were instrumental in identifying our microservice as a problem application in the first place, and they continued to serve us well during our investigation. Their key contribution was dramatically reducing the time taken to validate a hypothesis, a phrase I will be using consistently throughout this post.

Speeding up the application

To begin, we broke our application flow down into three key metrics, one for each step of the flow (a sketch of how we timed them follows this list):

  • Templating: converting the JSON object into HTML.
  • Minifying: stripping whitespace out of the markup to lower our network footprint.
  • Inlining: converting class references into inline `style` attributes.
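
To get those timings we wrapped each step and logged its duration. The sketch below shows the general idea only; the template path is made up, and the naive minifier stands in for whatever minification library you prefer:

```js
const pug = require('pug');
const juice = require('juice');

// Hypothetical stand-ins for the three steps; the template path is made up.
const templateEmail = pug.compileFile('templates/order-confirmation.pug');
const minifyHtml = (html) => html.replace(/>\s+</g, '><').trim(); // naive whitespace strip
const inlineStyles = (html) => juice(html);

// Wrap a step so it logs how long it took.
function timed(name, step) {
  return (input) => {
    const start = process.hrtime();
    const output = step(input);
    const [seconds, nanoseconds] = process.hrtime(start);
    const durationMs = seconds * 1e3 + nanoseconds / 1e6;
    console.log(`${name} took ${durationMs.toFixed(1)}ms`);
    return output;
  };
}

function buildEmail(payload) {
  const html = timed('templating', templateEmail)(payload);
  const minified = timed('minifying', minifyHtml)(html);
  return timed('inlining', inlineStyles)(minified);
}
```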

We had plenty of speculative theories about why this app was just so slow. We were using pug to convert a JSON object into HTML, and since that was the most complex part of the flow, it was the prime suspect in our investigation.

Our data did not match our hypothesis:

We measured each step in our flow to see how long each one took, using the juice library for CSS inlining.

Okay, message received — inlining is slow.

CSS inliners start from the CSS. That means they go through each rule and query the DOM to see if any matching elements exist. If you have lots of CSS rules, this will slow you down. If you have lots of unused CSS rules, it will needlessly slow you down. Worse still, if you’re using a slow-performing library to query the DOM, this will have a cumulative effect on your throughput. From this, we went down several avenues of investigation. For brevity, I will only include those avenues that yielded some performance gains.
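
To illustrate why the number of rules matters, here is a conceptual sketch of the work an inliner does per rule, written against cheerio; it is not the real juice or inline-css internals, and the rules are hard-coded rather than parsed from a stylesheet:

```js
const cheerio = require('cheerio');

// Hard-coded rules stand in for a parsed stylesheet.
const rules = [
  { selector: '.title', declarations: 'color: #f06c00; font-size: 24px;' },
  { selector: '.never-used', declarations: 'display: none;' }, // still costs a DOM query
];

function naiveInline(html) {
  const $ = cheerio.load(html);
  for (const rule of rules) {
    // One DOM query per CSS rule, whether or not anything matches.
    $(rule.selector).each((_, el) => {
      const existing = $(el).attr('style') || '';
      $(el).attr('style', `${existing}${rule.declarations}`);
    });
  }
  return $.html();
}

console.log(naiveInline('<h1 class="title">Your order is on its way</h1>'));
```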

Blame the tool

Our first port of call was to experiment with some different libraries. We swapped out juice for the inline-css module. Immediately, we saw some performance improvements.

We again took measurements, this time using the inline-css library for our inlining step.

Inline-css is based on the cheerio library, which is typically quicker than jsdom, the engine used by the juice project. We had removed the cumulative effect of lots of successive, slow queries, resulting in around a 20% performance gain. This was our first win, but we noticed something else that was odd about the application. From our CloudWatch metrics, we could see the CPU utilisation of the EC2 instance was spiking at just around 40%.
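
The change itself was small. A sketch of the swap, noting that inline-css returns a Promise and requires a `url` option for resolving relative paths (the value below is a placeholder):

```js
const inlineCss = require('inline-css');

// Before, with juice (synchronous):
//   const inlined = juice(html);

// After, with inline-css (Promise-based, built on cheerio rather than jsdom):
async function inlineStyles(html) {
  return inlineCss(html, { url: 'https://example.com/' }); // url is required for resolving relative paths
}
```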

We need more power!

Our Node application wasn’t clustered. Node is single threaded, which meant it was bound to a single core on the servers it was running on. Our go-to was the venerable cluster library, which offers a brilliant API. We used the os module to determine the number of cores on the box and spun up a worker process for each one, load balanced by a master process.
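
A minimal sketch of that setup, assuming the Express app lives in a hypothetical `./app` module:

```js
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // One worker per core; the master distributes incoming connections between them.
  os.cpus().forEach(() => cluster.fork());

  // Replace any worker that dies so we keep all cores busy.
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, forking a replacement`);
    cluster.fork();
  });
} else {
  // Each worker runs its own copy of the Express app; they all share the same port.
  const app = require('./app'); // hypothetical module exporting the Express app
  app.listen(3000, () => console.log(`worker ${process.pid} listening`));
}
```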

With this in place, we expected to see even better throughput across lots of requests. Given it was a dual-core box, our working theory was that by bringing a whole new core into play, we’d get twice the performance. We ramped up the number of requests we were throwing at it to get a better picture of how the cores were being used.

We jumped up to a concurrency of 100 users to test out the clustered Node app and watched it grind to a halt.

In the timeless words of Scooby Doo, “Ruh roh”.

Our subsequent research indicated that the cluster library is the slowest way to make use of the whole CPU, but this test had delivered something much more valuable: perspective. All of the steps in our process took at most 500ms, yet our response times were averaging 2.5 seconds. Why?

Concurrency was the problem

As concurrency increased, our other issues fell by the wayside. The inability of our server to handle lots of concurrent requests, combined with some of these expensive operations, was the real problem. We could happily live with 250ms response times, but 2.5 seconds was out of the question. Once again, we took some data, increasing the number of concurrent requests with each test:

Using concurrency as our independent variable, we measured the impact that more concurrent requests had on our HTTP response times.

Suddenly, a 250ms response time isn’t looking so bad, is it? Even with clustering, even with the faster CSS inlining, concurrency was the issue. There are plenty of ways of dealing with concurrency, but we began with the most basic: horizontal scaling on demand.

Scaling the number of boxes based on CPU utilisation got us some performance gains. Average request time went down with each new fleet of servers that came up to support the load, which in turn lowered the request rate each instance had to handle:

We compared our previous response times with the performance improvements we got from horizontal scaling of Amazon EC2 instances inside an auto scaling group.

We were horizontally scaling up to a large number of servers, yet our response times weren’t going down fast enough. Looking at our data, we saw that our load wasn’t constant. It came in short-lived peaks. Those peaks were the real issue here: they would hold up the system, creating back pressure on requests waiting in line, and it simply took too long to increase the number of servers available.

So far, our application was using all of the CPU, the code itself had some tuning in place and we were running on a scalable set of EC2 instances. Despite all of this, even with indefinite horizontal scaling, we were not getting the response times we wanted. Our application needed cat-like reflexes if it was going to gracefully deal with our unpredictable load.

Enter the Lambda

Serverless architectures are all the rage right now — there are many horror stories attached to them, just as many as there are stories of triumph. The suggestion came after EC2s in an auto scaling group just weren’t working for us. We had some initial concerns, but regardless, it was worth an experiment.

It didn’t take a huge amount of engineering effort to turn our Express-based application into a Lambda handler. We had modularised the flow of converting JSON into HTML, so it was simply a case of exposing a new Lambda handler and invoking the same flow with the same parameters. A win for internal decoupling!
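
A simplified sketch of the handler, assuming the Lambda sits behind something like API Gateway and that the existing flow is wrapped in a hypothetical `generateEmail` function:

```js
// handler.js
const { generateEmail } = require('./email-flow'); // hypothetical module wrapping the existing flow

exports.handler = async (event) => {
  // With API Gateway, the original POST body arrives as a string on the event.
  const payload = JSON.parse(event.body || '{}');

  // Same templating -> minifying -> inlining flow the Express route was calling.
  const { html, text } = await generateEmail(payload);

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ html, text }),
  };
};
```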

Additionally, our disciplined use of Terraform for infrastructure-as-code meant that adding the necessary IAM roles to allow Lambda invocation was trivial. All in all, it took a few hours to test our Lambda theory. So, how did the Lambda improve our average HTTP response times? Drum roll please…

We compared the performance of our application running on a Lambda and took average response times with increasing numbers of concurrent users. It’s worth noting that these requests were issued to a Lambda that had already started up, so no initial warm-up time was factored into this.

Lambdas don’t need to “scale” in the way a more traditional EC2 setup does: they run out of the box with near limitless capacity. The default limit on the number of concurrent invocations is 1000, which was somewhat higher than our maximum expected load. The only thing that slightly pushed up our averages was the initial “warm-up time” that a Lambda has: a few seconds on the first request. Once that was out of the way, our average request time never jumped any higher than 250ms. This meant our system was comfortably able to scale with load, with plenty of headroom. This was great, but another alarm had gone off in our heads: speed usually comes at a price.

Were we about to break the bank?

We were very happy with our new-found performance, but were worried about the cost. How much is this thing going to set us back? Is our Product Owner going to have a panic attack? All good questions. To answer them, Amazon offers a rather handy EC2 calculator for estimating your monthly costs.

We took an average of the number of instances we would expect to have running at any one time; sometimes it would be 8, sometimes 1. We settled on 4. The monthly cost for these servers was $248.88. And how much was the Lambda?

We calculated the number of invocations that this thing was going to need in five years. That is to say, we took the estimates of the kind of traffic our site would be processing in five years’ time and used those numbers. We plugged that into another rather handy Lambda calculator and came out with $36. We assumed we’d entered the numbers incorrectly and had another go: still $36. It would cost us almost 7 times less, with a response time that was almost 10 times faster. We gained speed and saved money.

Lessons Learnt

It is dangerous to consider Lambdas to be the answer to all scaling problems. There are plenty of gotchas. That initial warm-up we called out earlier has been a crippling problem for some companies that need quick responses all the time. Likewise, if you’re running your Lambda inside a VPC, you need to make sure you’ve got enough IP addresses available for that Lambda to scale.

That said, when you’re running discrete, independent operations (such as converting JSON into HTML) over unpredictable loads and need consistently low response times, regardless of concurrency, the Lambda is an incredibly powerful tool in the arsenal of any engineer.
