At Nielsen Marketing Cloud, we use Serverless computing and cloud storage as a core component of our data processing platform. In this blog series, we will share what we believe to be high-value insights on how the computing world is changing and how you can leverage that to build incredible applications.
In the previous post of the Going Serverless series, we covered the computation world’s journey from owning servers, to leasing servers, and finally to renting computation power for very short periods, aka Serverless.
We saw the downsides of the first two paradigms and the ground-breaking benefits that Serverless unleashes.
In this post, we will explore a data processing platform we’ve built at Nielsen, based on Serverless infrastructure.
We will see how Serverless allows us to build highly scalable applications and even reduce costs.
We will also understand the pitfalls of Serverless and how to avoid them.
But before we begin…
What is Serverless?
The idea is that the server becomes a service. You just write a “function” (i.e., a piece of code), and instead of managing servers yourself, a cloud provider dynamically allocates resources for your code.
How is that related to scalability? When your application needs more resources (for example when the incoming traffic is high), you can execute more invocations of that function and the cloud provider will allocate the appropriate resources per invocation.
This allows you to build a “breathing” application that grows and shrinks in real-time and can handle any kind of load.
Applications that aren’t built on Serverless infrastructure aren’t “breathing”. Let me explain with two examples:
A. The application runs on a server (or a cluster of servers) that has enough resources to handle the maximum load of the application.
In this case, most of the day, when the load is average, the server will be underutilized even though we are paying the full price for it.
This is very similar to a car that stands in the parking lot most of the day.
B. The application runs on a server (or a cluster of servers) that only has enough resources to handle the average load of the application.
In this case, the application will suffer from performance issues, and a backlog of tasks will start piling up. Exactly like traffic jams on narrow roads at rush hours.
Moving on to a real use-case
The application we needed to build was a data pipeline that streams data to our partners, which are ad platforms (as of today, we stream 250 Billion events per day and growing).
Also, we had to design a scalable application, because the amount of data can increase and decrease throughout the day.
When the team initially brainstormed how to design this application, Apache Spark immediately came to everyone’s mind — Spark was built to handle huge amounts of data.
But the problem is scalability: while Spark is obviously scalable, its scalability is somewhat rough around the edges — your smallest “scalability unit” is an executor.
However, you don’t necessarily want to launch a new executor (or sometimes — a whole new worker node), just to accommodate a few short tasks.
And even if you do, scaling out (or in) takes time, anywhere from seconds to minutes if a new worker node has to be launched.
In our case, waiting several minutes for a new worker node to be added to the cluster is just too much.
A second problem is that the input data is stored in thousands or even millions of files, which vary in size.
It would be difficult to tune the cluster to process them optimally, so the application would end up being very expensive.
We had experience with that in the past: we once ran a Spark application that split files into smaller chunks.
This application was working with a variety of file sizes — most of the files were very small (around 50MB), but now and then we had a really big file (about 5GB).
When we reimplemented this application using Serverless infrastructure, the cost dropped by 44X (from $1850 to $42 per month).
But the improvements weren’t just about cost. Performance got much better too: when a 5GB file arrived, the Serverless infrastructure allocated enough resources to process it quickly.
Spark — Out, Serverless — In
As Spark was out of the question, we decided to implement our application over Serverless, and specifically, AWS Lambda (hereinafter “Lambda”), which is a Serverless computing platform provided by AWS.
Again, our application processes files containing data we generated earlier in the pipeline (see the ‘Incoming files’ box in the diagram below).
Every file contains multiple records we call events.
Each file is processed via a single Lambda invocation (see the ‘Uploader’ box), that sends its data to our partners (see the ‘Ad Networks’ box).
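To make the “one file per invocation” flow concrete, here is a minimal sketch of what an Uploader-style handler could look like. This is not our production code: the event payload shape, the `BATCH_SIZE` value, and the `send_batch` stub are all hypothetical, chosen only to illustrate the pattern.

```python
BATCH_SIZE = 500  # hypothetical number of events per partner request

def send_batch(partner_url, batch):
    # Placeholder for the HTTPS POST that ships a batch of events to an
    # ad network; here it just reports how many events it would have sent.
    return len(batch)

def handler(event, context):
    # One Lambda invocation processes one incoming file's worth of events.
    records = event["records"]  # hypothetical payload shape
    sent = 0
    for i in range(0, len(records), BATCH_SIZE):
        sent += send_batch(event["partner_url"], records[i:i + BATCH_SIZE])
    return {"sent": sent}
```

Because each file maps to an independent invocation, the cloud provider can run as many of these in parallel as there are files to process.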
Since the number of files we receive at any given moment is not fixed, neither is the number of Lambda invocations being executed, and this is where the scalability of the application comes into play.
This allows our application to grow and shrink throughout the day by a factor of 3, uploading a total size of anywhere between 0.3TB/hour and 1TB/hour to our partners.
This is not a hard-limit, it’s just what we actually need. Theoretically, it’s limitless.
And the best thing is — we didn’t have to write a single line of code to support it — the fact that our application is Serverless-based makes this amazing capability possible.
On top of that, Serverless infrastructure reduces our time to market — when scalability comes out-of-the-box, the development team can focus on business needs instead of infrastructure considerations.
Show me the money!
The fact that the application’s cost is linear in the computation power it uses has both negative and positive sides.
On the negative side, you have to be very careful with costs — if your resource consumption increases (even due to a bug), the cost can skyrocket.
Moreover, until you discover the bug and deploy a fix, the meter is running.
To catch such issues as early as possible, we use alerts. We collect metrics from the Lambda invocations into a central monitoring application and send alerts to Slack when there are more invocations than expected, when the average duration increases dramatically, or when the failure rate is too high.
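The alert logic itself can be very simple. Here is a sketch of the kind of checks involved; the specific thresholds (2x the expected invocation count, 3x the expected duration, a 5% failure rate) are made-up examples, not our actual configuration.

```python
def check_alerts(metrics, baseline):
    # Compare the latest window of Lambda metrics against an expected
    # baseline; return a (possibly empty) list of messages to post to Slack.
    alerts = []
    if metrics["invocations"] > 2 * baseline["invocations"]:
        alerts.append("more invocations than expected")
    if metrics["avg_duration_ms"] > 3 * baseline["avg_duration_ms"]:
        alerts.append("average duration dramatically increased")
    failure_rate = metrics["failures"] / max(metrics["invocations"], 1)
    if failure_rate > 0.05:
        alerts.append("failure rate too high")
    return alerts
```

Running a check like this on every metrics window means a runaway bug is caught within minutes instead of at the end of the billing cycle.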
On the positive side, Serverless opens the door to cost reduction: if you optimize your resource consumption, you save money, which is a huge incentive. This is not the case with other architectures: if you optimize code that runs on a server, you improve performance but not necessarily cost (at least not as linearly as with Serverless).
So Serverless holds up a mirror that pushes you to optimize your code.
From $7 to $3 (per billion events) — 57% COST REDUCTION!
For every Billion events we send to our partners, we are paying around $3. However, in the first months after deploying our application to production, we were paying around $7.
How did we achieve that?
First, let’s understand how the pricing model of Lambda works — the price is determined by two factors:
- The amount of memory that was allocated to the function (this part is static and configured once, when the function is deployed);
- The duration of the invocation in 100 milliseconds’ resolution.
For each memory setting (for example, 128MB or 512MB) there is a price per 100 milliseconds (see the AWS Lambda pricing page).
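In code, the formula looks like this. The price points below are illustrative; the authoritative numbers live on the AWS Lambda pricing page and change over time.

```python
# Illustrative prices in USD per 100 ms of execution, keyed by memory setting.
PRICE_PER_100MS = {128: 0.000000208, 512: 0.000000834, 1024: 0.000001667}

def invocation_cost(memory_mb, duration_ms):
    # Duration is billed in 100 ms increments, rounded up.
    billed_units = -(-duration_ms // 100)  # ceiling division
    return billed_units * PRICE_PER_100MS[memory_mb]
```

For example, a 250 ms invocation at 128MB is billed as three 100 ms units.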
This formula has two pitfalls:
- Let’s say I’ve changed the function’s configuration and reduced the allocated memory from 512MB to 128MB.
The new duration may be longer since my code has less memory to use.
So in total, I won’t necessarily pay less.
- Unofficially, the “strength” of the CPUs allocated to your function invocation depends on the memory you’ve allocated. Let’s say we’ve increased the memory setting from 512MB to 1024MB — we might get stronger CPUs and our code will run faster (but we don’t have a way to know the relation between memory and CPU strength).
So the challenge is to find an optimal combination of memory and duration.
To achieve this, we’ve built a tool that executes many Lambda invocations with different settings. A setting can be the memory allocated for the invocation, but it can also be an applicative configuration, such as the size of the connection pool to the database (assuming our application uses a database).
For each setting, our tool executes multiple invocations to cancel out non-deterministic behaviors like momentary slowness of the network or warm-up time (which is the time it takes to spin up a new container to run your code in).
The output of the tool is the cost per invocation for each setting, and thus we could choose the optimal setting that reduced our cost-per-billion from $7 to $3.
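The core loop of such a tool can be sketched as follows. This is a simplification: `invoke` stands in for actually executing the Lambda with a given memory setting and measuring its duration, the price table is illustrative, and the real tool also sweeps applicative settings like connection pool size.

```python
import statistics

# Illustrative USD prices per 100 ms of execution, keyed by memory setting.
PRICE_PER_100MS = {128: 0.000000208, 512: 0.000000834, 1024: 0.000001667}

def cost(memory_mb, duration_ms):
    # Billed in 100 ms increments, rounded up.
    return -(-duration_ms // 100) * PRICE_PER_100MS[memory_mb]

def cheapest_setting(invoke, memory_settings, runs=20):
    # invoke(memory_mb) runs one invocation with that memory setting and
    # returns its duration in ms. Repeating `runs` times averages out
    # non-deterministic effects like cold starts or momentary network slowness.
    avg_cost = {}
    for mem in memory_settings:
        durations = [invoke(mem) for _ in range(runs)]
        avg_cost[mem] = statistics.mean(cost(mem, d) for d in durations)
    return min(avg_cost, key=avg_cost.get)
```

The tool simply picks the setting with the lowest average cost per invocation, which is exactly how we arrived at the configuration behind the $7-to-$3 drop.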
Unexpected consequences of using Serverless infrastructure
We saw that thanks to using Serverless infrastructure, our application is highly scalable and can handle any kind of load.
Unfortunately, our partners can’t always do the same.
To avoid flooding them with too much data, we were asked to add a rate limit mechanism to control the amount of data we send them.
In the next blog post, I will elaborate on how this mechanism works.
Key takeaways — why Go Serverless?
- It allows you to focus on the business logic, rather than infrastructure considerations
- It can scale out and in very fast
- You get almost instant feedback about costs, which motivates you to optimize your code
- Not everyone is as scalable as we are ;)
Serverless infrastructure gives you great power and, with a few custom additions (such as alerts for early detection of unexpected behavior), you can have a very sophisticated and impressive application in a very short period of time.
I highly encourage you to adopt Serverless infrastructure in your organization.