Multithreading in AWS Lambda, Part 3: Multithreaded vs Multi-instance Lambda Architectures

JV Roig
6 min read · Feb 3, 2023


In this “multithreading in serverless” series, we dive into multithreading in AWS Lambda, looking at implementation, scaling, and even comparing multithreading vs multi-instance architectures.

Previously, in Parts 1 and 2, we looked at experimental data showing how well multithreaded workloads scale in AWS Lambda across different memory size configurations, and then discussed how to easily implement such multithreaded processing in Lambda.

Today, in Part 3, we’ll discuss multithreaded vs multi-instance Lambda architectures. When should we bother with multithreading, and when should we just rely on Lambda scaling out (automatically creating multiple instances) as needed?

Multithreaded Lambda architecture

A multithreaded Lambda architecture is what we’ve been discussing so far in this series. Here, your Lambda function is coded and configured to take advantage of parallel processing through the use of multiple processing threads. Your Lambda function can then process multiple independent pieces of data simultaneously, increasing your throughput. If we were to visualize it, it could look like this:

From any data source (like an event that sends a ton of data), we have a “fat” Lambda function with enough memory configured so that it has enough compute power to process the data simultaneously. In this visualization, I used 5 threads, but it could be 4 or 6 or whatever is best for your use case. Notice that in this architecture, the data source or event also has to be “fat” — that is, the payload it sends has more than just a single piece of work to be processed. In essence, it is not just our Lambda function that needs to be configured and designed for parallel processing — even our events and data sources must be, too.
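As covered in Part 2, a fat Lambda can fan its batch of work out with Python’s multiprocessing module. (On Lambda, multiprocessing.Pool and multiprocessing.Queue don’t work because /dev/shm is unavailable, so Pipe is the usual workaround.) Here’s a minimal sketch — the event shape ("items") and the doubling “work” are hypothetical placeholders for your real payload and processing:

```python
from multiprocessing import Pipe, Process


def process_item(item, conn):
    # Placeholder for real per-item work (here: just double the value)
    conn.send(item * 2)
    conn.close()


def lambda_handler(event, context):
    items = event["items"]  # assumed "fat" payload: a batch of work items
    workers = []
    for item in items:
        parent_conn, child_conn = Pipe()
        p = Process(target=process_item, args=(item, child_conn))
        p.start()
        workers.append((p, parent_conn))
    # Collect results in the same order the work was dispatched
    results = [conn.recv() for _, conn in workers]
    for p, _ in workers:
        p.join()
    return {"results": results}
```

Each piece of work gets its own process (a true parallel “thread” of execution, sidestepping the GIL for CPU-bound work), and the Pipe carries the result back to the handler.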

Multi-instance Lambda architecture

In contrast, a multi-instance Lambda architecture is what you probably see most often. A Lambda function uses just a single thread to do its work. When multiple pieces of work arrive as independent, simultaneous requests, multiple Lambda instances are created to handle them. We could visualize this architecture like so:

In this traditional Lambda architecture, neither our Lambdas nor our events are fat. Events and data sources send a piece of data meant for processing in a single core. A Lambda instance receives data to be processed and uses a single thread to process it. When there are multiple simultaneous pieces of work (in this example, I use 5 again), multiple instances of that Lambda function are created to do the work.
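The handler in this architecture stays trivially simple, because each invocation only ever sees one piece of work — scaling is entirely Lambda’s job. A sketch, with a hypothetical "item" payload and placeholder processing:

```python
def lambda_handler(event, context):
    # Assumed "thin" payload: exactly one piece of work per invocation
    item = event["item"]
    result = item * 2  # placeholder for the real single-threaded processing
    return {"result": result}
```

Five simultaneous requests simply mean Lambda spins up (up to) five instances of this function, each running this handler once.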

Comparing multithreaded vs multi-instance Lambda architectures

The previous two diagrams show a comparable scenario — there are 5 distinct pieces of work to be done. They differ a lot in the architecture, though.

In diagram 1 (multithreaded architecture), a single event contains all 5 pieces of work, is sent to a single Lambda function, and that Lambda function processes the work simultaneously using 5 threads.

In diagram 2 (multi-instance architecture), each distinct piece of work comes from an independent event or data source, is received by an independent Lambda function instance, and each Lambda function instance is much smaller and only has enough computing power for a single thread.

A note on “fat” events

At this point, you’re probably already familiar with making fat Lambdas (i.e., multithreaded Lambdas — we discussed how to implement them in Part 2 of this series), but you may not be sure about what fat events are and how to recognize them.

Since multithreaded Lambdas expect to process things essentially in batches (otherwise, what would the multiple cores be for?), then these Lambdas must be fed by events that can supply a batch of data.
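To make the distinction concrete, here’s what thin vs fat payloads might look like side by side. The field names and inventory-style items are hypothetical, just to illustrate the shapes:

```python
# A "thin" event: one piece of work, suited to a single-threaded Lambda
thin_event = {
    "item": {"sku": "A-1001", "qty": 3}
}

# A "fat" event: a batch of work that a multithreaded Lambda can spread
# across its threads, one item per thread
fat_event = {
    "items": [
        {"sku": "A-1001", "qty": 3},
        {"sku": "A-1002", "qty": 7},
        {"sku": "B-2001", "qty": 1},
        {"sku": "B-2002", "qty": 5},
        {"sku": "C-3001", "qty": 2},
    ]
}
```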

If you have a Lambda function that is integrated with an API Gateway endpoint, and that endpoint simply accepts a web request (POST/GET/PUT/DELETE) from a typical web form (e.g., encoding an item into a web-based inventory system), then that event essentially contains just a single piece of work — data will be received, processed a bit, then sent to a database, a serial and atomic task. Whether you configure your Lambda function to have 1 thread or 6 threads won’t matter, as only a single thread is typically used in this type of backend function that just accepts data for database encoding. This is not a type of event or data source that can supply a batch of data. It’s not a fat event source — it will only ever contain a single piece of work, and will only ever require a Lambda with a single thread’s worth of compute.

However, if your event or data source is an SQS queue, your Lambda function will, by default, receive up to 10 messages at a time. You can consider an SQS queue, then, as an inherently fat event source — it is meant to feed a Lambda function with multiple threads’ worth of compute.

(There’s a catch, of course — if the SQS queue isn’t very active at the moment (i.e., there aren’t a lot of events actually putting items into the queue), your Lambda function can end up receiving only 1 or 2 messages. If the Lambda function is configured for 5 threads, it will sometimes end up overprovisioned.)
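One way to soften that overprovisioning catch is to size the workers to however many messages actually arrived, rather than hardcoding a thread count. A sketch, assuming the standard SQS event shape (a "Records" list, each record carrying a "body") and placeholder work (uppercasing the body):

```python
from multiprocessing import Pipe, Process


def work(body, conn):
    # Placeholder for real per-message processing
    conn.send(body.upper())
    conn.close()


def lambda_handler(event, context):
    # SQS hands the function anywhere from 1 message up to the configured
    # batch size, so spawn exactly one worker per message actually received
    records = event.get("Records", [])
    workers = []
    for record in records:
        parent_conn, child_conn = Pipe()
        p = Process(target=work, args=(record["body"], child_conn))
        p.start()
        workers.append((p, parent_conn))
    results = [conn.recv() for _, conn in workers]
    for p, _ in workers:
        p.join()
    return {"processed": results}
```

If only 2 messages arrive, only 2 workers are spawned — though note this doesn’t change the memory (and therefore compute) you’ve provisioned for the function, which is the cost angle Part 4 digs into.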

Wrap up

If you only have basic needs, worrying about multithreaded vs multi-instance architectures for Lambda probably isn’t your main concern yet. Just defaulting to the simpler multi-instance Lambda architecture approach will solve all of your problems if you aren’t really dealing with a tremendous amount of activity or data.

For more complex needs, when the solution you are building or maintaining is going to handle a tremendous amount of activity and data daily, well, that’s when multithreaded Lambdas will eventually be a necessity. And then you’ll find that what you actually ultimately have is a multi-instance, multithreaded Lambda architecture: multiple simultaneous Lambda instances, each one using multiple threads to process a ton of data.

One important thing we haven’t touched yet is cost. When does it become cost-efficient (for your AWS bill) to use multithreaded Lambdas, instead of just single-threaded Lambdas and relying solely on multi-instance invokes? That’s for Part 4 of this series, so stay tuned!

If you are interested in trying to figure this out yourself, here’s my GitHub repo containing multithreaded experiment results. This is the same link and data set as described in Part 1. You can use the raw data from there to derive some sort of cost approximation of multithreading vs multi-instance Lambdas.

A note on doing experiments: As an AWS Ambassador, I get hooked up with a decent amount of credits, exactly so I can do cool stuff like that experiment above. I also personally collect lots of AWS credits — a technique and tip I shared in a previous article about how I took and passed 5 pro-level AWS Specialty Certification exams back-to-back in a 3-day marathon. If you are actively trying to increase your cloud skills, I recommend you implement those tips yourself so you can do hands-on practice and experiments without having to shell out real money for your AWS bill.

As usual, if you’ve found this article helpful or interesting, make sure to clap the article a few times and follow me to tell the algorithm to show you more stuff like this and get notified. Thanks and see you soon!


JV Roig

Multi-cloud Sol Arch w/21 certs across AWS, GCP, Azure, Alibaba & Oracle