Understanding Multiprocessing in AWS Lambda with Python

Yash Sanghvi · Published in Tech@Carnot
Sep 7, 2020 · 7 min read

Adding some transparency to the black box


UPDATE: At the time this post was written, the maximum memory possible for an AWS Lambda function was 3008 MB. In December 2020, AWS increased this upper limit to 10240 MB. This doesn’t change the fundamentals explained in this post, and only gives increased flexibility to the users. You can figure out the number of vCPUs at each memory setting using the process followed in this post.

Let me start with an observation. I created a new AWS Lambda function (with the Python 3.6 runtime) and ran the following lines:

import multiprocessing as mp

def lambda_handler(event, context):
    return mp.cpu_count()

When I set the memory to 128 MB (the minimum possible), the return value was 2. I was quite surprised. I thought that perhaps I could run two processes in parallel with 128 MB of memory. Then I set the memory to 3008 MB (the maximum possible) and got the same result. I thought that perhaps, irrespective of the memory size, we can always run two processes in parallel on AWS Lambda. To verify my hypothesis, I designed a small experiment.

The Experiment:

I created two lambda functions using my favorite placeholder function, which returns the value of Pi accurately up to n decimal places, n being the argument. One lambda executes the function sequentially. Its implementation is shown below:
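A minimal sketch along these lines (assuming the Leibniz series as the Pi approximation; the names max_iter and partial_pi are illustrative, not necessarily the exact code used in the experiment):

import math

def partial_pi(start, end):
    # Partial sum of the Leibniz series: pi = 4 * sum((-1)^k / (2k + 1)), k = 0, 1, 2, ...
    total = 0.0
    for k in range(start, end):
        total += 4.0 * (-1) ** k / (2 * k + 1)
    return total

def lambda_handler(event, context):
    n = event.get("n", 4)        # number of decimal places to report
    max_iter = 10 ** (n + 3)     # enough terms for roughly n correct decimals
    pi_approx = partial_pi(0, max_iter)
    return math.floor(pi_approx * 10 ** n)   # e.g. 31415 for n = 4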

The other lambda splits the iterable range(max_iter) into n_chunks and runs the for loops in parallel in different processes, and then sums the outputs of the processes to report the final answer. Its implementation is shown below:
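Again, a minimal sketch rather than the exact code used in the experiment: each child process computes a partial Leibniz sum over its sub-range and sends it back through one end of a Pipe:

import math
from multiprocessing import Process, Pipe

def partial_pi(start, end, conn):
    # Partial Leibniz sum over [start, end), sent back through the pipe.
    total = 0.0
    for k in range(start, end):
        total += 4.0 * (-1) ** k / (2 * k + 1)
    conn.send(total)
    conn.close()

def lambda_handler(event, context):
    n = event.get("n", 4)
    n_chunks = event.get("n_chunks", 2)
    max_iter = 10 ** (n + 3)
    chunk = max_iter // n_chunks

    processes, parent_conns = [], []
    for i in range(n_chunks):
        parent_conn, child_conn = Pipe()
        start = i * chunk
        end = max_iter if i == n_chunks - 1 else (i + 1) * chunk
        p = Process(target=partial_pi, args=(start, end, child_conn))
        p.start()
        processes.append(p)
        parent_conns.append(parent_conn)

    pi_approx = sum(conn.recv() for conn in parent_conns)
    for p in processes:
        p.join()

    return math.floor(pi_approx * 10 ** n)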

Note that we are using Pipes instead of Queue because, as mentioned in the Parallel Processing in Python with AWS Lambda blog,

Due to the Lambda execution environment not having /dev/shm (shared memory for processes) support, you can’t use multiprocessing.Queue or multiprocessing.Pool.

Now with the two lambdas ready, I just needed to run them for different memory settings and compare the execution times to check if there was indeed any parallel execution happening.

I set the value of n to 4 for this experiment. Thus, both the functions were expected to output 31415.

The Results

The results, quite convincingly, showed that the output of mp.cpu_count() for 128 MB RAM was misleading. Before explaining further, let me show you the results.

It is quite evident that parallel execution starts somewhere between 1.5 GB and 2 GB of memory. To isolate the exact value, I performed more trials in that range. The results are as follows:

Please note that the time in both the tables above is not exactly the execution time. Since the values are always multiples of 100 ms, you may have guessed that it is actually the billed duration.

The Inferences

Let’s list out the obvious inferences first:

Parallel processing doesn’t start till ~1.8 GB of memory

This means that Lambda provides only a single vCPU up to that memory level. The lambda with processes took more time than the sequential lambda, perhaps because of the overhead of creating processes, pipes, and sub-ranges. Thus, mp.cpu_count() gave a misleading result at 128 MB.

Increasing RAM beyond 1.8 GB is not beneficial for sequential execution

The general rule of thumb with Lambda is that more RAM gets you more CPU, and this seems to be true. But to fully benefit from the extra CPU beyond 1.8 GB, you need parallel execution: the sequential execution time stabilizes after 1.8 GB of RAM. If you have a purely CPU-intensive task, allocating more than 1.8 GB without parallel execution will just lead to higher bills without any benefit. If you have network-dependent operations, you can still benefit from the increased network speed.

Now let us look at some more involved inferences.

1 full CPU = ~1.8 GB RAM is a good approximation for time estimation

Let us look at the results again, but this time with a slight change. Instead of looking at the RAM, we will take the fraction of a CPU into consideration: if 1.8 GB RAM = 1 CPU, then 0.9 GB RAM = 0.5 CPU, and so on. The execution time at 1 full CPU is 4400 ms. Using that, we will find the expected proportional execution time for the other memory values (4400 / CPU fraction) and compare it with the actual execution time. Here are the results:

As you can see, both the sequential and parallel execution times match the expected time up to 1.8 GB; beyond that, only the parallel execution time keeps matching it. Thus, the CPU share is indeed proportional to the memory, and we get 1 full CPU at about 1.8 GB. AWS confirms this in its official documentation:

Lambda allocates CPU power linearly in proportion to the amount of memory configured. At 1,792 MB, a function has the equivalent of one full vCPU (one vCPU-second of credits per second).
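As a quick check on the arithmetic, here is a small helper (my own sketch, using the 4400 ms single-CPU time measured above and the 1792 MB figure from the quote) that computes the expected sequential execution time at a given memory setting:

def expected_sequential_time_ms(memory_mb, full_cpu_time_ms=4400, full_cpu_mb=1792):
    # Sequential code can use at most one full vCPU, so cap the fraction at 1.
    cpu_fraction = min(memory_mb / full_cpu_mb, 1.0)
    return full_cpu_time_ms / cpu_fraction

print(expected_sequential_time_ms(896))    # ~8800 ms at roughly half a vCPU
print(expected_sequential_time_ms(1792))   # 4400 ms at one full vCPU
print(expected_sequential_time_ms(3008))   # still ~4400 ms: sequential code cannot use the extra CPU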

The max number of CPUs possible using AWS Lambda is ~1.68

This is a direct corollary of the above inference. Because we can have a maximum of 3008 MB of RAM, we can have at most 3008 / 1792 ≈ 1.68 vCPUs. This means that we can achieve a maximum time reduction of ~1.68x by converting a sequential process to a parallel one, provided that it is purely CPU-intensive. If it depends on I/O or the network, a further time reduction may be possible.

To double-check, I tried parallel execution with 4 and 8 chunks instead of 2. However, there was no further reduction in execution time.

What does a fraction of a CPU mean?

This is a fair question to have at this stage. Physically, we can have CPUs only in whole numbers, so how does Lambda allocate a fraction of a CPU to our function? There are several posts that answer this question; you can refer to the one by Mustafa Akin. The crux is that fractional allocation of CPU is essentially fractional allocation of time on a single CPU. So if a CPU spends about 4.2 seconds of every minute on our function at 128 MB RAM, it will spend about 8.4 seconds at 256 MB and all 60 seconds at ~1.8 GB. Beyond 1.8 GB, we will have one CPU executing our function full time, and another CPU executing our function for a fraction of the time, depending on the allocated RAM.
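For concreteness, here is the back-of-the-envelope arithmetic behind those numbers (my own sketch; taking 1792 MB as exactly one vCPU gives roughly 4.3 s and 8.6 s, in the same ballpark as the rounded figures above):

for memory_mb in (128, 256, 1792):
    cpu_fraction = memory_mb / 1792
    print(memory_mb, "MB ->", round(cpu_fraction * 60, 1), "seconds of CPU time per minute")
# 128 MB -> ~4.3 s, 256 MB -> ~8.6 s, 1792 MB -> 60.0 s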

What about threads?

You may be wondering why we used only process-based parallelism (the multiprocessing module) and not thread-based parallelism (the threading module). The simple answer is that we have restricted this experiment to a CPU-intensive function. With threads, Python’s Global Interpreter Lock (GIL) would kick in and thwart any attempt at parallelism. You can read more about the GIL in the many articles a quick Google search turns up, but the crux is that it allows only one thread of a process to execute in the interpreter at a time.

Of course, if you have a network-bound task, you can use threads for parallelism. A very nice post explaining threading in AWS Lambda can be found here.
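For completeness, here is a minimal sketch (mine, not taken from the post linked above) of what thread-based parallelism for a network-bound lambda might look like; the URLs are placeholders:

import urllib.request
from concurrent.futures import ThreadPoolExecutor

URLS = [
    "https://example.com/a",   # hypothetical endpoints, purely for illustration
    "https://example.com/b",
    "https://example.com/c",
]

def fetch(url):
    # Each thread spends most of its time waiting on the network,
    # so the GIL is not a bottleneck here.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status

def lambda_handler(event, context):
    with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
        return list(pool.map(fetch, URLS))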

The Bottom Line:

If you have a purely CPU-intensive function and you wish to speed up its execution on AWS Lambda, don’t even try parallelizing until you have allocated at least 1.8 GB of memory to the function. The maximum time reduction you can expect is ~1.68x. You also need to compare the GB-seconds (GB-s) of sequential and parallel execution to check whether parallel execution actually saves you any money; in many cases, sequential execution may be more cost-effective. For parallelization beyond 1.68x, you may want to deploy multiple lambdas within a workflow (see AWS Step Functions).
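To make that cost comparison concrete, here is an illustrative estimate (my own sketch, using the 4400 ms and ~1.68x figures from this post and the 100 ms billing granularity that applied at the time; these are estimates, not measurements):

import math

def gb_seconds(memory_mb, duration_ms):
    billed_ms = math.ceil(duration_ms / 100) * 100   # 100 ms billing granularity
    return (memory_mb / 1024) * (billed_ms / 1000)

sequential = gb_seconds(1792, 4400)          # ~7.7 GB-s at one full vCPU
parallel = gb_seconds(3008, 4400 / 1.68)     # ~7.9 GB-s at maximum memory, ~1.68x faster

print(f"sequential: {sequential:.2f} GB-s, parallel: {parallel:.2f} GB-s")

In this particular example the sequential run comes out slightly cheaper, which is why the GB-s calculation is worth doing before parallelizing.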

I’d like to reiterate that this discussion is for CPU-intensive functions. If your function spends a significant amount of time waiting for inputs or is very network-dependent, you may actually benefit from deploying more than two processes/threads.

References:

  1. Parallel Processing in Python with AWS Lambda: https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/
  2. Configuring functions in the AWS Lambda Console: https://docs.aws.amazon.com/lambda/latest/dg/configuration-console.html
  3. How does proportional CPU allocation work with AWS Lambda: https://engineering.opsgenie.com/how-does-proportional-cpu-allocation-work-with-aws-lambda-41cd44da3cac

We are trying to fix some broken benches in the Indian agriculture ecosystem through technology, to improve farmers’ income. If you share the same passion join us in the pursuit, or simply drop us a line on report@carnot.co.in

Follow Tech@Carnot for more such blogs on topics like Data Science and Visualization, Cloud Engineering, Firmware Development, and many more.
