Elixir and AWS Lambdas for extracting CPU- and memory-intensive work

by Faizaan Shamsi

We use Elixir and Phoenix for the majority of our backend work. It’s a great tool for a variety of our use cases, which usually center around APIs, web applications, and background processing. Coming from a Ruby/Rails background, the out-of-the-box performance is a big improvement, especially for IO-bound tasks. Paired with Postgres, it’s quite often enough for all our backend needs.

We find ourselves needing autoscaling less often, since running two (or three) servers is far more than enough to serve web/API requests and still provide some redundancy for small projects. It’s also simpler, devops-wise, to have a fixed number of servers when connecting the nodes in an Erlang cluster.

A recent application we built needed to do some image processing, and there could be quite a few concurrent jobs running at a time. We found that the baseline CPU/memory load from API requests rarely varied, but the peak load from image processing was an order of magnitude higher in both CPU and memory.

A couple of solutions came to mind:

  1. More servers
  2. Extract all the image processing to a separate service
  3. Combination of (2) and autoscaling
  4. AWS Lambdas (or your favorite flavor of serverless function)

(3) seemed like the best option, but it was going to add complexity to our devops in terms of managing, scaling, and networking another service, which we wanted to avoid. The jobs we wanted to extract had a few characteristics that made AWS Lambdas a good option:

  1. They placed high demands on CPU and memory, but only for a short time
  2. We needed to be able to run many jobs in parallel
  3. Each job had an input and an output, but was otherwise stateless and not dependent on anything else
  4. Library support was better in Java/Python/JS than in Elixir

We decided that spinning up a Lambda for each image processing job would be the best option. In terms of managing application servers, we’d only have to worry about our API. In terms of scaling, the Lambdas could handle far more concurrent requests than we would ever need, so our Phoenix API would be under drastically lower peak loads and we could avoid adding servers or autoscaling. There were three parts to this:

  1. A trigger for the Lambda (in our case an SES notification, but it could also be an API Gateway webhook)
  2. The actual Lambda function (sketched below), which processed the image and uploaded it to S3, then sent our API a POST with the URL and other metadata
  3. An API endpoint in our Phoenix app that took the URL and did the rest of the transaction
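As a rough sketch of part (2), a Python handler might look something like the following. The bucket name, callback URL, event shape, and `resize_image` helper are all hypothetical; the real event depends on the trigger (an SES notification versus an API Gateway webhook), and the processing step would be whatever your image library actually does.

```python
import io
import json
import os
import urllib.request

import boto3
from PIL import Image  # Pillow for the actual image work

s3 = boto3.client("s3")

# Hypothetical configuration -- in practice these would come from
# environment variables set on the Lambda.
BUCKET = os.environ.get("OUTPUT_BUCKET", "my-processed-images")
CALLBACK_URL = os.environ.get("API_CALLBACK_URL", "https://api.example.com/images")


def resize_image(data: bytes, max_size: int = 1024) -> bytes:
    """Placeholder for the CPU/memory-heavy processing step."""
    image = Image.open(io.BytesIO(data)).convert("RGB")
    image.thumbnail((max_size, max_size))
    out = io.BytesIO()
    image.save(out, format="JPEG")
    return out.getvalue()


def handler(event, context):
    # We assume the event carries a source bucket/key for the original
    # image; the real shape depends on how the Lambda is triggered.
    source_bucket = event["bucket"]
    source_key = event["key"]

    original = s3.get_object(Bucket=source_bucket, Key=source_key)["Body"].read()
    processed = resize_image(original)

    output_key = f"processed/{source_key}"
    s3.put_object(Bucket=BUCKET, Key=output_key, Body=processed,
                  ContentType="image/jpeg")

    url = f"https://{BUCKET}.s3.amazonaws.com/{output_key}"

    # Notify the Phoenix API that the processed image is ready.
    payload = json.dumps({"url": url, "source_key": source_key}).encode("utf-8")
    request = urllib.request.Request(
        CALLBACK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return {"statusCode": response.status, "url": url}
```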

This has worked excellently for us: it dramatically smooths out our CPU/memory usage, simplifies devops, and keeps our server costs low since we’re not provisioning unnecessary resources. Our main pain points were setting up a good test case to trigger the Lambda, setting up good logging so we could debug the Lambda, and making it easy to deploy changes.
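On the first of those pain points, one way to exercise the Lambda without going through the real trigger is to invoke it directly with a canned event. The function name and payload below are hypothetical stand-ins for whatever your deployment actually uses:

```python
import json

import boto3

lambda_client = boto3.client("lambda")

# Hypothetical test event matching the shape the handler expects.
test_event = {"bucket": "my-test-uploads", "key": "samples/cat.jpg"}

response = lambda_client.invoke(
    FunctionName="image-processor",
    InvocationType="RequestResponse",  # synchronous, so we can inspect the result
    Payload=json.dumps(test_event).encode("utf-8"),
)

print(json.loads(response["Payload"].read()))
```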