TL;DR: We dropped a tiny amount of AWS Lambda into our Rails app to get some much-needed job concurrency in a project with spiky demand and tight execution requirements.
We’re always interested in new technologies here at Expected Behavior, and serverless infrastructure is no exception. However, it can be really hard to put a new technology through its paces without the right project. It seems like every serverless article assumes you’re starting a new project. But if you’re like us, you probably aren’t starting new projects very often. That makes it hard to know when it’s the right time to add serverless to an existing application.
We first tried serverless around the time Amazon launched Lambda. We did a proof of concept of moving the core function of one of our products to Lambda, which was an instructive experience that taught us a lot about the technology and its edges, but it was ultimately a non-starter. The limits of Lambda just weren’t right for that workflow.
Fast forward a couple of years, and we found another chance!
We’ve been working on a huge new feature (still in private beta) for Instrumental that relies heavily on background jobs. When we first began planning the project, it became clear it would increase our job load 10x on our standard background job infrastructure. Even worse, when those jobs were queued, they’d all need to be worked within 1–2 seconds; the other 58 seconds of every minute wouldn’t have much going on. Standard horizontal scaling would work, but the new servers would sit idle most of the time, which is inefficient and unnecessarily costly.
We began looking for another way. What we really needed was a system to work many thousands of jobs concurrently, but one that only costs us money when we’re actually using it. Essentially, the workload is “here are 10,000 I/O-intensive things that all need to happen at the same time, then do nothing for a while”. This turns out to be a pretty great fit for a serverless-function-in-the-sky.
Lambda seemed like a great fit for our needs, and we did a proof of concept to prove the technology worked as expected, but we found ourselves with an engineering decision: how much responsibility for this problem should we give to this new Lambda infrastructure?
Some important constraints:
- Lambda charges for both the amount of memory used AND the execution time.
- The workload for this project requires access to our standard relational database.
- The workload for this project requires API access across multiple disparate APIs. Request, wait, request, wait, request.
Since our goal was to reduce costs relative to traditional horizontal scaling, we needed to balance execution time between our usual infrastructure and the new Lambda code. After some testing, we realized we could minimize both cost and execution time by avoiding any need to access our relational database or standard application code from Lambda. We decided on a very explicit boundary: Lambda would only be responsible for API access. We’d build a payload containing exactly what was needed to talk to the various APIs, so the Lambda code wouldn’t have to do any transformation. Everything else would happen in our standard job infrastructure within our standard Rails environment. Lambda would need very limited permissions, limited access to secrets, and a small memory footprint, while still giving us concurrency exactly where we needed it most.
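To make that boundary concrete, here’s a minimal sketch of what a self-contained payload could look like. The builder name and request shape are hypothetical, not our actual implementation; the point is that the Rails side packages every URL, header, and body up front, so the Lambda side only performs I/O.

```ruby
require "json"

# Hypothetical sketch: package everything the function needs so it
# never touches our database or application code. The names and
# request shape here are illustrative.
def build_lambda_payload(api_requests)
  {
    requests: api_requests.map do |req|
      {
        url: req.fetch(:url),
        method: req.fetch(:method, "GET"), # default is illustrative
        headers: req.fetch(:headers, {}),
        body: req[:body]
      }
    end
  }
end

payload = build_lambda_payload([
  { url: "https://api.example.com/widgets", method: "POST", body: '{"id":1}' }
])
lambda_payload_json = JSON.generate(payload)
```

The function can then walk the requests and perform each call without knowing anything else about the application.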
We estimate a 90% cost reduction by moving just the API access to Lambda.
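For a rough sense of why burst-only pricing wins for this shape of workload, consider a back-of-envelope calculation. The per-request and per-GB-second rates below are illustrative (check current AWS pricing), as are the burst numbers:

```ruby
# Rough Lambda cost model: you pay per request plus per GB-second of
# execution time. The rates below are illustrative, not a quote of
# current AWS pricing.
def burst_cost(invocations:, memory_mb:, seconds:,
               gb_second_rate: 0.00001667, request_rate: 0.20 / 1_000_000)
  gb_seconds = invocations * (memory_mb / 1024.0) * seconds
  gb_seconds * gb_second_rate + invocations * request_rate
end

# 10,000 one-second, 128 MB invocations per burst comes to a few
# cents, and spend drops to zero between bursts.
burst_cost(invocations: 10_000, memory_mb: 128, seconds: 1)
```

Always-on servers sized for the burst would bill for the idle 58 seconds of every minute too.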
Invoking Lambda Functions the Simple Way
Part of what makes Lambda and other similar tools count as “serverless” is the ability to trigger functions via external inputs. With Lambda, that can be a variety of other AWS tools: SQS, Kinesis, Alexa, etc. Without those tools, you still need servers.
But we do have servers! We realized we can avoid complexity and maintenance overhead by simply invoking Lambda functions directly. In our case, that meant using the Ruby AWS SDK in our standard queue infrastructure. That’s as simple as:
```ruby
lambda_client = Aws::Lambda::Client.new(credentials)
lambda_client.invoke(
  function_name: lambda_function_name,
  invocation_type: "Event",
  payload: lambda_payload.to_json
)
```
The Event invocation type is fire-and-forget, while RequestResponse is for when you want a synchronous response. We designed our Lambda jobs to be very failure tolerant, so we don’t need to care about the response. Using the Event invocation type helped further reduce our costs because the jobs in our standard infrastructure could exit as soon as they invoked Lambda.
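In practice, that call lives inside a regular background job. Here’s a sketch of the pattern; FanoutJob and the function name are hypothetical, and the client is injected so it stays easy to stub in tests:

```ruby
require "json"

# Hypothetical sketch of the fire-and-forget pattern from a
# background job. "api-fanout" and the job name are illustrative.
class FanoutJob
  def initialize(lambda_client)
    @lambda_client = lambda_client
  end

  def perform(payload, function_name: "api-fanout")
    # "Event" means Lambda queues the invocation and returns
    # immediately, so this job can exit right away.
    @lambda_client.invoke(
      function_name: function_name,
      invocation_type: "Event",
      payload: JSON.generate(payload)
    )
  end
end
```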
Frameworks, Tooling, Deployment
When we first dipped our toes into the water of serverless functions-in-the-sky, we did the sane thing and explored what tools and frameworks existed that could help us. Many exist, and they provide necessary and useful tools to get up and running with serverless…assuming you’ve got a blank slate. It makes sense: frameworks provide the most value for new projects. However, long-lived projects like ours already have lots of tooling, and our project just didn’t need the extra complexity.
Since we already had most of the tooling that serverless frameworks provide, we found they added more overhead than value. In the end, we found it simpler to just make small additions to our existing tools rather than install a whole new thing. Nearly the entire effort went towards developing code to deploy the functions, though even that is just 70 lines of fairly simple Ruby.
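For the curious, that deploy code amounts to little more than zipping the function and uploading it under a per-environment name. A hedged sketch, with an illustrative naming scheme rather than our actual script:

```ruby
# Sketch of a minimal deploy helper; all names are illustrative.
# Each environment (and, as noted below, each developer) gets its
# own copy, e.g. "api-fanout-production" or "api-fanout-dev-alice".
def function_name_for(base, env)
  "#{base}-#{env}"
end

def deploy_function(base, env, zip_path)
  require "aws-sdk-lambda" # gem "aws-sdk-lambda"
  Aws::Lambda::Client.new.update_function_code(
    function_name: function_name_for(base, env),
    zip_file: File.binread(zip_path) # raw bytes of the zip archive
  )
end
```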
Gotchas, Caveats, and Lessons Learned
Here are a few specific things we learned the hard way.
- Lambda only allows up to 1,000 concurrent invocations by default. If you need more than that, you’ll need to request an increase from Amazon support, which is not necessarily a fast process. You’ll also need to be prepared to describe your use case to justify the increase.
- Serverless functions provide very limited insight into their faults. Make liberal use of console.log to add identifying information (e.g., user IDs) to the Lambda log so it’s easier to search later. Above all, make sure to use your application monitoring tool, or you may never know when your Lambda function malfunctions. If you don’t have an application monitoring tool, we know a good one. :)
- You’ll want to have a deployed version of each Lambda function for each active developer for both development and test. Expect increased cycle time in development because of function deployment.
- Write automated tests that actually invoke your Lambda functions in the Lambda infrastructure. You’ll be glad you did.
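For that last point, a synchronous RequestResponse invocation makes a good smoke test. A sketch, where the helper name and payload shape are hypothetical:

```ruby
require "json"

# Sketch of a test helper that round-trips through the real Lambda
# infrastructure. The helper name and payload are illustrative.
def assert_lambda_round_trip(lambda_client, function_name)
  response = lambda_client.invoke(
    function_name: function_name,
    invocation_type: "RequestResponse", # wait for the result
    payload: JSON.generate(requests: [])
  )
  # status_code is 200 when the function ran; function_error is set
  # when the function raised.
  raise "invoke failed" unless response.status_code == 200
  raise "function errored" if response.function_error
  response
end
```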
We’re still beta-testing our big new feature, but our use of Lambda has already proven to be a huge win. It’s saved us money in both server costs and engineering time, and the maintenance overhead of adding a new environment has been comparatively low. Best of all, we have a much better understanding of the kinds of problems that would benefit from a sprinkling of serverless in an existing serverful project.