Arcadia had a little Lambda: A fleece not quite as slow

Jon Behnken
The Arcadia Source
8 min read · Jun 14, 2022
[Photo: sheep grazing around an array of solar panels in a grassy field]

As the old saying goes, there’s more than one way to shear a sheep. When it comes to running code, there are even more. To properly containerize, deploy, and orchestrate powerful codebases, you’re going to need some equally tough scissors. In this article, I’m going to talk about one such instrument: AWS Lambda, Amazon’s serverless computing service. I’ll explain how and why Arcadia incorporated this service into its data collection solution and highlight the considerations, benefits, and limitations of using this technology.

Here at Arcadia, part of our mission is to be a catalyst for the clean energy future by providing a meaningful, substantial, and seamless integration into an otherwise complicated technological landscape. As part of that effort, we operate as a data connector between utilities and consumers by organizing information such as power consumption, tariffs, and billing information. Accessing data from third-party services is a challenging task that necessitates a well-designed and flexible engineering approach. To add to that, the environments within which we collect information are fluid, dynamic, and often produce a variety of input streams and data formats that must be standardized and normalized to conform to our highly scrutinized data standards.

Our initial solution, outlined below, was successful at mitigating the inefficiencies and constraints of responsive data consumption. Over time, however, as the scope and scale of our data access domain grew, we began to see signs of stress in our application that would need to be addressed.

In with a lion

Our initial approach took the form of a single monolithic service deployed inside containers on Amazon’s ECS. At its inception, our application was responsible for a small number of information sources and managed a relatively simple data model. Our solution performed well given its initial purview; however, as the company scaled horizontally by supporting more utilities, it became clear that the design of our original system would need to be revised to address the following limitations:

It was slow. Under the hood, our initial data connection method was monolithic and required a lot of resource overhead. Confined to our existing Rails environment, we opted for out-of-the-box tools and libraries for their compatibility with Rails. However, they lacked speed and couldn’t be customized or modified without significant investment. As the volume of network requests made by our application increased, it became more and more burdensome to manage these unwieldy tools.

It became inflexible. Our initial development environment worked very well for the original scope of its domain; with a smaller set of utilities that shared relatively uniform data needs, our application was streamlined and performant. As Arcadia’s footprint in the utility data coverage space grew, though, it introduced diversity in both the formats of data and the requirements for establishing connections to that data. Signs of stress became apparent.

  • For instance, a web request could complete successfully, but background requests with necessary data could be dropped, corrupted, or never initiated, with limited visibility. We began to realize that our application would need specialized logic to detect missing data on a per-utility basis, as well as the ability to retry requests if expected data didn’t return (see the sketch after this list).
  • Since all collection events were queued synchronously behind one another, we had real application reliability and scalability concerns as the number of scheduled collection events increased.
  • Furthermore, as JavaScript continues to dominate the modern web communication model, it introduces an asynchronicity that has repeatedly required us to develop specialized logic that relies less on traditional indicators of success and more on precise data introspection.
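To make that concrete, here is a minimal sketch of the kind of per-utility detection-and-retry logic described above. The utility names, the check functions, and the fetch callable are illustrative assumptions, not our production code.

```python
import time

# Hypothetical per-utility checks: each returns True only when the payload
# actually contains the data we expect back from that utility.
EXPECTED_DATA_CHECKS = {
    "utility_a": lambda payload: "billing_history" in payload,
    "utility_b": lambda payload: bool(payload.get("interval_data")),
}


def collect_with_retry(utility_id, fetch, max_attempts=3, backoff_seconds=5):
    """Re-run a collection request until the utility-specific check passes."""
    is_complete = EXPECTED_DATA_CHECKS.get(utility_id, lambda payload: True)
    for attempt in range(1, max_attempts + 1):
        payload = fetch(utility_id)  # performs the actual web request(s)
        if is_complete(payload):
            return payload
        time.sleep(backoff_seconds * attempt)  # back off before retrying
    raise RuntimeError(f"Expected data never arrived for {utility_id}")
```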
[Meme: an alpaca shaved everywhere but its head; a horizontal line below the head labels the fluffy head “FRONT END” and the shaved body “BACK END.”]

Out with a Lambda

The limitations above led Arcadia to adopt AWS Lambda as a solution. Some of our motivations for choosing Lambda were probably ones Amazon specifically designed the service for, but a few might surprise them. Amazon markets Lambda as a “serverless, event-driven compute service,” with one of the main benefits being that you don’t have to pay for a dedicated, always-on server if you only need to perform short, efficient tasks triggered by a certain event.

Ultimately, Arcadia picked Lambda less for the cost savings of event-driven invocation and more for its ability to seamlessly improve our application environment with better flexibility and scalability.

Faster and quicker

Lambda lets you package code and dependencies as a .zip file or container image. This makes deployment rather easy, since you can set up, test, and debug codebases locally and then simply bundle and ship a fully self-contained app once finished. Lambda also offers Lambda Layers, which allow you to share files, libraries, or other resources across multiple Lambdas. For instance, if two separate Lambdas both use Chromedriver, which is quite large, you don’t need to package it independently in each Lambda; they can both share a Lambda Layer.
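As a rough illustration of that layer-sharing pattern (the connector_utils module and its collect function are hypothetical), Lambda unpacks layer contents under /opt and, for Python runtimes, adds the layer’s python/ directory to the import path:

```python
# handler.py - a connector that leans on resources shipped in a shared layer.
# For Python runtimes, Lambda unpacks layers under /opt and puts the layer's
# python/ directory on the import path, so shared modules import normally.

import connector_utils  # hypothetical shared module packaged in the layer

CHROMEDRIVER_PATH = "/opt/chromedriver"  # large binary shipped once, shared by many functions


def handler(event, context):
    # Delegate shared scraping logic to the layer instead of bundling it
    # into every connector's own deployment package.
    data = connector_utils.collect(event["utility_id"], driver_path=CHROMEDRIVER_PATH)
    return {"statusCode": 200, "body": data}
```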

This freedom allowed us to iterate faster and reduce our reliance on a single coding language. Python is a proven, well-documented language for data-related tasks, with a large number of highly specialized libraries available. AWS Lambda allowed us to seamlessly migrate to Python while our main application remained in Ruby.

There is, however, a 10GB size limit for container-image Lambdas (and considerably smaller limits for .zip packages), so large libraries can be a deployment obstacle. Lambda Layers are meant to alleviate some of those concerns, but there is still a limitation here that does not exist in other tools like Kubernetes.

With great flexibility comes great responsibility

The most impactful advantage that Lambdas afforded us was the ability to write separate, individualized codebases, containerize them as distinct data connectors, and deploy them independently of each other. Instead of routing requests based on the nature of the error our application encountered, we could intelligently route requests based on a precise identifier for a given website. Our main application now passes a specific service identifier to our data collection environment that describes exactly which data source to access and which methods to use. Previously, our data access application would begin each time by executing the same boilerplate code; it could only determine which specialized methods to use based on the error code it encountered. This is no longer the case: specific identifiers now govern code execution, enabling a more intelligent control flow and more precise error logging.
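Our main application is written in Ruby, but a Python sketch using boto3 conveys the shape of this routing; the one-function-per-connector naming convention below is an assumption for illustration, not our actual setup.

```python
import json

import boto3

lambda_client = boto3.client("lambda")


def invoke_connector(service_id, params):
    """Route a collection request straight to the connector for one data source."""
    # Hypothetical naming convention: one deployed Lambda per data connector.
    response = lambda_client.invoke(
        FunctionName=f"data-connector-{service_id}",
        InvocationType="Event",  # asynchronous: queue the collection job and return
        Payload=json.dumps({"service_id": service_id, "params": params}),
    )
    return response["StatusCode"]  # 202 means the async invocation was accepted
```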

Errors with one input stream no longer affect the performance or availability of other connectors; we can fix, tear down, and redeploy connectors independently of one another. Where there was once a brittle, monolithic application, there are now individual Lambda functions for each domain to which we integrate. This means developers no longer have to consider the entire codebase when fixing bugs or developing features. This is a huge value-add for our organization: it not only allows our developers to move more quickly, but the separation and isolation of concerns also makes code review more concise and less complicated.


Our current data access solution presents some unique requirements that Lambda isn’t fully prepared to handle, though. For instance, prioritization and queueing are important capabilities of event-based code environments. AWS does have product offerings that provide a flavor of these capabilities; however, the nuances and specific features of our setup make out-of-the-box solutions insufficient for our use.

The inability to proactively rate limit a Lambda function posed a similar problem, and these gaps forced us to create ad-hoc solutions* for both scenarios. For queueing, we built a custom queueing service; for rate limiting, we use concurrency limits as a workaround throttling solution. It’s also difficult to predetermine the IP address of a standard Lambda function, so when certain services require allowlists, we have to devise additional networking provisions to provide static IP addresses for them.

*Be on the lookout for future blog posts detailing those solutions!
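In the meantime, here is a minimal sketch of the concurrency-limit workaround using boto3; the function name and the limit are illustrative.

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve a small concurrency pool for one connector so that no more than
# five collection jobs ever run against that utility simultaneously.
lambda_client.put_function_concurrency(
    FunctionName="data-connector-utility-a",  # hypothetical function name
    ReservedConcurrentExecutions=5,
)
```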

Looking back, and ahead

Some of the concerns above have made us justifiably question whether AWS Lambda is still the best approach for our data connection efforts. At the time we selected Lambda, we didn’t seriously consider other orchestration solutions like Kubernetes, partly because AWS was already part of our infrastructure and partly because managed Kubernetes services hadn’t matured to the point where they would fit our needs without significant customization.

Another motivation for choosing Lambdas was our desire to create utility-specific Lambdas that fully isolated and encapsulated specialized data operations. Over time, though, Arcadia continuously developed a proprietary shared library of common scraping operations, which has proved integral to our data validation standards. The end result looks less like an application and more like a library in the spirit of pandas, containing a suite of data transformation functionality used across all connectors. Isolating domain-specific code into individual Lambda functions has certainly provided strategic benefits, though the degree of practical separation is far less than we initially forecasted.

One final consideration (at least, with respect to other orchestration tools) is that we ended up using Kubernetes for other services in our environment. There are tradeoffs to using a highly managed third-party solution like AWS; full control and custom configuration are sacrificed to some degree in the interest of speed and simplicity. These tradeoffs are fine, so long as your ecosystem is expected to remain predominantly in AWS. However, there are added complexities in getting Lambdas to talk to other products such as Kubernetes clusters. One might go so far as to say this AWS product offering is a sacrificial Lambda.

Rounding up the herd

Sometimes engineering problems can be so nuanced and prescriptive that the broader business needs are obscured by technical specificity. Even the most precise and technical engineering projects are ultimately derived from a business goal or need. From an operational perspective, Arcadia will keep striving to expand the breadth of our utility coverage. At the same time, we’re also responsible for ensuring that, as the scope of our coverage grows, we maintain efficient, accurate, and reliable data access capabilities. Both the introduction of Lambdas and the ingenuity of Arcadia’s engineering department in using them appropriately tie back to these business objectives. For instance, we added orthogonality to our request orchestration model with Lambdas, which enabled us to widen our utility coverage. We also significantly improved the speed and performance of our entire data access solution by abandoning slower tools and leveraging purposeful Python libraries.

AWS Lambda offers a compelling combination of flexibility, scalability, and easy integration. Lambdas afforded us the ability to break apart an error-prone, monolithic application into smaller, highly manageable functions that are specialized to handle distinct tasks and data. There are, however, cognitive challenges from a developer perspective and technological limitations to consider when using AWS Lambda; build size limits can create deployment problems, and the complexities of an ephemeral computation service make it hard to create reproducible network outcomes and communicate with other systems. If we were faced today with rebuilding our infrastructure, some of these factors could tip the decision toward Kubernetes over Lambdas. That being said, at Arcadia we continue to diligently and thoughtfully weigh the costs of a rewrite to transition to Kubernetes against the benefits it would yield. As of this writing, we still opt for Lambdas.

TL;DR: anyone considering AWS Lambda would do well to weigh the benefits of execution flexibility against the complexities of managing transient third-party infrastructure.
