Introducing LambCI — a serverless build system

I’m excited to announce the first release of LambCI, an open-source continuous integration tool built on AWS Lambda 🎉

LambCI is a tool I began building over a year ago to run tests on our pull requests and branches at Uniqlo Mobile. Inspired by the inaugural ServerlessConf a few weeks ago, I recently put some work into hammering it into shape for public consumption.

It was born of dissatisfaction with the two current choices for automated testing on private projects. You can either pay for it as a service (Travis, CircleCI, etc), where 3 developers needing their own build containers might set you back a few hundred dollars a month, or you can set up a system like Jenkins, Strider, etc and configure and manage a database, a web server and a cluster of build servers.

In both cases you’ll be under- or overutilized, waiting for servers to free up or paying for server power you’re not using. And this, for me, is where the advantage of a serverless architecture really comes to light: 100% utilization, coupled with instant invocations.

Systems built on platforms like AWS Lambda and Google Cloud Functions essentially have per-build pricing. You’d pay the same for 100 concurrent 30-second builds as you would for 10 separate 5-minute builds.
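Ignoring per-request charges and rounding to the billing increment, the two workloads in that example add up to exactly the same billed compute time:

```latex
\[
  100 \times 30\,\text{s} \;=\; 3000\,\text{s} \;=\; 50\,\text{min} \;=\; 10 \times 5\,\text{min}
\]
```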

The LambCI Advantage

From an ops perspective, all of the systems and capacity are managed by Amazon (SNS, Lambda, DynamoDB and S3), so LambCI is far simpler to set up and manage than Jenkins, especially given that you get 100 concurrent builds out of the box.

From a cost perspective, it’s typically far cheaper for private builds than the various SaaS offerings because you only pay for the time you use (and the first 4,444 mins/mth are free):

(Assumes 7 days/wk — with LambCI running on fastest 1.5GB Lambda option)
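The free-minutes figure follows from Lambda’s monthly free tier, assuming the standard allowance of 400,000 GB-seconds and that every build runs on the 1.5GB setting:

```latex
\[
  \frac{400{,}000\ \text{GB}\cdot\text{s}}{1.5\ \text{GB}} \;\approx\; 266{,}667\ \text{s} \;\approx\; 4{,}444\ \text{min}
\]
```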

So if you had 2 developers, each simultaneously running sixty 4-min builds per day (ie, 4 hrs each), LambCI would cost less than an eighth as much per month as Travis ($15 vs $129).
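The $15 estimate can be reproduced with a few lines of arithmetic. This sketch assumes Lambda’s published compute rate of $0.00001667 per GB-second, the 400,000 GB-second free tier, a 30-day month, and it ignores request charges (3,600 builds a month sits well inside the free request allowance). The constants are assumptions to check against current AWS pricing, not figures taken from LambCI.

```python
# Assumed Lambda pricing; verify against current AWS rates before relying on it.
GB_SECOND_PRICE = 0.00001667  # USD per GB-second of compute
FREE_GB_SECONDS = 400_000     # monthly free tier
MEMORY_GB = 1.5               # the 1.5GB function size used in the comparison


def monthly_lambda_cost(developers, builds_per_day, build_minutes, days=30):
    """Estimate monthly compute cost in USD, ignoring per-request charges."""
    build_seconds = developers * builds_per_day * build_minutes * 60 * days
    gb_seconds = build_seconds * MEMORY_GB
    # Only usage beyond the free tier is billed.
    billable = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return billable * GB_SECOND_PRICE


cost = monthly_lambda_cost(developers=2, builds_per_day=60, build_minutes=4)
print(f"LambCI: ${cost:.2f}/month")
print(f"Travis costs {129 / cost:.1f}x as much")
```

With these inputs the first line prints $14.94, in line with the $15 figure above.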

It’s only if you need to be running builds 24/7 that SaaS options become more competitive, and of course if you want to run builds for your open source projects, then Travis, CircleCI and others all have great (free) options for that.

Performance-wise, Lambda reports itself as a dual Xeon E5-2680 @2.80GHz. If you have checked-in dependencies and fast unit tests, builds can finish in single-digit seconds, but a larger project like dynalite, with 941 HTTP-to-localhost integration tests, builds in about 70 seconds. Of that, 43 secs is spent actually running the tests, and the remainder is mostly npm installation. On my 1.7GHz i7 MacBook Air the npm install and tests complete about 20% faster, so there’s definitely an element of “cloud” speed to keep in mind.

The public Travis option takes only a few seconds longer than LambCI to run dynalite’s npm install and tests, but the overall build time is longer due to worker startup time (22 secs) and waiting in the queue (up to several mins; I assume this only happens if you don’t have enough concurrency).

What does it look like?

Here’s what it looks like in action. This project has only a handful of tests and checked-in dependencies, so it builds much faster than our typical projects do, but I promise this is real and all running remotely on AWS Lambda:

Build time includes DB lookup, git cloning, etc — Amazon’s network is fast!

It comes as a CloudFormation stack that deploys quickly (about 3 mins) and costs you nothing when you’re not using it.

Everything is set up during stack creation, thanks to Lambda-backed Custom Resources
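Lambda-backed Custom Resources work by having CloudFormation invoke a function with a request event and then wait for that function to PUT a JSON result to the pre-signed `ResponseURL` included in the event. A minimal Python sketch of the reply side follows; the helper names are hypothetical, and the fallback used for the physical resource ID is an assumption, not necessarily what LambCI’s stack does.

```python
import json
import urllib.request


def cfn_response_body(event, status, data=None, reason=None, physical_id=None):
    """Build the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See the function's CloudWatch logs",
        # Hypothetical choice: keep an existing id, else fall back to the logical id.
        "PhysicalResourceId": (
            physical_id or event.get("PhysicalResourceId") or event["LogicalResourceId"]
        ),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_cfn_response(event, body):
    """PUT the body to the pre-signed S3 URL that CloudFormation supplied."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        # AWS's own cfn-response helper sends an empty Content-Type, so this does too.
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A setup handler would do its work (for example, writing initial config), then call `send_cfn_response(event, cfn_response_body(event, "SUCCESS"))`, or report `"FAILED"` with a reason if anything went wrong, so the stack doesn’t hang waiting for a reply.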

The stack consists of:

  • an SNS Topic to listen to GitHub events and forward them to Lambda
  • a Lambda function with a bundled git binary to clone the PR/branch, run the build, update Slack and GitHub, and store the results
  • two low-capacity DynamoDB tables for config settings and build results
  • an S3 bucket to store HTML pages of the results and any other build artifacts (optional)
  • a bit of IAM glue

Boxes and arrows! Must be an architecture diagram
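To make the entry point concrete: when a GitHub event arrives via SNS, the Lambda function receives it wrapped in the standard SNS event shape, with the published body as a JSON string in each record’s `Sns.Message`. The sketch below, written in Python for illustration, unwraps that and decides what to build from the payload’s shape (a `pull_request` key for PR events, `ref` and `after` for pushes). It is hypothetical, not LambCI’s actual handler; `describe_build` and its return tuples are invented for this example.

```python
import json


def github_payloads(lambda_event):
    """Yield the decoded GitHub payload from each SNS record in a Lambda event."""
    for record in lambda_event.get("Records", []):
        # SNS delivers the published message body as a string in Sns.Message.
        yield json.loads(record["Sns"]["Message"])


def describe_build(payload):
    """Hypothetical: pick what to clone and build based on the payload's shape."""
    if "pull_request" in payload:
        pr = payload["pull_request"]
        return ("pr", pr["number"], pr["head"]["sha"])
    if "ref" in payload and "after" in payload:
        ref = payload["ref"]
        prefix = "refs/heads/"
        branch = ref[len(prefix):] if ref.startswith(prefix) else ref
        return ("push", branch, payload["after"])
    return None  # some other event type; nothing to build


def handler(event, context):
    """Lambda entry point: one build description per SNS record."""
    return [describe_build(p) for p in github_payloads(event)]
```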

There’s also a command-line tool to perform setup and configuration — so you don’t need to manage everything from the AWS console if you don’t want to.

No API Gateway?

Nope. Not… yet, anyway. Having SNS as the sole entry point means that you have a well-defined surface area exposed to the world, and a single user who just needs permission to publish to an SNS Topic. It’s entirely possible that API Gateway endpoints will be added in the near future to enable a richer UI, but for now, leaving it out keeps the stack simpler.
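As a rough illustration of how small that surface area is, the IAM policy for such a publish-only user could be as short as the one generated below. This is a sketch, not the policy LambCI’s stack creates, and the topic ARN is a made-up placeholder.

```python
import json


def sns_publish_policy(topic_arn):
    """Least-privilege IAM policy: the only allowed action is publishing to one topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


# Hypothetical topic ARN; substitute the one your stack actually creates.
EXAMPLE_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-lambci-topic"
print(json.dumps(sns_publish_policy(EXAMPLE_TOPIC_ARN), indent=2))
```

Because the credentials can only publish to that one topic, leaking them (say, from GitHub’s webhook settings) exposes very little.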

There’s gotta be a downside?

There are definitely some limitations that may be showstoppers for you, depending on your requirements. The two largest in my opinion are:

  • no root access
  • 5-minute max build time

The latter may be something that AWS extends. Also, given there’s such a low barrier to concurrent execution, this limit encourages you to split your builds into parallel jobs. Making this more straightforward is definitely on the list of features for LambCI v1.0.
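Parallel splitting isn’t built into LambCI yet, but the idea is easy to sketch. Assuming you have per-test timings from previous runs (a hypothetical input here), a greedy longest-first assignment can find how many shards it takes to keep each shard under the time cap; each shard would then run as its own Lambda invocation. The function names are invented for this example.

```python
import heapq

LAMBDA_LIMIT_SECONDS = 300  # the 5-minute cap


def lpt_shards(durations, n):
    """Greedy longest-processing-time-first: give each test, longest first,
    to whichever of the n shards currently has the least total time."""
    heap = [(0.0, i) for i in range(n)]  # (total seconds, shard index)
    shards = [[] for _ in range(n)]
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        shards[i].append(name)
        heapq.heappush(heap, (load + secs, i))
    return shards


def shards_within_limit(durations, limit=LAMBDA_LIMIT_SECONDS, max_shards=100):
    """Smallest shard count (up to max_shards) for which the greedy split
    keeps every shard's total time at or under limit."""
    for n in range(1, max_shards + 1):
        shards = lpt_shards(durations, n)
        if max(sum(durations[t] for t in s) for s in shards) <= limit:
            return shards
    raise ValueError(f"cannot fit tests under {limit}s with {max_shards} shards")
```

In practice you’d pass a `limit` well below 300 to leave room for cloning and dependency installation in each invocation.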

In terms of root access, this means you cannot run any software that requires root (eg, Docker), or install software in default system locations. You only have access to /tmp, so any extra tools must be installable in non-standard locations. A surprising number of tools can be installed in /tmp, and there’s a growing collection of recipes for how to get them running in Lambda/LambCI.
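A common pattern for tools that support custom install locations is to install them under a prefix in /tmp (for autoconf-based tools, `./configure --prefix=/tmp/tools`) and then put that prefix first on the search paths when running build commands. The sketch below assumes a hypothetical `/tmp/tools` prefix and invented helper names; it isn’t taken from LambCI’s recipes.

```python
import os
import subprocess

# Hypothetical install prefix: /tmp is the only writable location on Lambda.
TOOLS_PREFIX = "/tmp/tools"


def _prepend(env, var, path):
    """Put path at the front of a PATH-style variable without leaving an empty entry."""
    existing = env.get(var)
    env[var] = path if not existing else path + os.pathsep + existing


def env_with_prefix(prefix=TOOLS_PREFIX, base=None):
    """Return a copy of the environment that finds binaries and libraries under prefix first."""
    env = dict(os.environ if base is None else base)
    _prepend(env, "PATH", os.path.join(prefix, "bin"))
    _prepend(env, "LD_LIBRARY_PATH", os.path.join(prefix, "lib"))
    return env


def run_with_tools(cmd, prefix=TOOLS_PREFIX):
    """Run a build command so that tools installed under prefix take precedence."""
    return subprocess.run(cmd, env=env_with_prefix(prefix), check=True)
```

Avoiding an empty entry matters: a trailing separator in `PATH` would silently add the current directory to the search path.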

Containers to the rescue

Given that not every project can fit within these limits, LambCI has the optional ability to run build tasks for particular projects on an ECS cluster.

Now hang on a minute, I hear you well-actuallying, that’s not serverless! Of course, you’re absolutely right. However, there is still a huge advantage in the fact that the instances in the cluster are stateless and homogeneous — they all run the same stock-standard Amazon image and they can be spun up or down whenever you like, so the maintenance overhead is still very low. You can have zero instances running whenever you don’t need them, and you can auto-scale them based on time of day or current load.

The lambci/ecs project has a stack with a task that will look for a Dockerfile.test file in the cloned repository, and build and run all the commands specified in that Dockerfile (Docker-in-Docker!). This makes it very straightforward to specify all of your dependencies, leverage Docker’s layer caching, use any language you want, run the build for as long as you want, and have root access in the container the build is running in.
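Roughly, the build step of such a task might look like the sketch below. It relies on standard Docker behaviour: `RUN` steps execute during `docker build`, and a step that exits non-zero fails the whole build. The function names and the `lambci-test` image tag are hypothetical, and this is not the lambci/ecs source.

```python
import os
import subprocess


def dockerfile_test_build(repo_dir, tag="lambci-test"):
    """Return the docker command for a repo's Dockerfile.test, or None if it has none."""
    dockerfile = os.path.join(repo_dir, "Dockerfile.test")
    if not os.path.isfile(dockerfile):
        return None
    # Each RUN step executes during the build; a failing step makes `docker build` exit non-zero.
    return ["docker", "build", "-f", dockerfile, "-t", tag, repo_dir]


def run_build(repo_dir):
    """Build the repo's Dockerfile.test, raising if it is missing or a step fails."""
    cmd = dockerfile_test_build(repo_dir)
    if cmd is None:
        raise FileNotFoundError("no Dockerfile.test in " + repo_dir)
    # check=True raises CalledProcessError when the build (and hence a test step) fails.
    subprocess.run(cmd, check=True)
```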

Here’s how that setup looks:

LambCI with a dash of ECS

The road to v1.0

LambCI is feature-complete inasmuch as it can respond to GitHub events, clone repositories, run build commands and update GitHub and Slack statuses. It can run different versions of Node.js, Java, Go, Ruby, PHP, Rust, Python 2.7, native compilation with gcc, and tools like phantomjs for automated UI testing.

However, there are a few features that would be great to have for an impending v1.0 release, a number of which just boil down to “what’s the right incantation to get this to work”:

  • Recipes on how to build other languages — other versions of Python and other languages will probably work just fine too, they just need to be tested.
  • Firmer configuration for running parallel builds.
  • AWS CodePipeline integration for continuous delivery.
  • Support for other notification services — as well as Slack, LambCI can publish statuses to an SNS topic, so email and SMS are already covered, but it might be nice to support services like HipChat, Yammer, etc out of the box.
  • Support for other repository sources like Bitbucket, GitLab, AWS CodeCommit, etc — although this is more likely a post-v1.0 goal.
  • Support for running on other cloud services like Google Cloud Functions and Azure Functions — probably also post-v1.0 goals.
  • A hosted service with pay-per-build pricing — for those who don’t have/want an AWS account and want to get up and running easily, with the ability to move to their own LambCI stack with the same configuration if they want to later.

The future of serverless ops is bright

LambCI is just one example of the sort of tools that are now possible to build without needing to wait for servers, instances, dynos, etc to start or worry about keeping them running and up-to-date.

As container-based systems like OpenWhisk become more production-ready, we’ll start to see even more flexibility in this space. Who knows, maybe AWS will offer a way to run containers on Lambda too.

So take LambCI for a spin, hit us up on Twitter and GitHub, and let us know if there are any features you think might be great to add. I’d love to get some community feedback on what works and what doesn’t, what languages people want to see supported, and what interesting ways we can push our automated build setups to take advantage of this newfound concurrency!

Enjoy!


Many thanks to Jed Schmidt and Tom Dale for nudging me to get this out and providing feedback on this post 🙇


PS: Hating on “serverless”?

Well. Look. I’m not going to defend it to the death, but I don’t think it’s anywhere near as bad as some suggest. It’s a term people are using to describe architectures in which you don’t deal with anything resembling a server, or an instance, or similar.

Think of the term as akin to “stateless” in a “stateless architecture”: of course the underlying infrastructure has state, and no one would pretend otherwise, just as no one is suggesting there aren’t physical servers powering Amazon, Google or Microsoft’s serverless products. It’s just that you don’t deal with anything that resembles one, and it doesn’t appear in a logical representation of your system.

Mike Roberts has a great write-up over at Martin Fowler’s site on the whole landscape, which I think lays things out a little more clearly for those who are new to the space.