We Made the Whole Company “Serverless”

CloudSploit · Jul 6, 2016 · 11 min read

As a technology company, the concept of not running or managing any servers may seem completely foreign at first. For network and operations professionals, who have built their entire careers around managing servers, it may seem downright ridiculous. However, given the recent trend towards “serverless” computing in the cloud (the term itself is ripe for passionate Internet-forum debates), the reality of not maintaining servers is becoming more and more possible.

Today, CloudSploit is running its entire infrastructure consisting of backend tasks, cron-jobs, compute intensive scans, front-end API, public website, and more, without a single EC2 instance or other self-managed server. The purpose of this article is not to invoke the fury of the traditional systems administrator, nor rile up an Internet debate on the merits of the word “serverless” (yes, we know there is still a server, somewhere, behind our product). Instead, we’d like to share what we’ve done, why we made the decision to invest in these newer styles of computing, and prove that it is entirely possible to design complete applications without having to manage any servers.

[Image caption: This view still surprises us sometimes]

Deciding to Avoid Servers

For a small company focused on rapid growth, the last thing we wanted to deal with was disk space and memory management, logging agents, security patches, operating system updates, and other traditional server management processes. These may sound like cliché complaints, lifted from the Lambda product page, but countless hours have been lost finding the perfect logging and monitoring solutions and updating AMIs each time a new security patch was announced. In addition, we realized that we were spending a lot of money running servers 24/7, even when those servers were not receiving traffic. Auto-scaling works, but you can't feasibly scale from zero to one and back to zero during off-peak hours.

Around the time that our product began to gain traction, AWS Lambda and the API Gateway had been released and were becoming important parts of the AWS ecosystem. While there were still some missing features (namely the API Gateway's lack of easy import/export, since fixed, along with a confusing interface and occasional bugs), we elected to begin converting our traditional, EC2-server-backed environments to these new “serverless” technologies.

This was not an easy decision. Besides worrying that we were simply falling victim to the latest industry buzzword, our primary concern with using these technologies was vendor and product lock-in. What would happen if the API Gateway were discontinued next year (unlikely)? What if we suddenly required unsupported features (more likely)? However, through careful development, which we'll expand on shortly, we were able to address these concerns and design our application in such a way that converting back to the traditional model would be simple.

Taking Stock

Once the decision was made to convert, the first step was to audit the existing infrastructure and determine which pieces would have to be rebuilt for the new platform.

At the time, the compute components of our environment consisted of:

  • An EC2 server farm of instances running the background scanning portion of CloudSploit
  • An auto-scaling group of servers behind an ELB running our customer dashboard
  • An auto-scaling group of servers behind an ELB running the front-end website, https://cloudsploit.com
  • Various EC2 servers running background and cron tasks
  • EC2 NAT servers
  • RDS instances for databases
  • A VPC with public/private subnets across multiple availability zones

Some of our services had already been completely built from the ground-up on Lambda (see our previous blog post). However, both our public-facing and our customer dashboard websites were running as dynamic sites (Node.js Express) on traditional EC2 servers with some static assets in S3. The move away from servers for these applications was the most challenging, since it required a complete rewrite of the API logic to support a static-site format. Additionally, we had a variety of servers processing background tasks such as cleanup scripts, email reminders, etc. Each of these scripts had to be re-written in the event-context style of Lambda. Despite the need to rewrite these applications, the upfront time investment has more than paid for itself.

As a spoiler, and a short summary of what will be covered next: now that our conversion is complete, our infrastructure consists of:

  • S3 buckets for statically-hosted sites
  • CloudFront distributions in front of each site/bucket
  • An API Gateway endpoint
  • Numerous Lambda functions for the API, cron tasks, cleanup scripts, and scans
  • A managed NAT Gateway
  • RDS instances for databases
  • A VPC with public/private subnets across multiple availability zones

Creating a Static Site

Much has been written on creating single-page and static web applications, so I will not re-write that book here. However, converting from our Express/Jade template engine to pure HTML/JS was perhaps the most time-consuming portion of our conversion as every page had to be re-written. Once completed, we had two new S3 buckets: cloudsploit.com (for our public-facing website) and console.cloudsploit.com (for our customer dashboard).

At this stage, we found that S3 does not provide direct support for “pretty” URLs (such as cloudsploit.com/features instead of cloudsploit.com/features.html). S3 does provide path rewrite rules, but we discovered that a simpler method was to upload each HTML file without its .html extension, while explicitly setting its Content-Type to text/html.
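
For example, a minimal sketch of such an upload using the AWS SDK for Node.js (the file name here is illustrative):

var fs = require('fs');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Upload features.html to the extensionless key "features", but tell S3
// (and therefore browsers) that the object is still HTML
s3.putObject({
    Bucket: 'cloudsploit.com',
    Key: 'features',
    Body: fs.readFileSync('features.html'),
    ContentType: 'text/html'
}, function(err) {
    if (err) return console.error('Upload failed:', err);
    console.log('Uploaded /features');
});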

Once everything was uploaded to S3, we added a CloudFront distribution in front of it, allowing us to access our site via our root domain, as well as adding HTTPS support. Another nice feature that was released around this time was AWS Certificate Manager (ACM). We ditched our Comodo cert in favor of these certificates, which were much easier to use, renew automatically, and integrate seamlessly with CloudFront.
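
Requesting such a certificate is essentially a single API call. Here is a sketch using the AWS SDK for Node.js (the validation step that follows the request is omitted):

var AWS = require('aws-sdk');

// Certificates used with CloudFront must be requested in us-east-1
var acm = new AWS.ACM({ region: 'us-east-1' });

acm.requestCertificate({
    DomainName: 'cloudsploit.com',
    SubjectAlternativeNames: ['*.cloudsploit.com']
}, function(err, data) {
    if (err) return console.error('Request failed:', err);
    // The certificate ARN is what gets attached to the CloudFront distribution
    console.log('Certificate ARN:', data.CertificateArn);
});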

Creating the Lambda Backend

At this point, we had our customer dashboard entirely hosted on S3, but it didn’t do much of anything. We set about creating the backend functionality, which we decided to add via the API Gateway and Lambda. However, we were extremely careful to develop this backend service in such a way that it could be run from an EC2 server (or in Azure, our own datacenter, etc.) if needed. One project that was beginning to gain traction around this time was JAWS (now “Serverless”). However, it had not reached a 1.0 release at the time, and we were looking for something that could easily plug-and-play with a traditional Express framework.

To do this, we created a new Node.js project with the following layout:

├── config          // Site-wide config settings
├── controllers     // Methods for handling each route
├── helpers         // Helper scripts for logging, responses, etc.
├── index.js        // The main entry point
├── models          // Database models
├── node_modules
├── package.json
├── routes.json     // Declaration of all available routes
├── schemas         // JSON schema declarations for each POST/PUT
└── tests

This looks identical to every other Node.js API project we create. The only difference is in the index.js file. Instead of starting an Express server listening on a specific port, we process the Lambda event object.

var config = require(__dirname + '/config/api.js'); // site-wide settings, including the controller path
var routes = require(__dirname + '/routes.json');
var logger = require(__dirname + '/helpers/logger.js')('info');
var responses = require(__dirname + '/helpers/responses.js');

// The routes.json file contains a mapping of controllers
// to methods and handlers. Build a verb/path lookup table once,
// outside of the handler, so it persists across warm invocations.
var routesMap = {};

for (var controller in routes.api) {
    for (var method in routes.api[controller]) {
        var verb = routes.api[controller][method].verb;
        var path = routes.api[controller][method].path;
        if (!routesMap[verb]) routesMap[verb] = {};
        var controllerToAdd = require(config.controllerPath + controller + '.js')[method];
        routesMap[verb][path] = routes.api[controller][method];
        routesMap[verb][path].controller = controllerToAdd;
    }
}

exports.handler = function(event, context) {

    // Several event validation checks have been removed here for clarity.
    // The request is also validated against the route schema here as well.
    var httpMethod = event.context.httpMethod.toLowerCase();
    var path = event.context.resourcePath;
    var identity = event.context.identity;
    var stage = event.context.stage;

    // Dispatch the request to the matching controller method
    routesMap[httpMethod][path].controller(event, context, responses, logger);
};

Notice that the loading of the “routesMap” and other helpers is done outside of the event handler, while the actual event processing is done inside. This is done so that the controllers are only loaded into memory once, while the actual request is freshly processed each time the Lambda function is invoked. Lambda’s memory management is a bit tough to grasp at first, but the general rule of thumb is: at the same level of concurrency, a new invocation of the same Lambda function will trigger a “cold” boot (clean memory) if requests arrive more than roughly ten minutes apart. Any closer together than that, and the same Lambda function is re-invoked from a “hot” state, with module-level variables still in memory.
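
To illustrate (a minimal sketch, not taken from our codebase): module-level state survives “hot” invocations of the same container, but is wiped by a cold start.

// Initialized once per container, at cold start
var invocationCount = 0;

exports.handler = function(event, context) {
    // Increments across "hot" invocations served by the same container,
    // and starts over at 1 on the first request after a cold start
    invocationCount++;
    context.succeed({ invocation: invocationCount });
};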

If, in the future, we decided to leave the API Gateway and Lambda, we could easily re-write this index.js file to process the HTTP event in the context of an Express or other framework’s server.
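
For example, here is a rough sketch of what that rewrite could look like (hypothetical code, assuming Express and body-parser; it simply reuses the handler exported above):

var express = require('express');
var bodyParser = require('body-parser');
var index = require('./index.js');    // the same Lambda entry point

var app = express();
app.use(bodyParser.json());

// Catch-all route: translate the Express request into the same event
// shape the API Gateway mapping template produces, then hand it to
// the existing Lambda handler
app.all('/v1/:resource/:id?', function(req, res) {
    var event = {
        context: {
            httpMethod: req.method,
            resourcePath: '/v1/{resource}/{id}',
            stage: process.env.NODE_ENV || 'prod',
            identity: { sourceIp: req.ip }
        },
        body: req.body || {},
        headers: req.headers,
        query: req.query,
        params: { resource: req.params.resource, id: req.params.id }
    };

    // A minimal Lambda-style context whose succeed/fail methods
    // write back to the HTTP response
    var context = {
        succeed: function(result) { res.json(result); },
        fail: function(err) { res.status(500).json({ error: String(err) }); }
    };

    index.handler(event, context);
});

app.listen(3000);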

Next, we added functionality to the project by creating controllers for each of the paths we wanted to process. For example, the “plugins” controller looked like this:

var errors = require(__dirname + '/../helpers/errors.js');
var config = require(__dirname + '/../config/api.js');

module.exports = {
    getAll: function(event, context, responses, logger) {
        // Lookup all plugins in the database
        // If found:
        responses.succeed(context, results || []);
        // If there was an error:
        logger.error('Error listing plugins: ' + err, event);
        errors.send(responses, context, errors.INTERNAL_ERROR);
    },
    getOne: function(event, context, responses, logger) {
        var pluginId = event.params.id;
        // Lookup one plugin in the database
        // If found:
        responses.succeed(context, result);
        // If not found:
        errors.send(responses, context, errors.NOT_FOUND);
        // If there was an error:
        logger.error('Error listing plugin: ' + err, event);
        errors.send(responses, context, errors.INTERNAL_ERROR);
    }
};

Additional methods, such as “create,” “update,” and “delete” were added as needed.

The overall goal of our project design was to obtain the resource, HTTP method, body, and parameters from the API Gateway event (passed to Lambda) and process them with the corresponding resource controller.

Creating the API

Due to the strict expectation that the event object passed to Lambda followed a specific format, we had to carefully manage the API Gateway’s handling of requests. At this point, we realized that most AWS documentation for the API Gateway suggested creating new routes for every resource (/dogs, /cats, /turtles, etc.). However, this would become incredibly tedious for a large project with scores of resources. Instead, we opted to make the URL path a parameter.

Instead of creating /plugins and then having POST, GET, etc. below it, we made {resource} and {id} part of the URL params that would be passed to Lambda. This meant that a request to GET /plugins/1 would be translated as “resource=plugins; id=1” which could then be used by Lambda to locate the correct controller (“plugins” shown above).

The downside of this approach is that we lost the ability to do per-resource event transformations on the API Gateway side, which means a surge in invalid traffic could translate directly into an increase in Lambda invocations.

The next thing we needed to do was define how the API Gateway would transform an HTTP request into an event object used to invoke Lambda. We used the following code (which we obtained from the AWS Forums, and looks, admittedly, a bit complex) to handle the processing and convert the request into the format we expected in Lambda.

{
    "context": {
        "httpMethod": "$context.httpMethod",
        "resourcePath": "$context.resourcePath",
        "stage": "$context.stage",
        "identity": {
            #foreach($param in $context.identity.keySet())
            "$param": "$util.escapeJavaScript($context.identity.get($param))" #if($foreach.hasNext),#end
            #end
        }
    },
    "body" : $input.json('$'),
    "headers": {
        #foreach($param in $input.params().header.keySet())
        "$param": "$util.escapeJavaScript($input.params().header.get($param))" #if($foreach.hasNext),#end
        #end
    },
    "query": {
        #foreach($param in $input.params().querystring.keySet())
        "$param": "$util.escapeJavaScript($input.params().querystring.get($param))" #if($foreach.hasNext),#end
        #end
    },
    "params": {
        #foreach($param in $input.params().path.keySet())
        "$param": "$util.escapeJavaScript($input.params().path.get($param))" #if($foreach.hasNext),#end
        #end
    }
}

All this code is doing is pulling out the applicable pieces of the HTTP request, and then creating an event object for Lambda. In the end, a GET request to /plugins/1 will look like:

{
    "context": {
        "httpMethod": "GET",
        "resourcePath": "/v1/{resource}/{id}",
        "stage": "prod",
        "identity": {
            "sourceIp": "10.1.1.0",
            "accountId": "12345678901",
            "cognitoIdentityId": null,
            "cognitoIdentityPoolId": null,
            "cognitoAuthenticationType": null,
            "cognitoAuthenticationProvider": null,
            "userArn": "arn:aws:iam::12345678901:test-user",
            "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
        }
    },
    "body": {},
    "headers": {
        "X-Access-Token": "sometokenhere"
    },
    "query": {},
    "params": {
        "resource": "plugins",
        "id": "1"
    }
}

If the request were a POST or PUT, the body would be filled in above as well.
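
For instance, a POST to /plugins would carry its JSON payload under the body key of the event (the field names here are purely illustrative):

{
    "body": {
        "name": "example-plugin",
        "category": "EC2"
    }
}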

Cron Tasks and Helper Scripts

As mentioned above, a large portion of our infrastructure also consisted of background tasks and cleanup scripts which were frequently scheduled via cron. One example was a script which located users who had not confirmed their email address and sent them a reminder. Scheduling these functions in AWS Lambda is very simple:

From the Lambda console, we can schedule timed event sources, allowing our functions to execute at defined intervals.
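
The function itself looks like any other Lambda handler; the schedule lives entirely in the event source configuration. Below is a simplified sketch of the email-reminder task (the helper functions are hypothetical stand-ins for our actual models and mailer):

// Stand-ins for the real database query and mailer (hypothetical;
// the actual script uses our models and email helpers)
function findUnconfirmedUsers(callback) {
    // e.g., find all users where confirmed = false
    callback(null, []);
}

function sendReminderEmails(users, callback) {
    // e.g., send one templated reminder email per user
    callback(null);
}

// Invoked by the scheduled event source (e.g., once per day)
exports.handler = function(event, context) {
    findUnconfirmedUsers(function(err, users) {
        if (err) return context.fail(err);

        sendReminderEmails(users, function(err) {
            if (err) return context.fail(err);
            context.succeed('Sent ' + users.length + ' reminders');
        });
    });
};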

Putting It All Together

To get these pieces deployed as part of a complete application, we needed to update the front-end code to communicate with the new API Gateway endpoint, give the API Gateway permissions to invoke our Lambda function, deploy the Lambda function in the VPC (it communicates with an RDS database), and create IAM roles for each as needed. Additionally, we deployed each of our other Lambda functions, and finally, once we were ready to make the switch from our EC2 environment, pointed our DNS records at the CloudFront distributions instead of ELBs.
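
As one example of the permissions step, granting the API Gateway the right to invoke the function comes down to a single resource-policy statement. Here is a sketch using the AWS SDK for Node.js (the function name, account ID, and API ID are placeholders):

var AWS = require('aws-sdk');
var lambda = new AWS.Lambda({ region: 'us-east-1' });

// Allow any method on any resource of the given API (placeholder IDs)
// to invoke the API-handling Lambda function
lambda.addPermission({
    FunctionName: 'cloudsploit-api',
    StatementId: 'apigateway-invoke',
    Action: 'lambda:InvokeFunction',
    Principal: 'apigateway.amazonaws.com',
    SourceArn: 'arn:aws:execute-api:us-east-1:123456789012:abcde12345/*/*/*'
}, function(err) {
    if (err) return console.error('addPermission failed:', err);
    console.log('API Gateway can now invoke the function');
});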

Testing new features in a staging environment and making updates is very easy. With the API Gateway, we can define “stages” through which we can promote and test changes. These stages can also invoke different versions of the back-end Lambda function, which in turn can contact different back-end resources (such as our staging database).
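
Because the mapping template passes the stage name through to Lambda (event.context.stage, shown earlier), the function can pick its backing resources accordingly. A minimal sketch of that idea (the config shape shown here is hypothetical):

// config/api.js (hypothetical shape): one block of settings per stage
var settings = {
    prod:    { dbHost: 'prod-db.internal',    logLevel: 'info'  },
    staging: { dbHost: 'staging-db.internal', logLevel: 'debug' }
};

// Given the event built by the API Gateway mapping template,
// return the settings for the stage that invoked us
module.exports.forStage = function(event) {
    return settings[event.context.stage] || settings.prod;
};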

Final Notes

One of the most dramatic changes we have noticed since our switch to a completely “serverless” architecture has been in our billing statement. As I mentioned, previously we were paying for servers 24/7, regardless of whether they were actively serving requests. Now, we only pay on a per-request basis. The following is a quick breakdown of the cost of each of the components (from the us-east-1 AWS pricing pages):

  • S3: $0.004 per 10,000 requests
  • CloudFront: $0.085 per GB served; $0.0075 per 10,000 requests
  • Lambda: $0.20 per 1 million requests (First million free)
  • API Gateway: $0.09 per GB served; $3.50 per million requests

As you can see, this pricing is incredibly cheap. We could load a 20 KB web page 1 million times on our site (assuming it makes 1 API call which in turn invokes Lambda 1 time and returns a 10 KB response) for $6.46 (S3: $0.00 — CloudFront only makes a few requests to the origin; CloudFront Data: $1.62; CloudFront Requests: $0.08; Lambda Requests: $0.20; Lambda Execution Time: $0.20, API Gateway Data: $0.86; API Gateway Requests: $3.50). While there is certainly an argument for calculating the exact point at which EC2 vs API Gateway and Lambda is more cost effective, we believe that the latter’s ability to scale “infinitely” almost immediately makes it worthwhile.

Another point worth mentioning is that logging with Lambda via CloudWatch is not the best experience. We recommend using a third-party tie-in such as Sumo Logic’s new Lambda app.

Finally, this setup is not for everyone. We have evaluated the tradeoffs and determined that, for our needs, this solution works well. However, we have given up some features that some organizations may hold as strict requirements. For example, we do not have access to the underlying OS running our Lambda functions. We cannot install any custom agents, log drivers, etc. Additionally, if AWS experiences downtime in either the API Gateway or Lambda services, we are at their mercy for service restoration (although multi-region Lambda deployments are possible).

Almost every organization likely has a few services running on AWS that could be converted to a more “serverless” design. We encourage you to take a look at your environment to see if there are any you could migrate, saving you the headaches associated with traditional servers. And to be honest, who wouldn’t enjoy one less “the server is out of disk space” warning at 3 AM?

CloudSploit is a provider of open source and hosted AWS security scanning software to detect potential risks and misconfigurations in cloud infrastructure environments. To contact us about this article or our service, email us at support@cloudsploit.com.
