Serverless for DevOps Teams
Serverless as a technology is no longer in its infancy. It has matured to a point where we can at least say it is in its lanky teenage years, yet to enter the full flushes of adulthood, but with all its vitals shaping up nicely.
We in the DevOps team here at Space Ape are avid Serverless fans. This post aims to explore the factors that have made it so appealing to us, from an operations standpoint.
Serverless has enjoyed a rapid rise to fame. AWS Lambda was launched in late 2014. There was a collective intake of breath from the industry before we realised that, if nothing else, here was something new to argue about.
First we argued about the name: “but there are still servers!” some wailed. Then some declared Serverless would usher in the demise of Ops teams (remember #noops?). Later, battle lines were drawn between budding futurists predicting an all-in on Serverless, and scornful adversaries convinced that a Kubernetes cluster was all you’d ever need.
Perversely, those very people whose jobs were purportedly threatened — Ops teams — have been some of the first to embrace Serverless, warts ’n’ all. For warts there most definitely still are: the cold-start times remain a barrier to adoption for many; and monitoring, along with the build and deploy tooling, is somewhat clunky and ill-defined.
Why Ops Teams?
Every Ops person will know an example of that server, with an uptime of several years, crontab-of-several-pages, documentation-of-none, that nobody dares reboot. In the old days it may have been tucked under somebody’s desk; in the cloud era it may be a woefully undersized instance that has fortuitously avoided termination all these years.
With Serverless, we can replace those hack-job cron-jobs with scheduled, documented and tested Lambda functions! Many of the tasks important to us are not latency-sensitive, so we can handle a little cold start-up time.
Better than this, we can even begin to extend the functionality of our cloud providers! Instead of judiciously choosing who to appease, the providers (AWS in particular) have cleverly handed us some tools and pointed at the shed. Do it yerself, they are practically saying. We, in turn, are becoming adept at welding together the various conduits on offer.
The factor that makes this possible is the ever-growing cast of AWS services that can be used to trigger a Lambda function. We’ve long had S3, SNS and API Gateway triggers; with Kinesis, DynamoDB, SQS and more added along the way. These enable Dev teams to plumb together applications to build impressive, cost-effective, high-performing systems.
What makes this technology more attractive to Ops teams is the ability to hook into CloudWatch Events. Several AWS services emit CloudWatch Events as they go about their daily lives. For instance, a CloudWatch Event is sent as an EC2 instance enters different stages in its lifecycle. In the old world, we’d need an SNS topic to listen for those transitions. But now, a CloudWatch Event is broadcast like the proverbial falling tree in the forest, and it’s down to us if we want to listen.
Two of the services that emit CloudWatch Events are particularly useful to us.
The first is AWS Config. Config can ‘watch’ certain components (for instance security groups) and run a customisable Rule against them when they change (for instance, to check that a security group does not have port 22 open to the world). If anything is amiss, a CloudWatch Event is pinged into the ether. This is a boon for security monitoring.
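To make the port 22 example concrete, here is a minimal sketch of what a custom Config rule backed by a Lambda function might look like. It is not our production rule. It assumes the configuration item for a security group carries an ipPermissions list with ipProtocol, fromPort, toPort, ipv4Ranges and ipv6Ranges fields; the function names and the optional config_client parameter are ours, added so the logic can be exercised without AWS.

```python
import json

WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def opens_ssh_to_world(permission):
    """Return True if one ingress permission exposes port 22 to any address."""
    protocol = permission.get("ipProtocol")
    if protocol == "-1":
        covers_ssh = True  # "-1" means all protocols and all ports
    elif protocol == "tcp":
        from_port = permission.get("fromPort")
        to_port = permission.get("toPort")
        covers_ssh = (from_port is not None and to_port is not None
                      and from_port <= 22 <= to_port)
    else:
        covers_ssh = False
    # Field names below are an assumption about the configuration item shape.
    cidrs = [r.get("cidrIp") for r in permission.get("ipv4Ranges", [])]
    cidrs += [r.get("cidrIpv6") for r in permission.get("ipv6Ranges", [])]
    return covers_ssh and any(c in WORLD_CIDRS for c in cidrs)


def evaluate(configuration_item):
    """Map a configuration item to a Config compliance type."""
    if configuration_item.get("resourceType") != "AWS::EC2::SecurityGroup":
        return "NOT_APPLICABLE"
    # "configuration" can be null for deleted resources, hence the "or {}".
    configuration = configuration_item.get("configuration") or {}
    permissions = configuration.get("ipPermissions", [])
    if any(opens_ssh_to_world(p) for p in permissions):
        return "NON_COMPLIANT"
    return "COMPLIANT"


def handler(event, context, config_client=None):
    """Lambda entry point: evaluate the changed resource and report back."""
    item = json.loads(event["invokingEvent"])["configurationItem"]
    if config_client is None:
        import boto3  # imported lazily so the logic above is testable offline
        config_client = boto3.client("config")
    config_client.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": item["resourceId"],
            "ComplianceType": evaluate(item),
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )
```

Keeping the check in a plain function, separate from the handler, means it can be unit-tested against sample configuration items before it ever goes near an account.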
The second is CloudTrail. CloudTrail monitors all actions made against the AWS API, and can be configured to send a CloudWatch Event when a specific action is made, for example when a new CloudWatch Logs log group is created. What this means is that you can trigger a Lambda function off the back of any action performed within AWS. Powerful stuff indeed.
Serverless at Space Ape
With that in mind, here is a selection of just some of the weird and wonderful use cases that the DevOps team here at Space Ape have found for Lambda functions within AWS:
Need to punch a hole in your firewall to access resources in your VPC? No problem, just use a Cognito-protected API Gateway endpoint with a Lambda function to proxy requests.
Problem: you need to perform some procedure before giving Autoscaling the nod that it can indeed terminate your EC2 instance (an example might be ensuring that a job queue has been emptied).
Solution: Subscribe a Lambda function to an SNS topic receiving notifications from your ASG. Have it trigger your shut-down process, perhaps via SSM Run Command. Go to pub.
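As a rough sketch of that solution, the function below picks the terminating-instance messages out of an SNS event and kicks off a script via SSM Run Command. The script path is hypothetical, as is the optional ssm_client parameter (there only so the function can be tried out with a stub); the message fields are those of an Auto Scaling lifecycle hook notification.

```python
import json

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"
DRAIN_SCRIPT = "/usr/local/bin/drain-queue.sh"  # hypothetical shut-down script


def terminating_messages(event):
    """Yield the lifecycle messages in an SNS event that announce a termination."""
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("LifecycleTransition") == TERMINATING:
            yield message


def handler(event, context, ssm_client=None):
    """Lambda entry point: start the shut-down script on each terminating instance."""
    if ssm_client is None:
        import boto3  # imported lazily so the parsing is testable offline
        ssm_client = boto3.client("ssm")
    instance_ids = []
    for message in terminating_messages(event):
        ssm_client.send_command(
            InstanceIds=[message["EC2InstanceId"]],
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": [DRAIN_SCRIPT]},
            Comment="Drain before termination",
        )
        instance_ids.append(message["EC2InstanceId"])
    return instance_ids
```

In this sketch something still has to tell the ASG it may proceed once the work is done, for example the script itself calling complete-lifecycle-action with the hook name and token from the message. Otherwise the instance simply waits for the hook to time out.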
S3 Object Replication
Yes, S3 has built-in support for replication. But, at the time of writing, it only allows for a single replication destination; what if you need to replicate the same data to multiple regions? Why, use a Lambda of course, with the S3 bucket as an event source.
Incidentally, why might you even want to do this? Well, if you are using CloudFormation to deploy your Serverless applications (as both SAM and the Serverless Framework do), you need a copy of your code sitting on S3 in each region you wish to deploy to…
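A bare-bones version of such a replicator might look like the sketch below. The destination bucket names and regions are made up, and client_for_region is a hypothetical hook that lets the function be run against a stub. It copies each newly created object to every destination with copy_object, which is fine for deployment artefacts but would need a multipart copy for very large objects.

```python
from urllib.parse import unquote_plus

# Hypothetical destinations: region -> bucket name
DESTINATIONS = {
    "eu-west-1": "my-artifacts-eu-west-1",
    "us-west-2": "my-artifacts-us-west-2",
}


def created_objects(event):
    """Yield (bucket, key) for every ObjectCreated record in an S3 event."""
    for record in event.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated"):
            # Keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            yield record["s3"]["bucket"]["name"], key


def _default_client(region):
    import boto3  # imported lazily so the parsing is testable offline
    return boto3.client("s3", region_name=region)


def handler(event, context, client_for_region=_default_client):
    """Lambda entry point: copy each new object to every destination bucket."""
    copies = []
    for bucket, key in created_objects(event):
        for region, destination in DESTINATIONS.items():
            client_for_region(region).copy_object(
                Bucket=destination,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
            )
            copies.append((destination, key))
    return copies
```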
Wouldn’t it be nice if you could corral a global flash-mob of thousands of minions and focus them, tractor-beam like, on an endpoint you’d like to test? You can!
Granted, this one is a little more involved. We have built an elaborate system to this end (blog post incoming). The fast and transient nature of Lambda functions means we are able to easily incorporate load-tests into our CI pipeline.
Tidy Up Terminated EC2 Instances
We have a handful of tasks we like to do when an EC2 instance is terminated (like removing a Route53 entry, for example). By hooking a Lambda function up to the CloudWatch event bus we can ensure that this gets done.
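By way of illustration, here is a sketch of the Route53 clean-up, assuming a rule that forwards EC2 Instance State-change Notification events to the function. The hosted zone ID and the naming scheme (one A record per instance, named after its instance ID) are invented for the example, as is the optional route53 parameter.

```python
ZONE_ID = "Z0000000EXAMPLE"       # hypothetical hosted zone
DOMAIN = "internal.example.com"   # hypothetical scheme: <instance-id>.<DOMAIN>


def terminated_instance_id(event):
    """Return the instance ID if this is an EC2 'terminated' event, else None."""
    if event.get("source") != "aws.ec2":
        return None
    detail = event.get("detail", {})
    if detail.get("state") != "terminated":
        return None
    return detail.get("instance-id")


def handler(event, context, route53=None):
    """Lambda entry point: delete the A record belonging to a terminated instance."""
    instance_id = terminated_instance_id(event)
    if instance_id is None:
        return None
    if route53 is None:
        import boto3  # imported lazily so the parsing is testable offline
        route53 = boto3.client("route53")
    name = f"{instance_id}.{DOMAIN}."
    # A DELETE must quote the existing record exactly, so look it up first.
    response = route53.list_resource_record_sets(
        HostedZoneId=ZONE_ID,
        StartRecordName=name,
        StartRecordType="A",
        MaxItems="1",
    )
    for record in response["ResourceRecordSets"]:
        if record["Name"] == name and record["Type"] == "A":
            route53.change_resource_record_sets(
                HostedZoneId=ZONE_ID,
                ChangeBatch={"Changes": [
                    {"Action": "DELETE", "ResourceRecordSet": record},
                ]},
            )
            return name
    return None
```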
We’ve all been there: you need to pull some data from an ill-conceived 3rd-party API, convert to metrics as understood by your metrics system, then push to said system. Do you write a cron-job? Nope. A daemon? Definitely not. Just have a scheduled Lambda function do it for you!
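The shape of such a function is roughly as follows. This sketch assumes the 3rd-party API returns a flat JSON object of names to numbers and that the metrics system is CloudWatch; the URL, the namespace and the injectable fetch and cloudwatch parameters are all hypothetical. A scheduled rule (say, every five minutes) would invoke the handler.

```python
import json
from urllib.request import urlopen

API_URL = "https://api.example.com/stats"  # hypothetical 3rd-party endpoint
NAMESPACE = "ThirdParty/Example"           # hypothetical metric namespace


def to_metric_data(payload):
    """Convert {"name": number} into CloudWatch MetricData, skipping non-numbers."""
    return [
        {"MetricName": name, "Value": float(value), "Unit": "None"}
        for name, value in sorted(payload.items())
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]


def fetch(url=API_URL):
    """Fetch and decode the JSON payload from the 3rd-party API."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def handler(event, context, fetch=fetch, cloudwatch=None):
    """Lambda entry point: pull, convert, push. Returns the number of metrics sent."""
    metric_data = to_metric_data(fetch())
    if metric_data:
        if cloudwatch is None:
            import boto3  # imported lazily so the conversion is testable offline
            cloudwatch = boto3.client("cloudwatch")
        cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=metric_data)
    return len(metric_data)
```

A payload with a great many values would need sending in batches, since a single put_metric_data call only accepts a limited number of entries.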
Who knew that Lambda functions would be such an ally in the fight against cyber-crime? Increasingly we are employing them to assist us in the securing of our growing estate: as a snitch that raises the alarm every time someone logs into an AWS account as Root. Or as a fence for custom AWS Config rules gathered over many accounts, or indeed for the custom AWS Config rules themselves. Their uses are legion.
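The Root-login snitch, for instance, can be very small indeed. The sketch below assumes a rule that delivers console sign-in events recorded by CloudTrail to the function, and an SNS topic to shout at; the topic ARN and the optional sns parameter are hypothetical.

```python
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:security-alerts"  # hypothetical


def root_login_summary(event):
    """Return a one-line alert if the event is a Root console login, else None."""
    detail = event.get("detail", {})
    if detail.get("eventName") != "ConsoleLogin":
        return None
    if detail.get("userIdentity", {}).get("type") != "Root":
        return None
    return "Root console login to account {} from {} at {}".format(
        event.get("account", "unknown"),
        detail.get("sourceIPAddress", "unknown"),
        detail.get("eventTime", "unknown"),
    )


def handler(event, context, sns=None):
    """Lambda entry point: publish an alert for Root logins, ignore everything else."""
    summary = root_login_summary(event)
    if summary is None:
        return None
    if sns is None:
        import boto3  # imported lazily so the check is testable offline
        sns = boto3.client("sns")
    sns.publish(TopicArn=ALERT_TOPIC_ARN, Subject="Root login detected", Message=summary)
    return summary
```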
A corollary of running more and more Lambda functions is that more and more you find yourself embroiled in the murky world of CloudWatch Logs.
To use the CloudWatch Logs console is to wonder immediately how you can avoid using it ever again. The solution is, clearly, more Lambdas! These notice when a new log group is created and subscribe it to a separate function that in turn forwards the logs to an AWS Elasticsearch cluster.
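The first of those functions could be sketched along these lines, assuming a rule that matches the CreateLogGroup API call as recorded by CloudTrail. The forwarder's ARN, its log group name, the filter name and the optional logs parameter are hypothetical. The forwarder's own log group is skipped so that it does not end up forwarding its own output in a loop.

```python
FORWARDER_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:log-forwarder"  # hypothetical
FORWARDER_LOG_GROUP = "/aws/lambda/log-forwarder"  # the forwarder's own logs


def new_log_group(event):
    """Return the name of a newly created log group, or None if not applicable."""
    detail = event.get("detail", {})
    if detail.get("eventSource") != "logs.amazonaws.com":
        return None
    if detail.get("eventName") != "CreateLogGroup":
        return None
    name = (detail.get("requestParameters") or {}).get("logGroupName")
    if name == FORWARDER_LOG_GROUP:
        return None  # never subscribe the forwarder to itself
    return name


def handler(event, context, logs=None):
    """Lambda entry point: subscribe a new log group to the forwarder function."""
    name = new_log_group(event)
    if name is None:
        return None
    if logs is None:
        import boto3  # imported lazily so the parsing is testable offline
        logs = boto3.client("logs")
    logs.put_subscription_filter(
        logGroupName=name,
        filterName="forward-to-elasticsearch",
        filterPattern="",  # an empty pattern matches every log event
        destinationArn=FORWARDER_ARN,
    )
    return name
```

For this to work the forwarder function also needs a resource policy allowing CloudWatch Logs to invoke it.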
Hopefully this article has highlighted some of the many ways that Serverless can help Ops teams build stable, secure infrastructure. If you fancy working in such an Ops team, we’re hiring!