How we migrated our startup to serverless

The quirks and hurdles of a startup’s serverless journey

For a startup to survive in today’s environment, it’s critical to quickly test and deploy updated versions of your product and deliver them into the hands of new customers. A development team must be focused on delivering business features — not configuring and maintaining undifferentiated infrastructure.

At the start of my company's journey, I thought our approach was great.

We initially used Terraform to deploy to AWS using EC2 micro instances for each of our microservices. We had a staging and production environment — and with Docker+Docker Compose every developer could run a full environment locally on their machine.

The problems started once we reached around twelve microservices.

Since we wanted to replicate production in our staging environment, we had an EC2 instance for each microservice. But these virtual machines were just sitting around doing absolutely nothing — and our environment costs began to get really high.

We discussed options for implementing feature environments so that we could do full QA testing on feature branches — but soon realized that replicating our production EC2 instance configuration for every branch would not be feasible.

We needed to pivot.

Ditching instances in favor of functions

The team decided to ditch all of our containers and EC2 instances in favor of using AWS Lambda within a serverless architecture. To help us orchestrate all of our AWS Lambda functions, we selected the Serverless Framework.

Our stack changed from managing our own messaging system to using AWS serverless

It wasn’t an easy switch going from servers to serverless — although porting our microservices was made much easier with the extensive resources and help from the serverless community.

Why AWS Lambda?

There are other serverless platforms out there from Google and Microsoft, so why did we pick AWS Lambda?

  1. The serverless offerings from AWS extend well beyond functions
  2. The extensive AWS serverless ecosystem and engaged community
  3. We are already using RDS, SNS, Elastic, and API Gateway, and have plans to use SQS and EC2 for other purposes
  4. We had built our pipelines and tooling to deploy to and monitor AWS — and didn't see the value in retooling for Google Cloud Functions

Why the Serverless Framework?

While investigating frameworks to write and deploy our AWS Lambda functions, we evaluated features that would allow our team to focus on writing and delivering code.

We discovered that the ecosystem offers a range of tools, from barebones (e.g. the AWS Lambda Toolkit) to fully featured frameworks (e.g. Apex).

We chose the Serverless Framework for the following reasons:

  1. It allowed our developers to focus on creating functions
  2. The configuration is very straightforward (see the sketch below)
  3. Deployments are insanely simple using AWS CloudFormation
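
To give a taste of that configuration, here is a minimal serverless.yml sketch. The service, function, and route names here are hypothetical, not from our actual stack:

service: users-service

provider:
  name: aws
  runtime: nodejs8.10

functions:
  getUser:
    # Map a handler exported from your code to the HTTP event that triggers it.
    handler: handler.getUser
    events:
      - http:
          path: users/{id}
          method: get

Running serverless deploy turns this file into a CloudFormation stack, which is what makes deployments so painless.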

How we migrated our microservices

When we originally built out our microservices, we used Express as our router and middleware manager. We had opted to use JWTs instead of sessions — which kept Express out of our core code.

With the exception of our real-time services, all of our other services were stateless — which made porting our code to AWS Lambda using Serverless extremely straightforward.

To bootstrap our migration, we simply wrapped all of our Express apps using serverless-http — which sped up porting our existing stateless code to AWS Lambda. We also replaced our RabbitMQ functionality with AWS SNS where possible.
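
The wrapper itself is tiny. Here is a minimal sketch of what a wrapped service looks like, with a hypothetical route standing in for our real ones:

// handler.js
const serverless = require('serverless-http');
const express = require('express');

const app = express();

// A hypothetical route; real services keep their existing Express routes.
app.get('/health', (req, res) => {
  res.json({ ok: true });
});

// serverless-http turns the Express app into a standard Lambda handler.
module.exports.handler = serverless(app);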

The team removed all RPC calls in favor of more asynchronous workflows. While we could have invoked other Lambda functions from within a Lambda function, you end up paying double for execution since both Lambda functions are running. Instead, we replaced each RPC call with two SNS calls — one to request that work be done, and another to let services know that the work had been completed.

The last step was figuring out how to return a response to the client as quickly as possible. If we knew additional asynchronous work had to be done, we would return a 200 status to the client while continuing to do work on the server.

With AWS Lambda, sending a response back to the client will halt execution of your Lambda function. We simply had to go through our code, look for the places where we exited early, and change the code to exit only after we had scheduled all the work that needed to be completed.
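
In practice, that meant publishing the pending work before responding. Here is a sketch of the pattern; the topic ARN environment variable and payload are hypothetical:

const AWS = require('aws-sdk');
const sns = new AWS.SNS();

module.exports.handler = (event, context, callback) => {
  // Schedule the remaining work first...
  sns.publish({
    TopicArn: process.env.WORK_TOPIC_ARN,
    Message: JSON.stringify({ requestId: event.requestId }),
  }, (err) => {
    if (err) {
      return callback(err);
    }
    // ...and only respond once the work has been scheduled, since
    // responding halts the Lambda.
    callback(null, { statusCode: 200, body: 'accepted' });
  });
};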

Using AWS Internet of Things for real-time services

One of our largest services is our real-time service — which is in charge of pushing real-time updates to connected clients. We used SocketIO and had planned for it to be clustered. This meant that any incoming message that needed to be emitted to other connected clients first went through RabbitMQ and was then published to all instances in the cluster.

AWS Lambda doesn't let you have long-running processes, so we needed to either create and maintain EC2 instances for our real-time services — or find an alternative.

AWS offers an IoT service for subscribing to topics and publishing messages to those topics using the MQTT protocol. The service also allows you to connect using WebSockets. We switched our client application from SocketIO to MQTT — so we now pay per message sent instead of paying for running servers.
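
On the client, the switch looks roughly like the sketch below, using the mqtt npm package. Here, getSignedUrl is a hypothetical helper that returns a SigV4-signed AWS IoT WebSocket endpoint, handleRealtimeUpdate is a hypothetical app callback, and the topic name is made up:

const mqtt = require('mqtt');

// Connect over WebSockets using a pre-signed AWS IoT endpoint.
const client = mqtt.connect(getSignedUrl());

client.on('connect', () => {
  // Subscribe to the topic for updates aimed at this client.
  client.subscribe('clients/some-client-id/updates');
});

client.on('message', (topic, payload) => {
  // payload is a Buffer; hand the parsed update to the app.
  handleRealtimeUpdate(JSON.parse(payload.toString()));
});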

No more RPC calls

Before we transitioned to AWS Lambda and serverless, our event system was built on RabbitMQ. It’s an amazing message broker that we loved using. We used almost every feature RabbitMQ had to offer — worker queues, pub/sub, and RPC calls.

When we switched to AWS Lambda, we also made the decision to ditch our RabbitMQ cluster and instead use Amazon's Simple Queue Service and Simple Notification Service. Porting our worker queues and pub/sub logic was fairly straightforward, but the remote procedure calls would be a little more troublesome.

The problem is that you pay for execution time with AWS Lambda. If you have one Lambda invoke another and wait for the response, you end up paying for the execution time of both Lambdas — the one that is actually doing the work and the one idling while it waits.

We opted to transform our RPC calls into truly asynchronous flows by using SNS calls:

  • Lambda A would publish to an SNS topic that Lambda B subscribes to
  • Lambda B would do some work — such as performing a lookup, aggregating some data, or pulling data from a third-party API — and then publish to another SNS topic that triggers Lambda C
  • Lambda C would normally have been executed as the callback of an RPC call — but with the new architecture, we split each RPC call into three parts: the invoker, the worker, and the callback (the worker step is sketched below)
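
Here is a rough sketch of the worker step (Lambda B), assuming hypothetical topic names and payload shapes:

const AWS = require('aws-sdk');
const sns = new AWS.SNS();

// Triggered by the "request work" SNS topic.
module.exports.worker = (event, context, callback) => {
  // SNS delivers the published message as a string on the event.
  const request = JSON.parse(event.Records[0].Sns.Message);

  // ...perform the lookup, aggregation, or third-party API call here...
  const result = { requestId: request.requestId, items: [] };

  // Publish to the "work completed" topic that triggers Lambda C.
  sns.publish({
    TopicArn: process.env.WORK_DONE_TOPIC_ARN,
    Message: JSON.stringify(result),
  }, callback);
};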

Minor hiccups along the way

There were some growing pains we encountered as we moved everything over to AWS Lambda. It all boils down to three weird facts about Lambda that seem contradictory at first — until you really start diving deep into how it works.

  1. When the callback is called (explicitly or implicitly), AWS Lambda continues the Lambda function invocation until the Node.js event loop is empty.
  2. If you have asynchronous code running, calling the callback may stop executing your Lambda function.
  3. If there were asynchronous functions waiting to be called, they may (or may not) be run the next time your function is invoked.

We ran into the first fact when using Sequelize. We use PostgreSQL on AWS RDS, and to speed up development we use Sequelize as our ORM.

We found that Sequelize would keep our Lambdas running until they hit their timeout and died. Unless you want to go through your code and add close statements to Sequelize for every function — a feat made tougher by the fact that, at the time, JavaScript promises had no finally method — the solution is to force AWS to halt execution of your Lambda function when the callback is called.

We wrote a little higher-order function to help us with that:

// Wrap a Lambda handler so the invocation ends as soon as the callback
// is called, instead of waiting for the Node.js event loop to empty
// (which open Sequelize connections would otherwise keep busy).
module.exports = func => (event, context, callback) => {
  context.callbackWaitsForEmptyEventLoop = false;
  func(event, context, callback);
};
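
Usage is just a matter of wrapping each handler on export. A quick sketch, assuming the wrapper lives in a local module (the file and handler names here are hypothetical):

const haltOnCallback = require('./halt-on-callback');

module.exports.getUser = haltOnCallback((event, context, callback) => {
  // ...query Postgres through Sequelize, then respond...
  callback(null, { statusCode: 200 });
});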

The second and third facts came up as some seriously weird bugs in our app, where we would see delayed real-time events and SNS events running twice. Since the functions would time out, SNS would replay the events, thinking they had failed.

Now the real journey begins

Moving our startup to simple, stateless serverless functions had a few hurdles, and AWS Lambda has a few quirks that require some trial and error. But at the end of the journey, we managed to eliminate all of our EC2 instances, and we're now running all of our code on AWS Lambda.

We are still getting used to how AWS Lambda works, but the Serverless Framework substantially lowered our barrier to entry. Splitting your code into domains and writing new functions is simple and straightforward.

With first-class support for events and scheduled functions, serverless allows us to focus on delivering business value — while AWS does the undifferentiated heavy lifting.

The team is really happy with our move to serverless — albeit with a few growing pains. I’d be interested in learning more about your experience with serverless. Please drop a comment below or connect with me on Twitter.

Ivan Montiel is the founder and CEO of Clarity Hub — a company that integrates with Intercom to give customer success teams real-time suggestions.
