Going Serverless (on AWS)

Payam Moghaddam
Jul 5
What does it take to go Serverless?

At the beginning of 2020, Galvanize decided to go all in on Serverless with AWS. In case you are curious, I explained our reasons in the post AWS as a Framework. Now in 2021, the question is: how has our journey been, and what can we share so you can prepare for your own journey to Serverless too?

Without a doubt, our transition to Serverless has opened up a ton of product opportunities and has increased our rate of innovation. It’s much easier to think of innovative solutions when you are not constrained to an all-inclusive application framework that assumes it is your entire application, and can instead design with the entire Serverless suite of capabilities AWS provides. With this shift in mindset, we’ve been able to focus more on what we should do rather than what we can do, all while focusing more on our product and writing less code. After all, customers don’t care about your infrastructure or how much code you write; they care about the value they get, so it’s nice to focus on what they care about.

To tap into the benefits of Serverless though, you do need to operate differently; there is a difference in how you develop Serverlessly vs. not, especially if you are an enterprise or a highly compliant company like Galvanize. Unfortunately, blog posts and conferences don’t sufficiently cover these operational differences, and that causes more friction than it should when adopting Serverless. This blog post is meant to address that friction. It will give you a heads-up on the challenges ahead, when to worry about them, and how to overcome them. Hopefully this reduces the friction for you and your team to successfully adopt Serverless.

Start of a Journey

There is a funny aspect to listening to talks or learning from conferences. Everything sounds really simple: add a Lambda here, integrate it with CloudFormation there, sprinkle some SAM on top, and 💥, you’re done! That’s the impression you get. However, in a mature organization, there are lots of important details these presentations don’t cover. While not an exhaustive list, I recommend thinking through the following before starting on your Serverless journey:

  1. Code Organization — when you start having 10 or 100 Lambdas, how do you organize the code? You definitely don’t want one repository per Lambda, so do you need to use a monorepo? If so, how do you organize that monorepo? Furthermore, how can you then deploy the pieces of this monorepo independently?
  2. Package Management — you may use development libraries to test or build your Lambda, but then how do you package your application with only production dependencies for AWS Lambda? Especially with Node.js, you don’t want development packages such as TypeScript or Jest, you only want production packages. This challenge is compounded once you start using a monorepo pattern, and want to share packages between Lambdas too.
  3. Infrastructure Definition — when you develop monolith applications, or microservices to run on e.g. Kubernetes, it tends to be clear what your “Infrastructure” teams manage (e.g. VPC, subnets, security groups, Kubernetes) and what your “Application” teams manage (e.g. Rails code, Node.js code, framework setup). However, once you start using “AWS as a Framework”, who is responsible for creating an SQS resource? What about an EventBridge rule? What about a DynamoDB table? You want to give teams independence, but you also don’t want to give teams “Admin” access to your AWS account. So where is the boundary now for your application teams? And how can you enforce that boundary?
  4. Modularization — once you start going Serverless, you’ll start seeing a lot of repeated patterns across teams. The way Lambdas are set up or IAM policies are configured shares significant similarities between teams, but if teams don’t have a mechanism to share infrastructure code, particularly a curated set made by the Infrastructure or Platform team, you end up with a ton of duplicate code across teams. So how can you abstract and share these repetitive parts so teams don’t duplicate effort or reinvent the wheel?
  5. Testing — if you build a Lambda function and integrate it with an event bus, how do you test that your function or integration actually works? More generally, how do you test services that are tightly built into AWS?

These are pretty important questions that don’t get as much attention as they should! 😅

Personally, I welcome these challenges, since I believe they push our industry forward towards more scalable and more secure practices. For example, tight integration with AWS often means a tighter use of Identity and Access Management (IAM), thus encouraging smaller access perimeters. Similarly, small focused functions create smaller surface areas to exploit, as well as a smaller blast radius. Lastly, monorepositories built around domain contexts allow you to easily grow your logic in small, independently deployable chunks. These challenges are worth tackling and are a great learning opportunity for your team members too!

Let’s now discuss each of these challenges in more detail, as well as how we’ve overcome them at Galvanize, so hopefully you don’t have to figure it all out yourself.

Code Organization

The trickiest part of going Serverless is embracing the idea that your Cloud provider (e.g. AWS) is now your framework. Until you do, you’ll continue to set up artificial infrastructure boundaries that prevent you from actually building in a Serverless fashion. What you want is to be able to go to your repository and see everything required to run your application (for a particular bounded context), including the infrastructure code. At Galvanize we achieve this by having Terraform and application code in the same repository, and by using a monorepo pattern to modularize our architecture and infrastructure so we can develop and deploy each piece independently.
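
As a rough sketch (the folder names are illustrative rather than our exact layout), such a repository might look like this:

.
├── packages/
│   └── core_domain/        # shared logic multiple services depend on
└── services/
    ├── click_stream/
    │   ├── environments/   # playground.tfvars, production.tfvars, ...
    │   ├── src/            # Lambda handlers (if any)
    │   └── main.tf         # Terraform for this service’s infrastructure
    └── ...                 # other independently deployable services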

In the example above, by simply looking at the services folder, you’ll see each independently deployable chunk of infrastructure and/or application code. Furthermore, you don’t care how much is infrastructure vs. application logic; you just need to know how to deploy that service. This abstraction blurs the line between AWS and your business logic, thus embracing your Cloud provider as your framework. For example, the click_stream service may have no logic aside from Terraform (you can achieve this by tying API Gateway and DynamoDB together without any Lambda), or it may be multiple Lambdas. It doesn’t matter; the team can decide! And as you build more services, you can extract reusable logic into the packages folder (e.g. core_domain), which multiple services can then include as a dependency.

With this organization, it then becomes easy to deploy:

# Inside "services/click_stream"
terraform workspace select playground
terraform apply -var-file=environments/playground.tfvars

At Galvanize, we have an internal tool called Sabretooth to simplify this further. All we do is:

# "apply" the "click_stream" service to "playground"
sbr tf:apply click_stream playground

Sabretooth will then account for Terraform workspaces, S3 remote state, aws-vault integration, tfsec scanning, plus company-specific validations. (Hopefully we’ll open-source this soon. 🤞)

One thing you’ll notice is the use of Terraform. Technically, there are many different tools to deploy infrastructure (e.g. CDK, CloudFormation, Serverless Framework, Pulumi), some very lean so you can focus more on your business logic. This space is still young, so there is no dominant tool yet (thus our need to create Sabretooth). We’ve sided with Terraform for various business reasons. We are a high-security, high-compliance company, so tightly controlling the infrastructure (e.g. tight IAM policies, network controls) is a core requirement, and that is difficult with tools that abstract away or restrict raw infrastructure access. After all, AWS is our framework.

But it will not be Terraform, CDK, or the Serverless Framework that gets in your way; it’ll be how you structure your code and how you think about infrastructure. Make sure not to underestimate code organization: there is no established best practice yet, and it can easily spiral out of control if you don’t establish a convention early on. And since code organization is the very first problem you’ll need to address, I believe you need to figure it out first, irrespective of your company size.

Package Management & Deployment

With the code organization above, it becomes pretty easy to deploy small bits of infrastructure code and Lambda functions. If you could fit your whole application into an index.js, you’d be set! Unfortunately, that is not a reality for most scenarios. When building a complex and mature application, you’ll never have just a single index.js, and you’ll certainly rely on 3rd-party packages. This is where it gets complicated.

Here is a practical scenario for Node.js (languages such as Golang may not face this). Imagine a non-trivial service where you require the package Moment.js (moment) and you want to test many scenarios using a test runner such as Jest (jest). Your business logic in this case is pretty simple, but packaging it for Lambda is not. You want to package your logic with moment, but not with jest; however, especially in a monorepo, there is no native npm or yarn capability that can do this!
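
To make that concrete, a hypothetical service’s package.json might look like the following; only the dependencies section should ever reach Lambda, while devDependencies should stay on your build machines:

{
  "name": "click_stream",
  "dependencies": {
    "moment": "^2.29.1"
  },
  "devDependencies": {
    "jest": "^27.0.6",
    "typescript": "^4.3.5"
  }
}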

With Yarn (1.x) workspaces’ monorepo structure, yarn install installs every service’s dependencies into a single root node_modules; you can’t isolate it to a single service. So you may take the shortcut of giving each service its own package.json and essentially not using Yarn’s monorepo capability. However, then you run into different problems, since you can no longer easily share logic between services via a shared package (something Yarn workspaces does support).
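
For reference, the Yarn 1.x workspaces setup in question is just a root package.json along these lines, with every workspace’s dependencies hoisted together:

{
  "private": true,
  "workspaces": [
    "packages/*",
    "services/*"
  ]
}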

Ultimately, you realize that the tooling is simply not mature enough yet. At Galvanize, we built an internal library to help with this packaging problem. Internally we type sbr hb:vendor click_stream and it packages all production Node dependencies, as well as shared packages in the repository, into a dist folder. Now we can simply zip this folder with our logic to deploy our Lambda function.

We plan to open-source this library soon, but in the meantime, make sure you consider how you’ll package your services for deployment, especially if you are using a monorepo pattern. It’s easy for some languages (e.g. Go), and hopefully one day runtimes such as Deno will make this easier still for TypeScript development; but if you plan to use Node.js, think it through.
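
If you’d rather not build internal tooling, one workaround (separate from anything Sabretooth does, and with the usual caveats around native modules) is to bundle each handler with a tool like esbuild, so only the code you actually import, production dependencies included, ends up in the artifact:

# Bundle one handler and its production dependencies into a single file
npx esbuild services/click_stream/src/handler.ts \
  --bundle --platform=node --target=node14 \
  --outfile=services/click_stream/dist/handler.js

# Zip the dist folder for upload to Lambda
(cd services/click_stream/dist && zip -r ../lambda.zip .)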

If you’re a start-up, you can solve this when you get to it. For a mid-size company or bigger, you need a strategy so teams don’t create duplicate package management solutions.

Infrastructure Boundaries

Now imagine you’ve figured out how to organize your code and how to package it for deployment. How can you deploy your brand new service into production in a safe manner? This is actually a complex problem.

In many organizations, there is usually an “Infrastructure” or “Platform” team that supports or helps other teams get their resources into production. That works fine when deploying new services is not an everyday event. However, with Serverless, it practically is! You’re regularly creating new resources in production! So the question is: how can we protect production from an incorrect change? This has two distinct challenges:

  1. Limiting Actions against Resources
  2. Limiting IAM Policies and Roles

Limiting Actions

It’s not difficult to create a role that prevents sensitive actions, such as destroying RDS instances, EC2 instances, etc. The trickier part is knowing what to permit without having teams step on each other’s toes. For example, how can a team create a new SQS resource independently, but not touch another team’s SQS resource? That’s the challenge.

Fortunately there are a few patterns to address this. First, depending on your business needs, you may simply opt to accept the risk (e.g. you are a startup, or you have a small team): allow everyone to manipulate any SQS resource, and hope no one messes up. However, if you are a larger team and your security or compliance bar is much higher, you need to limit each team’s blast radius. For this, you can use attribute-based access control (ABAC) in AWS. I will not elaborate on the details, but essentially, you can restrict teams so they can only create and interact with resources carrying their team’s tag. Thus, they effectively stay within their own boundary and can’t mess up any other team’s resources. Alternatively, if teams are highly independent and seldom need to interact with each other’s resources, you can also look at using AWS Organizations and creating AWS accounts per team.
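
As a rough illustration of the ABAC idea (simplified, and not our actual policies), the following Terraform policy document only allows SQS actions on queues whose team tag matches the caller’s team tag; creating brand-new resources is handled similarly with an aws:RequestTag condition:

data "aws_iam_policy_document" "team_scoped_sqs" {
  statement {
    sid       = "TeamScopedSqs"
    actions   = ["sqs:*"]
    resources = ["*"]

    # Only allow the action when the queue's "team" tag matches the caller's "team" tag.
    # "$$" escapes the IAM policy variable so Terraform passes it through literally.
    condition {
      test     = "StringEquals"
      variable = "aws:ResourceTag/team"
      values   = ["$${aws:PrincipalTag/team}"]
    }
  }
}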

This solves the first problem, but it’s the easier one. The harder problem is controlling the creation and modification of IAM policies themselves, as that can easily lead to privilege escalation. After all, you don’t want people to be able to create an IAM role and policy that they can then assume to become an Administrator! Fortunately, AWS has a way to address this as well.

Limiting IAM Policies

Imagine you need to add an SQS queue or a DynamoDB table for your Lambda function to access. This translates to an IAM policy change for your Lambda. However, how can you allow teams to update IAM policies to access such resources, without allowing teams to update their IAM policies to effectively become Administrators or do more dangerous operations (e.g. delete RDS or DynamoDB resources)?

“How can we prevent mistakes?” is the core reason so many Infrastructure or Platform teams end up being involved in these sensitive changes. They act as the company’s safety net against dangerous changes. However, this can quickly become a bottleneck when you are trying to use “AWS as a Framework”. So what’s the solution?

With AWS at least, there is a concept of “permissions boundaries” for IAM. The Infrastructure team can set up permissions boundaries that enforce a narrow set of IAM policies that teams are allowed to create. The boundary effectively defines the “allow list” for what teams can do day-to-day before needing the Infrastructure team’s assistance. This significantly speeds up AWS adoption, as teams can add and remove resources, tagged to their teams, in production, without any Infrastructure team involvement!
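
A minimal sketch of the mechanics (the names and the boundary policy ARN are illustrative): a team creates the role for its Lambda as usual, but the role must carry the boundary the Infrastructure team defined, which caps what the role can ever be granted. Enforcement typically comes from only allowing iam:CreateRole when the request includes an approved iam:PermissionsBoundary.

data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "click_stream_lambda" {
  name               = "click-stream-lambda"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json

  # No matter what policies get attached later, effective permissions can
  # never exceed what this boundary policy allows.
  permissions_boundary = "arn:aws:iam::123456789012:policy/team-permissions-boundary"
}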

These boundary enforcements matter once you are a much larger company, or if your company has high compliance and security needs. As a start-up though, you only need very simple protections (e.g. prevent deleting the database), and under those conditions it will be far easier to go all-in on toolchains such as the Serverless Framework before needing to dive deeper into Terraform or CloudFormation. Only once you grow bigger, and it becomes both possible and easy for teams to step on each other’s toes, do you need to establish and enforce these team boundaries.

Modularization & Conventions

Imagine your organization has adopted a Serverless strategy, and now fast forward six months to when everyone is using AWS to its fullest potential. One thing you’ll quickly notice is how much repetitive infrastructure code is emerging. Oddly enough, when it comes to application code, people are quick to realize they need to DRY (Don’t Repeat Yourself) their code and create abstractions. When it comes to infrastructure though, it is not as intuitive, and people struggle to uphold the same DRY principle. Different toolchains have different recommendations, but we’re a Terraform company, so let me explain how we modularize and reuse Terraform code internally (although you should be able to apply the same ideas with CloudFormation modules, etc.).

Basically, you’ll quickly realize there are typical patterns for typical scenarios, and there are reasonable configurations for those scenarios that you can bake into Terraform modules as conventions. For example, if you want to serve an SPA, with raw Terraform you’ll need to:

1. Create a CloudFront distribution (with appropriate configurations)
2. Create a DNS entry for this CloudFront distribution
3. Create an ACM certificate if you want TLS for your endpoint

In reality though, all you actually want is:

module "spa" {
  folder = "./spa"
  domain = "dummy-spa.example.com"
}

Where ./spa is the folder of static files you want served from dummy-spa.example.com. This is easy to understand. Asking people to create CloudFront distributions, Route 53 records, and ACM certificates, though, is not!

It thus becomes important to create these abstractions so teams can focus more on their business problem and adhere to company conventions. We do exactly this at Galvanize by having a repository called highbond-terraform-modules that hosts complete modules teams can use to get started quickly. You’ll want something similar for your teams if you want consistency and less boilerplate code.
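
Consuming such a module is then just a matter of pointing at the shared repository and pinning a version (the source URL below is hypothetical, but the git source syntax with a pinned ref is standard Terraform):

module "spa" {
  source = "git::https://github.com/your-org/highbond-terraform-modules.git//spa?ref=v1.2.0"

  folder = "./spa"
  domain = "dummy-spa.example.com"
}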

The need to modularize becomes more important as you grow bigger as a company. When you’re a small team, you start with no modularization, and as you notice repetition, you create modules to reduce duplication. Furthermore, early on conventions are not yet established, so you might as well focus on building your product and achieving product-market fit. Only when you are large enough that duplication appears, or teams start to lose track of company conventions, will you want to formalize module creation.

Testing

In many ways, testing will not be significantly different in a Serverless architecture compared to a containerized microservices world. At its core, it’s going to be important to make sure teams understand and uphold their services’ contracts, and know how to effectively mock them during tests. The main difference in your testing strategy will be using more system-level tests, since you may need a real AWS environment to test more scenarios.

Depending on how tight your integration is with AWS, you may simply create fake events and pass them into your Lambda function to test locally (fast, but least realistic); you may use localstack to minimize mocking at the cost of a more complex setup (slower, but more realistic); or you may run isolated system tests against a real AWS environment (slowest, but most realistic). In all cases, it’s important for teams to practice and build a new frame of mind for testing in this Serverless architecture. If they’re coming from a monolith world, it’ll be a huge difference. It’ll be important to emphasize the purpose of testing (to minimize risk, not eliminate it) and have teams build their testing strategy based on what their application needs, rather than what they may have historically done.
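
As a rough illustration of the first and last options (the function names and event file are hypothetical, and the local invocation assumes a SAM template), you can feed a canned event into a local invocation, or into the deployed function in an isolated test environment:

# Locally, with a fake event (fast, least realistic)
sam local invoke ClickStreamFunction --event events/click.json

# Against a real, isolated AWS environment (slowest, most realistic)
aws lambda invoke \
  --function-name click-stream-playground \
  --cli-binary-format raw-in-base64-out \
  --payload file://events/click.json \
  response.json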

This is a rather complex topic (perhaps suitable for a separate blog post), but it’s worthwhile for you and your team to practice building a testing strategy for a hypothetical all-Serverless application, noting how you will test it and observe it once deployed. It will not be like a traditional monolith, nor a container-based application.

If you are just starting with Serverless, you may not give testing much thought, since there is far less application code than ever before. Furthermore, your code is more observable by default, so you can quickly spot mistakes and recover. As you grow past the early stages of Serverless though, you’ll certainly want to treat testing with urgency.

A Rewarding Journey…

When you reflect on the challenges above, you’ll notice that each challenge is actually a move towards better software development.

  • Deploying only a small part of your overall application at a time? ✅
  • Packaging only what you need for your runtime in production? ✅
  • Creating tight security boundaries? ✅
  • Allowing teams to use the full potential of your platform? ✅
  • Developing in a distributed architecture, so it’s easier to scale the company? ✅

These are all excellent steps forward.

In fact, it’s even causing languages and runtimes to evolve in support. GraalVM supports native binary packaging, which helps in a Serverless environment; Deno supports packaging to a single file (much like Golang) to avoid package management overhead; and CI services are supporting monorepositories more natively too! This is excellent! 🎉 It’s just a matter of time until Serverless is no longer a novelty, but rather a mainstream choice.

…with its Challenges

Having said that, Serverless is not necessarily an easy journey to take on if you’re not prepared for it. There may be plenty of resources to guide you, and plenty of reasons for Serverless, but toolchains are still developing and many people are not yet used to thinking Serverlessly. As a result, you need to be ready to train your team and have technical leaders available to guide your teams. Serverless development, and using your cloud provider as a framework, requires a stronger foundational understanding of software development. This is why at Galvanize, in order to roll out Serverless effectively, we combined it with a robust training program, as outlined in Developing Talent in R&D.

If you don’t acknowledge this difference in skill set (not harder, just different) and front-load your team with training, then you’ll quickly hear frustrations of “This was so easy in Rails/.NET/Django”. If people cannot use the infrastructure to its fullest, and have been shielded from the total complexity of software development (which includes infrastructure, operations, and security), then for these people it will genuinely feel like Serverless is harder. However, if people are trained, understand Serverless’s benefits, and understand how it reduces the complexity of the overall system, then they’ll see that it’s actually much easier and can get behind it!

Becoming Serverless

It’s a journey for your team to become Serverless. Unfortunately, what tends to trip people up are the initial structural differences they may not be ready for. And if people trip in the very early stages, it’s very likely they’ll step back into the safety of what they already know. To avoid tripping and instead build momentum, you simply need a bit of upfront planning. Hopefully our experience at Galvanize gives you the insight you need to plan, so you can successfully complete your Serverless journey. Even if you don’t walk the same path as us, you’ll at least know what’s ahead of you.

Good luck, and may the force be with you!
