Why I prefer Serverless

I have worked with and built a significant amount of software on k8s. I was lucky to work with some of the best k8s technologists you can find anywhere during my time at Intuit. Then I moved to a startup. As we all do, we carry our stories and experiences with us when we move, and replay stories from the same playbook.

This is not hate on k8s, but it is not always the right answer. Just like Serverless is not always the right answer. In decisions like these, context is king. An important part of taking on a new job is figuring out what should be unlearnt, and unlearning k8s was one of the things I had to do. What works at large scale does not necessarily reproduce well at smaller scales.

So what was the problem? At larger companies there is typically a ‘platform’ team responsible for developer productivity: building tooling that enables other engineers to move fast. At the smaller scale of a startup, there is no such team. We can move faster by treating our cloud vendor, AWS, as our platform. So the question is: what will allow a team to move the fastest given this constraint?

Let’s start with this picture from the Cloud Native Computing Foundation for the k8s landscape:

(Image: the CNCF Cloud Native Landscape is too large to reproduce here; see https://landscape.cncf.io/)

I have three letters for this diagram: W, T, F. Who has time to draw up these diagrams, and what decision tree does one have to follow to make choices from this picture? This is nuts. Larger companies can afford the time and resources to hold lots of meetings to discuss each box. I just need to deliver value.

k8s was already in use when I joined the startup. The complexity of this diagram showed up in small and large ways in our k8s clusters. Deploying our k8s clusters turned out to be an event, with 4-5 engineers staring at pixels (to be fair, some of that was specific to our implementation, but again, the diagram above does not help). Troubleshooting was problematic: we had only a couple of engineers who could really dive into the details and understand what was going on when a cluster was sick. Above all, there was little interest in the team, or from me, in spending the time to become k8s experts. We are not in the container management business. We are in the business of delivering value to customers via products.

We had a choice: Go hire some engineers with k8s expertise. Or dump it, and change course to Serverless. Even if we chose to hire k8s experts, we would still deploy some software using Serverless. If that is the case, why bother with k8s at all?

It took some courage, but we decided to dump k8s and move all software to Lambda. (We always use our architecture decisioning process for decisions of this importance.) We built out a serverless paved road (soon to be open sourced) with tooling that made it easy for engineers to deploy their first serverless feature in less than half a day. Developer NPS improved significantly for the teams that adopted it (some teams could not, because they were still tied to the monolith). Troubleshooting improved dramatically, aided by an observability implementation using DataDog. And most importantly, releases went from being a milestone to a complete non-event: when something was ready to be released, it was released. One incident stood out to me: we had a Lambda service with a bug in production, and the fix was released in about 10 minutes. Engineers on serverless release on demand.

I do not claim that the same cannot be done with k8s. It is just less efficient for a smaller team. The more that can be outsourced to Amazon, the more efficient we are. Embracing the constraints imposed by operating at smaller scales opens new doors. In our case, standing on the shoulders of the AWS behemoth opened new doors to accelerating velocity.

One of the core architecture principles at Inflection.com is to move from IaaS to managed services (PaaS). Our data pipeline, compute, and storage all leverage PaaS as much as possible. It gives us a leg up on competitors that are slower to adopt PaaS.

Reasons given for avoiding serverless

‘It causes vendor lock in’

This is the easiest for me to dismiss. I spend close to zero time thinking about this. AWS is our platform. Worrying about vendor lock-in is a waste of time. Are you going to build facades and avoid using every managed service on your cloud platform? Good Luck. All you will do is slow down your team. I choose to move fast by leveraging our cloud vendor, and do not lose a wink of sleep worrying about lock in. I am happy paying AWS to manage all the infrastructure, so we can focus on our domain logic.

‘Serverless is too expensive’

There is definitely something to this. Lambda pricing is based on the number of invocations plus compute duration, billed per GB-second. There is a break-even point: past a certain volume of invocations and GB-seconds, you will be better off renting a CPU and not paying per invocation. Trek10 has a good writeup on the pros and cons of Lambda vs Fargate, and generally shows better pricing for Fargate as requests/sec increase. But the picture is actually much more complicated. You have to take the cost of API Gateway into account for Lambda. Adding capacity in Fargate is a step function, not a smooth curve like Lambda. Then there is provisioned capacity in Lambda, which needs to be factored in if you use it. And unlike Fargate, you do not pay for idle applications in Lambda. How do you factor that in?
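To make the invocations-plus-GB-seconds math concrete, here is a minimal sketch of the Lambda side in Python. The rates are illustrative list prices (roughly us-east-1, x86, circa 2021); they are my assumption, not figures from this post, and API Gateway, provisioned concurrency, and free tiers are deliberately ignored. Check the current pricing pages before relying on any of this.

```python
# Rough Lambda monthly cost model. Prices are illustrative (us-east-1,
# x86, circa 2021): $0.20 per 1M requests, ~$0.0000166667 per GB-second.
# API Gateway, provisioned concurrency, and free tiers are ignored.

REQ_PRICE = 0.20 / 1_000_000   # $ per request
GBS_PRICE = 0.0000166667       # $ per GB-second of compute

def monthly_lambda_cost(requests, avg_duration_ms, memory_gb):
    """Request charges plus compute charges for one month of invocations."""
    gb_seconds = requests * (avg_duration_ms / 1000.0) * memory_gb
    return requests * REQ_PRICE + gb_seconds * GBS_PRICE

# Example: 10M requests/month at 100 ms average on a 1 GB function.
cost = monthly_lambda_cost(10_000_000, 100, 1)
print(f"${cost:.2f}/month")  # ~$18.67: $2.00 in requests + $16.67 in GB-seconds
```

At that sustained rate (roughly 4 requests/sec) a small always-on container can undercut this number, which is the Trek10 point; when traffic is idle most of the day, Lambda's pay-per-use wins.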

I like to think about these discussions in Value to Cost Ratio terms. How can the team move the fastest and deliver value to customers, and how do you factor that into technology decisions, as opposed to just worrying about the bottom line? IMO, Serverless delivers an amazing Value to Cost Ratio in a way k8s did not for us. If you can make a compelling argument for k8s in these terms, by all means, use k8s.

‘You will not be able to build all your software on Serverless’

This one might be true for some teams. It is driven by two “limitations” (as of Sept 2021):

(1) The maximum memory allocation in Lambda is 10 GB; if you need more, you are out of luck.

(2) A 15-minute timeout. If you have long-running jobs, Lambda is not a good solution for you.

I have not hit either limit thus far, but I may in the future. And that is ok. If I can do 99% of my work the most efficient way I know how, I am ok doing the other 1% in a different, less optimized way (by less optimized, I mean not spending much time on customization/automation for that pipeline).
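For reference, both ceilings are plain configuration knobs in the Serverless Framework we use for deployments. A hypothetical serverless.yml fragment pinning a function at the maximums (values per AWS limits as of Sept 2021; function and handler names are placeholders) might look like:

```yaml
# Hypothetical fragment: the maximums Lambda allowed as of Sept 2021.
functions:
  bigJob:
    handler: handler.run
    memorySize: 10240   # MB; the 10 GB ceiling
    timeout: 900        # seconds; the 15-minute ceiling
```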

Fat and Thin Lambdas

Once we decided to try out Serverless, a debate raged about ‘Thin’ vs ‘Fat’ Lambdas. Should we consider Lambdas as a ‘Function as a service’ or as a decomposition unit for microservices? There is no right or wrong answer here. Jeff, my colleague, prepared this analysis after research and consultations with our friends at AWS:

(Image: Analysis of fat vs. thin Lambdas, Inflection.com, 2021)

We use both modes: For decomposition, the fat lambda pattern is used. For new microservices, a more FaaS model is generally adopted. (No, we do not call it a FaaS-ist model)
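As a rough illustration (not our actual code), the two styles differ mainly in where routing lives. A fat lambda routes multiple operations inside one handler; the FaaS style wires one small handler per route. The event shape below assumes an API Gateway proxy-style integration, and the routes are made up:

```python
# "Fat" lambda: one deployment, routing several operations in the handler.
# Assumes API Gateway proxy-style events carrying a "path" key (hypothetical routes).
def fat_handler(event, context):
    route = event.get("path")
    if route == "/users":
        return {"statusCode": 200, "body": "list users"}
    if route == "/orders":
        return {"statusCode": 200, "body": "list orders"}
    return {"statusCode": 404, "body": "not found"}

# "Thin" (FaaS) style: one small handler per operation, wired 1:1 to a route.
def list_users_handler(event, context):
    return {"statusCode": 200, "body": "list users"}
```

The fat style keeps related operations in one deployable unit (handy when decomposing a monolith); the thin style gives each function its own scaling, permissions, and tuning.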

The Toolchain

  • DataDog (Observability, performance monitoring, tracing, logging)
  • Sedai.io — gives us automated tuning of serverless functions (Note: I am an advisor to Sedai)
  • Serverless.com & GitHub Actions for deployments
  • Terraform for IaC
  • .NET/Python (we are moving to .NET Core and are excited about running on Graviton; I will cover this topic separately)
  • AWS managed services — EventBridge, RDS, and a plethora of other managed services
