
Microservices: Cloud Reviews & Benchmarks

Pocket Gems
Pocket Gems Tech Blog
10 min read · Mar 25, 2020


by David Underhill

A decade ago we built our first backend services on Google App Engine (GAE). Google’s cloud and our products both grew and matured tremendously over the years and we still (happily) serve the vast majority of our cloud workload from GAE’s Python 2 runtime. However, Python 2 was recently sunset, and Google has greatly expanded and improved their cloud offerings (instead of GAE being their only cloud product, it’s just one of many!). This led us to explore how to deliver future backend services.

This blog post shares our learnings. First comes an overview of our decision-making process; then our quantitative and qualitative findings on the most viable cloud options (for our business).

How Do We Pick a Cloud Platform for our Microservices?

There is no one “best” answer; the answer depends on your priorities. Our priorities are developer efficiency, performance and cloud costs (generally in that order). Let’s dive a little deeper into each of these.

#1 Developer Efficiency

Developer efficiency is our paramount priority. Engineering talent is difficult to find and expensive to retain. Inefficient developers tend to be unhappy developers who’d rather work elsewhere. Here’s an abbreviated wish list for our future stack:

  1. Understandability. When a deployment unit is too big, it’s impossible to understand all of the code. Bundling code and data models into independent modules makes it easier to reason about changes and understand how a service works. This is crucial for maintainability. It also helps us scale the engineering team by reducing how much expertise is required to iterate on any one service or feature.
  2. Debug Efficiency. The faster we see the impact of a change we make, the better. Relevant unit tests should automatically run as we code and quickly provide a useful stack trace if there was a problem. We should be able to set breakpoints and immediately step into our code in an environment that is an accurate mirror of the production environment.
  3. Automation means more time to focus on the challenging, interesting work we can’t automate. This means it needs to be easy to write comprehensive tests, automatically enforce style guidelines, quickly sync changes to the cloud (test environment), etc.
  4. Minimal DevOps so the engineering team can focus on the unique needs of our customers, not mundane details like provisioning servers/scaling, OS updates, etc. If performance demands it, it’s nice to be able to get into the weeds, but the weeds should be far removed from our typical, everyday development experience.

#2 Performance

Latency, in particular, often has an outsize impact on our customers’ experience. This is less about how fast it can be, and more about how fast it can consistently be (e.g., 99.99th percentile).

#3 Cost

Engineering time is often more precious (and costly) than cloud costs. Our teams should be able to experiment and prototype features without worrying about costs. As those features mature, we should have a clear path to the optimizations needed to cost-effectively serve larger audiences.

Benchmarking Cloud Platforms

Let’s discuss how to translate our priorities into a plan. There are a wide variety of serverless offerings, each with tradeoffs. To accurately evaluate them, we need to measure them using workloads that are representative of our real-world use cases.

Cloud Platform Options

Our desire for minimal DevOps nudges us towards serverless solutions. Ten years ago, GAE was the only real option in this space, but today there are a lot more options worth considering:

This list omits options that didn't fit our priorities, such as:

  • Azure has a lot to recommend it, but in the interest of time we limited our experiments to the biggest cloud provider (AWS) and our current primary cloud provider (Google)
  • AWS Lambda and Google Cloud Functions limit each instance to processing only 1 request at a time, which can be very costly for I/O-bound workloads (see the quick arithmetic after this list)
  • C++ isn’t the ideal language choice for most of our work since rarely do we need the level of performance it provides (at the cost of developer productivity versus a higher-level language)
  • Kubernetes is popular, but it would encumber us with far more devops responsibility than its advantages would offset
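
To make the I/O-bound point concrete, here's a back-of-the-envelope sketch; the timings and concurrency below are illustrative assumptions, not measurements from our benchmarks:

```python
# Back-of-the-envelope comparison (assumed numbers, not measurements).
io_wait_ms = 190    # time a request spends blocked on a database/API call (assumption)
cpu_ms = 10         # actual compute per request (assumption)
concurrency = 20    # requests a multi-request instance could overlap (assumption)

total_ms = io_wait_ms + cpu_ms

# One-request-per-instance model: the instance is billed for the full wall-clock
# time of every request, even while it just waits on I/O.
instance_ms_single = total_ms

# Concurrent model: the same instance overlaps many requests during their I/O waits,
# so the billed instance-time per request shrinks roughly with the concurrency.
instance_ms_concurrent = total_ms / concurrency

print(f"~{instance_ms_single / instance_ms_concurrent:.0f}x more instance-time per request")
```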

Measuring Cloud Platforms (Quantitative Results)

The list of options is still pretty long; it yields hundreds (!) of combinations that can't be distinguished without putting them through their paces on workloads that matter to our business. To that end, we designed roughly a dozen tests that are representative of many of our important workloads. We then implemented those tests for a large variety of serverless, language, and framework options and created an automated benchmarking process which sets itself up with new cloud accounts and collects data in a repeatable, scientific way. You can check out the full source code here.
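
Our real harness is in the repo linked above; as a rough illustration of the kind of measurement involved, here is a minimal sketch that hammers a single endpoint and reports median and tail latency (the URL and sample count are placeholders, and a real run needs far more samples and concurrency to resolve a 99.99th percentile):

```python
import statistics
import time
import urllib.request

URL = "https://example.com/benchmark-endpoint"  # placeholder, not one of our real services
N = 1_000                                       # far too few samples for p99.99; illustration only

latencies_ms = []
for _ in range(N):
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()          # one synchronous request
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print("p50   :", statistics.median(latencies_ms))
print("p99   :", latencies_ms[int(0.99 * (N - 1))])  # simple nearest-rank percentile
print("worst :", latencies_ms[-1])
```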

So what exactly are we measuring?

  • Performance — how quickly and reliably do our services respond to requests?
  • Cost (per performance) — how much does it cost to do some quantity of work? (the sketch after this list shows the arithmetic)
  • (some) Developer Efficiency — how long does it take to deploy a service? There are many subjective factors here too, such as how well the building blocks offered by the platform, framework, etc. address our use cases.
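
To make "cost per performance" concrete, the arithmetic is simply the platform's hourly price divided by the work it completes in an hour. A tiny sketch with made-up prices and throughputs (not our benchmark results):

```python
def cost_per_million_requests(hourly_cost_usd: float, sustained_rps: float) -> float:
    """Dollars to serve one million requests at a sustained throughput."""
    requests_per_hour = sustained_rps * 3600
    return hourly_cost_usd / requests_per_hour * 1_000_000

# Hypothetical configurations -- illustrative numbers only.
print(cost_per_million_requests(hourly_cost_usd=0.04, sustained_rps=500))  # ~$0.02 per million
print(cost_per_million_requests(hourly_cost_usd=0.10, sustained_rps=400))  # ~$0.07 per million
```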

Deployment Time

Deployment time is the easiest to share because the results are compact; they depend almost solely on platform and runtime (not lower-level details like runtime configuration which explode the number of options).

A shorter deployment time is better, of course. The original GAE v1, which debuted a decade ago, performed extremely well, only slightly behind the frontrunner (Cloud Run on Anthos). AWS Fargate service updates (not included in the table because I didn't automate the process of monitoring how long its updates took, sorry) were by far the longest, typically taking in the neighborhood of several minutes (so slow!) via a small CloudFormation update.

Cost: Pricing

The first step of analyzing cost is to understand the baseline costs. Typically you choose a machine type and are billed based on its RAM and CPU (primarily affected by how many cores you get, but sometimes by their chipset generation or speed too). Each cloud provider also typically offers discounted rates based on usage or contracts guaranteeing minimum usage. Looking at this helps us see how options compare at a high level, and it is necessary for computing how much a given unit of performance costs (which is ultimately what we care about: a configuration with a more expensive hourly cost might be able to do enough additional work versus a cheaper configuration that it's actually the most cost-efficient choice). Here are the baseline, published costs for the configurations we tested:

Lower hourly costs (per core) are better. Google’s Cloud Run on Anthos configurations (n.*) top the charts with the lowest costs per vCPU (aka hyperthread); Fargate’s cost was close — particularly if you’re willing to spring for a long-term contract. Google’s App Engine instance classes (F1, etc.) and their Managed Cloud Run costs come in far pricier.

Typically, the more devops burden we take on, the better price we'll get. Yet some highly managed serverless offerings like Fargate have strong list prices with very acceptable overhead. Still, this isn't the full picture: what really matters is cost per unit of performance on our use cases. Which brings us to…

Cost: Performance per $

tl;dr — Fargate + Node 12 + (non-clustered) Fastify was able to serve the most requests per $ on our workloads

There’s way too much data to show it all; we tested hundreds of configurations and implementations! But we mostly only care about the winners anyway. You can read more about our tests here. There are some pretty telling differences here:

Fargate processed more requests per unit cost than even low-cost, high-devops-overhead offerings like Google's Cloud Run on Anthos (which placed second on our benchmarks). CR on Anthos was impacted by k8s overhead (which would be less impactful on larger machine types), among other factors. Fargate also provided the highest performance on every individual test save two. Managed Cloud Run was 4x more expensive (that is, for a given amount of money Managed CR processed only one-fourth the number of requests of the winner, Fargate). GAE v2 was 5x, and GAE v1 was a disappointing 25x (despite our decade of deep experience with the platform, that's all we could get from it). Latencies (both median and 99.99th percentile) tended to vary roughly in line with requests per second for these workloads, so we don't show them here.

Both JavaScript and Python offered good performance (with the right configurations and frameworks). On the JavaScript side, the Fastify framework edged out Express, but both were solid. On the Python side, gunicorn + gevent + falcon outperformed the competition, though uwsgi brought in solid results too. Surprisingly, pypy significantly hurt our results; so did alternative workers like gthread or uvicorn. Alternative frameworks like fastapi or flask fell short too.
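
For reference, the winning Python stack is simple to stand up. Here's a minimal sketch of a Falcon app served by gunicorn with gevent workers (the module, route, and worker counts are illustrative, not taken from our benchmark code):

```python
# app.py -- a minimal Falcon app served by gunicorn with gevent workers.
import falcon


class Ping:
    def on_get(self, req, resp):
        resp.media = {"ok": True}   # Falcon serializes this to JSON


app = falcon.App()                  # use falcon.API() on Falcon < 3.0
app.add_route("/ping", Ping())

# Run with gevent workers so each process multiplexes many I/O-bound requests:
#   gunicorn app:app --worker-class gevent --workers 2 --worker-connections 1000
```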

Our tests were run against a single node (no horizontal scaling). In the future, we’d like to benchmark how quickly different services can spin up new machines as demand grows (and vice versa).

Cloud Platforms Product Fit (Qualitative Results)

There are also impactful architectural differences between offerings. Database update strategies, concurrency strategies, and so forth all have a significant impact on developer efficiency, performance and cost. We qualitatively assessed these dimensions while creating the benchmarks, and then double-checked them in several production pilot projects later on. The biggest takeaways (for our workloads):

Database: Conditional updates significantly outperform optimistic locking (for our workloads)

  • AWS DynamoDB offers conditional updates (sketched after this list)
  • Google Datastore uses optimistic locking
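
To illustrate, a conditional update lets the database check an invariant and apply the write atomically in one round trip, instead of a read-modify-write transaction retried under optimistic locking. A minimal boto3 sketch (the table and attribute names are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("players")   # hypothetical table

try:
    # Spend 10 gems only if the player can afford it -- DynamoDB checks the
    # condition and applies the update atomically, so no retry loop is needed.
    table.update_item(
        Key={"player_id": "p123"},
        UpdateExpression="SET gems = gems - :cost",
        ConditionExpression="gems >= :cost",
        ExpressionAttributeValues={":cost": 10},
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        pass  # not enough gems; nothing was written
    else:
        raise
```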

Caching: Redis + Lua provides invaluable building blocks vs. a simple key-value store like memcache

  • When simple key-value caching is all you need, memcache is the clear winner
  • Redis provides rich data structures like a sorted set
  • Lua gives Redis an efficient means to implement atomic operations unique to your application (see the sketch after this list)
  • Both cloud providers also offer a simpler memcached cache service (easy to scale, minimal devops, etc.)
  • AWS provides a superior Redis offering (critically, with Lua scripting) which is best-suited for our workloads.
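
As an example of the kind of atomic operation Lua enables, here's a minimal redis-py sketch; the key name and "spend gems" logic are hypothetical, not code from our services:

```python
import redis  # redis-py

r = redis.Redis()  # assumes a reachable Redis with Lua scripting available

# Atomically spend `cost` only if the balance covers it: one round trip, and no
# race between the read and the write, because the script runs atomically in Redis.
spend = r.register_script("""
local balance = tonumber(redis.call('GET', KEYS[1]) or '0')
local cost = tonumber(ARGV[1])
if balance >= cost then
    return redis.call('DECRBY', KEYS[1], cost)
end
return -1
""")

new_balance = spend(keys=["gems:p123"], args=[10])   # -1 means insufficient balance
```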

Concurrency: JavaScript’s async/await paradigm is easier to use and outperforms Python 3’s asyncio and greenlets (as well as Google’s ndb library’s tasklets that we use extensively on GAE v1)
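
For context, the pattern being compared is fanning out several backend calls from one request and overlapping their I/O waits. A minimal Python asyncio sketch (the calls are stand-ins, not our benchmark code):

```python
import asyncio


async def fetch(name: str) -> str:
    await asyncio.sleep(0.05)       # stand-in for an RPC or datastore call
    return f"{name}: done"


async def handle_request():
    # Fan out several backend calls and overlap their I/O waits.
    return await asyncio.gather(fetch("profile"), fetch("inventory"), fetch("friends"))


print(asyncio.run(handle_request()))
```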

Platform Setup: Some platforms were much easier to set up than others! From easiest to hardest:

  • Google App Engine v1 remains the easiest platform to use. Its batteries-included nature is empowering and easy to use, and is a big reason we happily used it for so many years.
  • Other managed Google services like Managed Cloud Run or GAE v2 were a hassle to set up (e.g., memcache), but at least horizontal scaling was built in (nearly zero configuration needed) and ongoing devops work is minimal.
  • Fargate requires a great deal of configuration to set up well (autoscaling, VPC routing and security, logging, etc.) but at least its ongoing devops work appears low.
  • Kubernetes (e.g., Cloud Run on Anthos) was a nightmare to set up and manage (for a rookie); it was a relief that our workloads ended up doing slightly better on a platform which doesn't expose us to k8s. Even if it had delivered the best cost performance, we wouldn't have chosen an unmanaged k8s platform given the difficulties it gave us throughout our testing. Cloud Run on Anthos felt like a beta product that wasn't ready for prime time, at least with our workloads and [lack of] expertise.

Cloud Platforms Developer Fit (Qualitative Results)

Many of the product fit highlights also impact developer efficiency (it helps to have the right tools at your disposal). Many of our wants for developer efficiency can be addressed by using containers. Or more precisely, they’re a good starting point and can help keep deployment units small and isolated as well as facilitate repeatable and accurate local testing. Debug efficiency is aided by Cloud Platforms which provide purpose-built containers for localhost testing (e.g., amazon/dynamodb-local). In terms of minimizing devops, GAE v1 is still the gold standard; other options required more setup work and some required a lot more ongoing work as well (see Platform Setup above). In the end, most modern platforms had the right building blocks to achieve our developer productivity goals. Some required a bit more elbow grease to initially set up, but ongoing barriers to productivity were similarly low. It turns out the party most responsible for developer productivity is, perhaps unsurprisingly, ourselves!
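
For example, local DynamoDB testing boils down to pointing the client at the emulator container; a minimal sketch (the endpoint and credentials are the usual local-testing placeholders):

```python
# Start the emulator first (outside Python), e.g.:
#   docker run -p 8000:8000 amazon/dynamodb-local
import boto3

local_db = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",  # the local container instead of AWS
    region_name="us-east-1",               # any region string satisfies the client
    aws_access_key_id="local",             # dummy credentials; the emulator ignores them
    aws_secret_access_key="local",
)
print(list(local_db.tables.all()))         # empty list on a fresh container
```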

Conclusion

We’re building our next generation services on Fargate, using Node 12 and Fastify. We didn’t expect to end up here; we’ve been on Google App Engine for long enough that we expected their new offerings would likely be at least as good as others on the market (for our workloads). But we were pleasantly surprised how well-suited our workloads were for AWS. It’ll be interesting to see how these two competitors evolve in the future; with containerized microservices, it’s a bit easier to try different platforms (though database lock-in remains a prominent source of friction since each NoSQL database is very idiosyncratic and not well-suited to a common abstraction layer).

If you are interested in working on a high-performing team, learn more here!
