Service Discovery with ALB Advanced Request Routing

Andres Rutnik
ScribbleLive Engineering
Sep 20, 2019 · 11 min read

One of the challenging parts of deploying microservices in production is managing the dependencies between them. As much as microservices are meant to be atomic and independent, eventually you may come to a point where one service needs to make a call to another. And inevitably, because no software is static, a service that other services depend on will need to change. How can we manage and verify this change in a way that minimizes the need for complex configuration hierarchies or a combinatorial explosion of environments? That’s what this post aims to explore.

Options for dependency version management

Let’s start by looking at some options for managing the version of the dependency one service is using. For simplicity, I’m going to focus on the use case of one RESTful API (we’ll call it API-A) that needs to make a call to another RESTful API (we’ll call it API-B). We can assume that API-A was designed and built against a certain contract exposed by API-B. Let’s say API-B has a new version that optimizes a route (say /widgets) internally but does not make any breaking changes to the contract. In this case we can assume it is safe to deploy a new version of API-B without causing any negative impact to API-A. But we would want to verify that. Here are some possible options for doing so:

  1. If you have a separate development/staging environment that contains all the same microservices as in production, then you can likely just deploy the new version of API-B to this environment, replace the previous one, and run your test suites against API-A (and against API-B directly, if you so choose).
  2. We can go a step further with API-B’s change and say that even though we are only changing the internals of /widgets, we can follow an API versioning scheme where the route that changed gets a whole new code path, separate from the code that implemented the old version of /widgets. It also gets a new route that includes the version number, for example /v2/widgets. Of course, this also requires changing API-A’s code to point at these new routes, which in turn may require making /v2 versions of the routes that now depend on API-B’s /v2/widgets route.
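To make Option 2 concrete, here is a rough sketch of what the duplicated code paths might look like in an Express service (the handler functions are made up for illustration):

```js
const express = require('express');
const app = express();

// Hypothetical handlers: v2 optimizes the internals but keeps the same contract.
const listWidgetsV1 = () => [{ id: 1, name: 'widget' }];
const listWidgetsV2 = () => [{ id: 1, name: 'widget' }]; // same response shape, faster internals

// The old code path stays untouched for existing consumers...
app.get('/widgets', (req, res) => res.json(listWidgetsV1()));

// ...while the new code path is exposed under a versioned route.
app.get('/v2/widgets', (req, res) => res.json(listWidgetsV2()));

app.listen(3000);
```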

For non-breaking changes, Option 1 will suffice most of the time. The more representative of production the staging environment is, the better. This means volume of data, infrastructure type (if not scale), and networking should be the same as production for the testing to be meaningful. I am not a fan of Option 2, since making API versions for non-breaking changes will lead to a lot of code duplication if you are following the spirit of not touching or even refactoring code that powered the old versions of routes.

But let’s throw a wrench into things: What if we want to be able to run both the old and new versions of API-B at the same time?

The Case for Service Discovery

At ScribbleLive, one of the problems we’ve faced is how to put features in the hands of customers before they are launched to the general public. On our front-ends, this can be achieved painlessly through feature-flagging tools like LaunchDarkly. But to ensure that we do not affect the stability of our high-traffic production back-ends, we needed a way to maintain our network of microservices running stable master-branch code while also exposing certain opt-in customers to beta-level features that run out of feature-branch versions of one or more microservices. How can we manage this in a way that doesn’t become a nightmare of configuration management? That’s where the concept of service discovery comes in.

As a pattern, service discovery simply consists of a service registry that keeps track of available services and a means for consumers to request an endpoint (typically an IP Address) for a given service. There are a variety of different options at your disposal if you would like to implement a service registry:

  1. One of the simplest forms of a service registry is internal DNS. A service like AWS Route 53 allows you to set up an internal-only TLD like mycompany.internal. Then you can simply assign well-known names to your services, like api-b.mycompany.internal, and point them at the IP address or load balancer of the actual service. Now API-A’s config just needs to point its calls at api-b.mycompany.internal. As long as you’ve set a low TTL on this DNS entry, you can re-point traffic to a new version of API-B without re-deploying API-A. Route 53 even allows you to create weighted traffic policies, so you can use this to send a portion of traffic to a beta version of API-B while the majority of traffic goes to the production version (there’s a sketch of this after the list).
  2. AWS ECS has its own service discovery system that you can use. Under the hood it uses Route 53 to maintain the registry, but it comes with some nice features, such as container-level health checks and multi-target routing, that can bypass the need for an AWS ALB in front of the services. ECS service discovery comes at its own cost, but that should be weighed against the savings of not requiring an ALB. It has minimal impact on the microservices that use it because, similar to (1), they just need to point to the right DNS entry that AWS Service Discovery sets up for each service.
  3. You can build your own service registry using a fast in-memory database like Redis or Memcached. The keys could be your service names, and the values could be the current IP address representing the endpoint for that service; or the values could be a more complex structure exposing master and feature-branch endpoints for a service. This also requires some extra development in the microservices that use it: they will need to request an endpoint from the service registry before making a call to that service. This is best implemented as a library you can distribute across your services. You would also need to implement a process to seed and modify this database.
  4. Open source reverse proxies like Envoy and Traefik include their own service registries. If you’re comfortable running and configuring an unmanaged service that sits in front of your microservice traffic, these are an option.
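As an illustration of the weighted routing mentioned in (1), here is a minimal sketch using the AWS SDK for JavaScript; the hosted zone ID, record name, and ALB DNS names are placeholders:

```js
const AWS = require('aws-sdk');
const route53 = new AWS.Route53();

// Two weighted records for the same name: roughly 90% of lookups resolve to the
// production ALB and 10% to the beta ALB. All values here are placeholders.
const endpoints = [
  { id: 'master', weight: 90, value: 'api-b-prod-alb.us-east-1.elb.amazonaws.com' },
  { id: 'beta', weight: 10, value: 'api-b-beta-alb.us-east-1.elb.amazonaws.com' },
];

const changes = endpoints.map(({ id, weight, value }) => ({
  Action: 'UPSERT',
  ResourceRecordSet: {
    Name: 'api-b.mycompany.internal',
    Type: 'CNAME',
    SetIdentifier: id,
    Weight: weight,
    TTL: 30, // keep the TTL low so re-pointing takes effect quickly
    ResourceRecords: [{ Value: value }],
  },
}));

route53
  .changeResourceRecordSets({
    HostedZoneId: 'Z0000000EXAMPLE',
    ChangeBatch: { Changes: changes },
  })
  .promise()
  .then(() => console.log('Weighted records updated'));
```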

How well does each of these support the use case of dynamically routing traffic explicitly (i.e. non-randomly) to a feature-branch version of a microservice?

  • The two DNS-based approaches don’t offer much out-of-the-box support for this. You basically have to set up a separate DNS entry or Service Discovery namespace, such as api-b.feature-branch.mycompany.internal, to represent the feature-branch endpoint for the service. This requires code-level changes to point to the right DNS entry based on the circumstances.
  • The BYO service registry database is only limited by how much work you want to put into building and maintaining it. It’s easy to imagine a system where the value stored for each service key contains all the endpoints for every version of the service at that time. Similarly, the reverse proxy solutions can be set up with fairly complex rules to route requests to any number of different versions of a service.
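For instance (purely illustrative; the key layout and endpoints are made up), a build-your-own registry entry in Redis could store the master endpoint alongside any feature-branch endpoints, and the client library would pick one before making the call:

```js
const { createClient } = require('redis');

// Illustrative registry entry: one key per service, holding the master endpoint
// plus any feature-branch endpoints currently deployed.
const apiBEntry = {
  master: 'http://10.0.1.25:8080',
  branches: {
    'experimental-optimization': 'http://10.0.3.41:8080',
  },
};

async function main() {
  const client = createClient();
  await client.connect();

  // Seeding would normally happen in your deployment pipeline.
  await client.set('service:api-b', JSON.stringify(apiBEntry));

  // A consumer resolves an endpoint (falling back to master) before calling the service.
  const entry = JSON.parse(await client.get('service:api-b'));
  const endpoint = entry.branches['experimental-optimization'] || entry.master;
  console.log(endpoint);

  await client.quit();
}

main();
```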

None of these approaches were ideal for ScribbleLive to roll out a feature-branch-ready service discovery system across our existing network of microservices. Instead, since we already had AWS ALBs in front of each service, we were able to take advantage of ALB Advanced Request Routing to meet our needs.

ALB Advanced What?

ALB Advanced Request Routing is a fairly recent addition to ALB (announced in March 2019: https://aws.amazon.com/blogs/aws/new-advanced-request-routing-for-aws-application-load-balancers/) that allows you to set up much more sophisticated rules than the prior rules system, which was limited to URL patterns. These rules are wildly flexible, allowing you not only to route requests to different target groups based on query strings and headers, but also to redirect requests to entirely different host names. For our purposes of service discovery we’ll focus on the former: the ability to route based on headers.

A service discovery scheme using advanced request routing requires the following parts:

  1. Having an ALB in front of your microservices. You can either have one ALB in front of each microservice, or one ALB shared amongst many microservices. Either approach works in the context of this service discovery scheme, but if you choose the latter, the rules will be more complex and you should be aware of limits on the number of targets per ALB.
  2. Having a master target group and zero or more feature-branch target groups for each service. Having a dedicated target group per branch is necessary for this scheme to function.
  3. Setting up listener rules that, by default, route requests for each service to its master target group.
  4. Setting up higher priority listener rules that route requests to feature-branch target groups if a certain feature-branch header has a certain value.
  5. Instrumenting your code so that feature-branch headers passed into each microservice call are relayed in each outgoing call to other microservices.

The following diagram shows how this will look once it’s all together:

Setting up the initial ALBs and initial default target groups is beyond the scope of this post, so I’ll assume that part has been completed. Next we’ll go into detail on how the remaining four steps work:

Master and Feature Target Groups

Going back to our purpose, we want to run more than one version of API-B at the same time in an environment, and be explicit about when requests are routed to the feature-branch version of the service. To that end we need to actually have two versions of the service running:

  • If your microservices run on ECS, this means having two ECS services for the same conceptual microservice. To help distinguish between the versions, I recommend a naming convention that includes the branch name (or “master” for the mainline version).
  • If your microservices are deployed on EC2 instances, use your existing deployment pipeline to deploy a second copy of the same service, built from the feature branch.

Once a second version of the service is deployed, you can set up a second target group. This should be done the same way you set up the target group for the mainline/master version of the service, but using the feature-branch deployment as the targets.
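As a sketch of that step with the AWS SDK for JavaScript (the name, port, VPC ID, and health check path below are placeholders), creating the feature-branch target group might look like this:

```js
const AWS = require('aws-sdk');
const elbv2 = new AWS.ELBv2({ region: 'us-east-1' });

// Create a target group for the feature-branch deployment of API-B.
// Register the feature-branch tasks/instances into it just as you do for master.
elbv2
  .createTargetGroup({
    Name: 'api-b-experimental-optimization',
    Protocol: 'HTTP',
    Port: 8080,
    VpcId: 'vpc-0123456789abcdef0',
    TargetType: 'ip', // or 'instance' for EC2-based deployments
    HealthCheckPath: '/health',
  })
  .promise()
  .then((res) => console.log(res.TargetGroups[0].TargetGroupArn));
```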

Default Listener Rules

Default listener rules serve an important purpose: they route traffic to the master target groups when no feature-branch headers are present. This will most likely be the vast majority of traffic to your application. If you use an ALB for each service, then the default rule that you cannot delete from the ALB is perfect for this: you can have it route traffic to the master target group. If you use a single ALB for many services, you need to add a master target group rule for each service, most likely using the Host header to determine which service’s master target group to route to.

What the default listener rule looks like
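For the shared-ALB case, a roughly equivalent rule created with the AWS SDK for JavaScript might look like the following (the ARNs and hostname are placeholders):

```js
const AWS = require('aws-sdk');
const elbv2 = new AWS.ELBv2({ region: 'us-east-1' });

// Route any request whose Host header matches API-B to its master target group.
elbv2
  .createRule({
    ListenerArn: 'arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/shared-alb/abc123/def456',
    Priority: 100, // a higher number is evaluated later, so feature-branch rules (lower numbers) win
    Conditions: [
      { Field: 'host-header', HostHeaderConfig: { Values: ['api-b.mycompany.internal'] } },
    ],
    Actions: [
      {
        Type: 'forward',
        TargetGroupArn: 'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-b-master/1a2b3c',
      },
    ],
  })
  .promise();
```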

Feature Branch Listener Rules

This is where the magic happens! We’re going to pick a convention for our feature-branch header. Let’s call it mycomp_branch. For any requests that have the header mycomp_branch set, we are going to use the value of the header to create a rule that routes traffic to our feature-branch target group. Let’s say our one feature branch is called experimental-optimization. So now we create a rule that routes traffic to the experimental-optimization target group whenever mycomp_branch is set to experimental-optimization:

Adding a feature-branch listener rule

Keep in mind, if you are using a single ALB for all services, you’ll need to create feature-branch rules that have a condition on the Host header (to pick the service) AND a condition on the mycomp_branch header (to pick the right version). I’ll also note here that there’s no practical limit on how many feature-branch versions of a service you can run and route to with this scheme; you just need to repeat the steps for setting up an extra target group and new feature-branch listener rules for each additional version you want to run at a time.
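Here is what such a combined rule might look like via the AWS SDK for JavaScript (ARNs and hostname again placeholders):

```js
const AWS = require('aws-sdk');
const elbv2 = new AWS.ELBv2({ region: 'us-east-1' });

// Higher-precedence rule: only requests for API-B that carry
// mycomp_branch: experimental-optimization reach the feature-branch target group.
elbv2
  .createRule({
    ListenerArn: 'arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/shared-alb/abc123/def456',
    Priority: 10, // lower number than the master rule, so this is evaluated first
    Conditions: [
      { Field: 'host-header', HostHeaderConfig: { Values: ['api-b.mycompany.internal'] } },
      {
        Field: 'http-header',
        HttpHeaderConfig: { HttpHeaderName: 'mycomp_branch', Values: ['experimental-optimization'] },
      },
    ],
    Actions: [
      {
        Type: 'forward',
        TargetGroupArn:
          'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-b-experimental-optimization/4d5e6f',
      },
    ],
  })
  .promise();
```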

Relaying Feature Branch Headers between Services

For this scheme to work, we need the mycomp_branch header value that is passed into any of our microservices to be relayed, with the same value, on any outgoing calls to other microservices. This is what allows us to deploy a second version of API-B without having to change API-A’s code at all; we simply have to pass this header into any calls to API-A to ensure that when API-A hits API-B’s load balancer, our new listener rules take effect.

Setting this up will range from relatively easy to painful depending on your services’ programming language(s) and architectural decisions. If you haven’t already, now would be a good time to create a wrapper library around your inter-service API calls where you can inject this logic in a standard way across services. These are the requirements for any solution that gets implemented:

  1. You need to be able to access the incoming header values at the point in your code where you are about to make an outgoing call to a service. The mechanism for this varies by language/framework. For example, in ASP.NET you can use HttpContext.Current just about anywhere in the code to access the current API call’s headers (in ASP.NET Core, the IHttpContextAccessor service plays the same role). For Node.js, no such convenience exists: you will either need to pass these headers all the way through your code, or you can use https://www.npmjs.com/package/cls-hooked to store header values in a request-specific container during middleware, then retrieve them from that container in the code that makes the outgoing HTTP call (even after callback functions and promise awaits have happened along the way).
  2. You need to take the headers that you’ve collected and ensure they are passed to all outgoing calls to your services. Again, this varies by language and framework. For Node.js, see the snippet below for an example inspired by the New Relic Node.js APM agent (https://github.com/newrelic/node-newrelic). It basically wraps the built-in http and https modules to look for the mycomp_branch header value inside a namespace from cls-hooked, the assumption being that you also have middleware that adds the branch header value to that namespace.
Instrumenting all outbound calls with the feature-branch header value
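A minimal sketch of that idea (not the exact agent code): wrap http.request and https.request so that, if a cls-hooked namespace (here called 'mycomp', a name chosen for illustration) holds a branch value for the current request, it is added as the mycomp_branch header on the outgoing call.

```js
// wrap-outgoing.js: a minimal sketch, not the exact snippet referenced above.
const http = require('http');
const https = require('https');
const { getNamespace } = require('cls-hooked');

// Illustrative names; they must match what your middleware creates and populates.
const NAMESPACE = 'mycomp';
const BRANCH_HEADER = 'mycomp_branch';

function wrap(transport) {
  const originalRequest = transport.request;
  transport.request = function wrappedRequest(options, ...rest) {
    const ns = getNamespace(NAMESPACE);
    const branch = ns && ns.get('branch');
    // Only the options-object signature is handled here; a URL-string first argument passes through untouched.
    if (branch && options && typeof options === 'object') {
      options.headers = Object.assign({}, options.headers, { [BRANCH_HEADER]: branch });
    }
    return originalRequest.call(this, options, ...rest);
  };
}

wrap(http);
wrap(https);
```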

Usage:
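A sketch of how this might be used in an Express service (module paths and names are illustrative): a middleware runs each request inside the namespace and stashes the incoming header value, and any outgoing call made while handling that request automatically carries it.

```js
const express = require('express');
const http = require('http');
const { createNamespace } = require('cls-hooked');

const ns = createNamespace('mycomp'); // must match the namespace the wrapper looks up
require('./wrap-outgoing'); // the wrapper module sketched above

const app = express();

// Middleware: run the rest of the request inside the namespace and store the branch header.
app.use((req, res, next) => {
  ns.bindEmitter(req);
  ns.bindEmitter(res);
  ns.run(() => {
    ns.set('branch', req.headers['mycomp_branch']);
    next();
  });
});

// Any outgoing call made while handling this request now carries mycomp_branch,
// so API-B's load balancer can route it to the matching feature-branch target group.
app.get('/things', (req, res) => {
  const outbound = http.request(
    { host: 'api-b.mycompany.internal', path: '/widgets' },
    (apiBRes) => {
      let body = '';
      apiBRes.on('data', (chunk) => (body += chunk));
      apiBRes.on('end', () => res.send(body));
    }
  );
  outbound.end();
});

app.listen(3000);
```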

Once you have all of the above requirements in place, you should be in a position to target feature-branch versions of any service (or more than one service along the request chain) by simply passing the right headers into the public-facing entry points of your microservice network. You would do this in your front ends, where typically a feature-flagging tool would dictate in which scenarios the header should be added. Now you can go from a single version of each service to one or more feature-branch versions accessible at the same time!

More than one feature branch can be active at a time. Only services that need alternative code for a feature branch need a separate target group! In the example above, only API-B needs to change for the feature “experimental-optimization”, but both API-A and API-B need alternate versions for “feature-x”.

Final Notes

Using ALB Advanced Request Routing for service discovery is the option that worked best for ScribbleLive’s current environment. But this may not be the case if you’re not already using ALBs, or if instrumenting all your services is not feasible right now. Always evaluate all the options available to you before making a sweeping architectural change such as this. Some other notes:

  • Be vigilant that the number of active feature-branch versions in play does not grow out of control. This is meant to facilitate short-term A/B-type testing, not permanently branched codebases.
  • If CORS is part of your use case, ensure that mycomp_branch is included in the allowed headers of your public-facing entry points.
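With the cors middleware for Express, for example, that could look like the following (the origin and header list are illustrative):

```js
const express = require('express');
const cors = require('cors');

const app = express();

// Allow browsers to send the feature-branch header on cross-origin requests.
app.use(
  cors({
    origin: 'https://app.mycompany.com',
    allowedHeaders: ['Content-Type', 'Authorization', 'mycomp_branch'],
  })
);

app.listen(3000);
```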
