Route53 Latency Based Routing made easy with Serverless

Sebastian
Published in DAZN Engineering
7 min read · Mar 11, 2019

Learn how to make your service available worldwide in a few easy steps!

TL;DR

GitHub repository: https://github.com/sdomagala/serverless-latency-routing-example

In this example you will deploy two APIs in two AWS regions — eu-central-1 and us-east-1.

Why?

There are multiple ways to handle routing in AWS with Route 53:

Types of Route53 routing policies

If your service operates in many countries or continents, there is a fair chance that at some point you will want to go global to lower latency and maybe spread the traffic a bit. Latency routing decides how your traffic should be routed to provide the lowest latency to users. You also get failover out of the box, but you need to create health checks in a way that ensures your service is really up. To do that, you need deep health checks that verify not only your service but also the underlying services and infrastructure.

When you should use latency routing:

  • you want to provide the lowest latency possible
  • all your regions serve the same set of data (so it doesn’t matter which one a user connects to)
  • if your service goes down in one region, you want to reroute users to the AWS region with the second-lowest latency (which isn’t always the nearest geographical region)

When you should NOT use latency routing:

  • your company needs to ensure that all user data is processed in the region the user is currently in — for example, a user in Europe must use the eu-central-1 region because of local regulations; geolocation routing can help you with that
  • your service runs mostly in one set of countries, so you want users to use one region exclusively and fall back to the other only if something bad happens in the “main” region — in this case you should use failover routing
  • your service works under heavy load and you have to ensure that traffic is spread evenly across all territories to even out the load, rather than risk one region taking the big hit — for this, use weighted routing

How?

In this example we are going to create a simple Lambda function attached to an API Gateway, which is then connected to Route 53 using a custom domain.

Architecture of this example:

Multi-region latency routing

Let’s get our hands dirty

This tutorial does not cover the creation of a Route53 Hosted Zone or ACM certificate, as those resources should not be created per service, but shared across services.

First, let’s create a Lambda that returns the current region:

This function only shows the region we are currently using.

Then, create a simple health check that returns the state of your service in this region — we are going to break it later:

Now that we have our two Lambda functions ready, let’s create the serverless.yml file.
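
A minimal serverless.yml for this setup might look like the following sketch (service name, runtime, and handler paths are assumptions):

```yaml
service: sls-latency-routing  # assumed service name

provider:
  name: aws
  runtime: nodejs8.10
  # region comes from the CLI (--region), defaulting to eu-central-1
  region: ${opt:region, 'eu-central-1'}
  # REGIONAL avoids the implicit edge-optimized CloudFront distribution
  endpointType: REGIONAL

functions:
  region:
    handler: handler.region
    events:
      - http:
          path: region
          method: get
  health:
    handler: health.health
    events:
      - http:
          path: health
          method: get

resources:
  # Route53 latency routing resources live in a separate file
  - ${file(./cloudformation/routing.yml)}
```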

There are a couple of things worth noting:

  • endpointType: REGIONAL means that your API Gateway will not be edge-optimized. An edge-optimized API Gateway creates an implicit CloudFront distribution under the hood that you can’t modify, so it only gives you the ability to use CloudFront edge locations (which might be all you need in some cases); if you need more CloudFront features, it’s better to create a distribution yourself that uses your regional API Gateway (more here).
  • we created two function definitions — /health and /region — that are going to be available under one API Gateway
  • resources for Route53 Latency Routing are under ./cloudformation/routing.yml to keep serverless.yml clean

Now we have the Lambda functions and the whole infrastructure; the only thing missing is the Route53 records, so let’s create the routing.yml file in the cloudformation directory.

First, to run it you are going to need four environment variables:

  • ACCOUNT_ID — the ID of your AWS account (in plain CloudFormation you can use the AWS::AccountId pseudo parameter, but here we pass it in as an environment variable)
  • ROUTE53_NAME — the domain your service is going to be available under, for example sls-latency.example.com (without http/https)
  • ROUTE53_HOSTED_ZONE_ID — the ID of the Route53 hosted zone that matches your domain
  • DOMAIN_CERT — the ID of the ACM certificate that matches your domain
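
A sketch of cloudformation/routing.yml could look like this — the logical IDs follow the walkthrough below, the dev stage name and health-check target are assumptions, and the ${env:...} variables are resolved by the Serverless Framework at deploy time:

```yaml
Resources:
  CustomDomain:
    Type: AWS::ApiGateway::DomainName
    Properties:
      DomainName: ${env:ROUTE53_NAME}
      RegionalCertificateArn: arn:aws:acm:${self:provider.region}:${env:ACCOUNT_ID}:certificate/${env:DOMAIN_CERT}
      EndpointConfiguration:
        Types:
          - REGIONAL

  # Serverless generates this deployment resource itself; it is redefined here
  # only to add DependsOn. The logical ID must match the one Serverless generates.
  ApiGatewayDeployment:
    Type: AWS::ApiGateway::Deployment
    DependsOn: CustomDomain
    Properties:
      RestApiId:
        Ref: ApiGatewayRestApi
      StageName: dev  # assumed stage name

  BasePathMappingV1:
    Type: AWS::ApiGateway::BasePathMapping
    DependsOn: ApiGatewayDeployment
    Properties:
      BasePath: v1
      DomainName:
        Ref: CustomDomain
      RestApiId:
        Ref: ApiGatewayRestApi
      Stage: dev

  Route53HealthCheck:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTPS
        # probe the regional execute-api endpoint of this deployment
        FullyQualifiedDomainName:
          Fn::Join:
            - ''
            - - Ref: ApiGatewayRestApi
              - .execute-api.${self:provider.region}.amazonaws.com
        ResourcePath: /dev/health
        RequestInterval: 30
        FailureThreshold: 2

  Route53Record:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: ${env:ROUTE53_HOSTED_ZONE_ID}
      Name: ${env:ROUTE53_NAME}
      Type: CNAME
      TTL: 60
      Region: ${self:provider.region}
      SetIdentifier: ${self:provider.region}
      HealthCheckId:
        Ref: Route53HealthCheck
      ResourceRecords:
        - Fn::GetAtt: [CustomDomain, RegionalDomainName]
```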

This is the place where it gets interesting, so let’s go through it resource by resource:

  • CustomDomain: to attach a Route53 record to your API Gateway, you first have to create a custom domain name, which gives you a target for a CNAME record — this also applies if you want to attach a domain that isn’t hosted in your AWS account. You are basically letting API Gateway know that this service is going to be available under a different domain, for example sls-latency.example.com. RegionalCertificateArn has to match this domain, so it needs to be issued for *.example.com, for example. It also needs to match the region — if you create your service in three different regions, you have to import the certificate into each of them.
  • ApiGatewayDeployment: this resource is automatically created by Serverless, but in order to add your API to Route53 we need to override its default behaviour: your Lambda functions need to be ready to serve traffic and your custom domain has to exist before the API Gateway deployment can be considered done. That’s why the DependsOn clause is added — without it, the first execution of your CloudFormation stack would fail, because it would try to attach your API Gateway to Route53 before it’s ready.
  • BasePathMappingV1: this resource is not a must-have, but you probably want your API to be versioned. With this mapping we don’t need to update the API Gateway in any way, and the API will be available under https://sls-latency.example.com/v1/.
  • Route53HealthCheck: with latency routing you have to provide an endpoint that is used both as a health check and to measure the latency of your endpoints. Out of the box, AWS hits this endpoint every 30 seconds from a predefined list of regions (which you can override), checks the latency, and verifies that the API is up and running — that’s why you should have deep health checks. After two failed requests, so after one minute, your traffic is rerouted to the nearest healthy region with the next-lowest latency.
  • Route53Record: this is where it all comes together. We add the actual record to the Route53 hosted zone with the name of your custom domain. TTL is set to 60s so that every 60 seconds your client re-resolves the record and picks up a region change in case the one currently in use is down.

Your Route53 record name does not have to be unique — in latency and failover routing, only the name + region pair has to be unique.

Let’s try it

Ultimately, this should be done by your CI tool, but for now let’s create a simple deploy script:
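
A sketch of such a script, deploying the same stack to both regions (the SLS_CMD override is an addition to make the script easy to dry-run; it is not from the original repository):

```shell
#!/usr/bin/env bash
set -euo pipefail

# SLS_CMD can be overridden, e.g. SLS_CMD="echo serverless" for a dry run.
SLS_CMD="${SLS_CMD:-npx serverless}"

deploy() {
  # Deploy the stack to every region passed as an argument, so that each
  # region gets its own latency record in Route53.
  for region in "$@"; do
    echo "Deploying to ${region}"
    ${SLS_CMD} deploy --region "${region}"
  done
}

# Usage: ./deploy.sh eu-central-1 us-east-1
if [ "$#" -gt 0 ]; then
  deploy "$@"
fi
```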

After the build passes, you should see something like this in your Route53 Hosted Zone:

And your health checks should work as well:

Then, going to “Health checkers”, you will see that AWS probes from many different regions to determine the best latency for all locations:

Health checkers in all major regions

The health check works as expected, so let’s now check the /region endpoint. After accessing https://sls-latency.{your domain}/v1/region you should see that your nearest region responded — for me it’s:

Default routing

As expected, since I’m based in Europe, AWS responded with eu-central-1 region.

Let’s now try to break it: change your /health endpoint in code or in the AWS console to return status: 500 and see what happens:

Routing changed to us-east-1

After up to 2 minutes (1 minute for the health checks to pick up the issue, and 1 minute for the Route53 TTL to expire) you will see that your other region has taken over the traffic.

Also, your health checks are going to start showing this region as unhealthy.

That’s all folks!

This is my take on creating latency routing with Serverless. If you have any questions or suggestions for improvement, or if this helped you, let me know with applause or in the comments section!
