For a successful business, it is important to know who your customers are. Knowing your customers helps you understand your business better, build better versions of your ideas, and make effective decisions for the long run. Everyone knows that, right?
The same holds for software infrastructure. Knowing your customers greatly improves the performance of your application and makes it more resilient; after all, no one likes to wait for a response :)
Okay, getting to the point: if your services are used by a wide audience across the globe, but your servers are located in a single region, then users from other parts of the world will experience high latency, resulting in a bad user experience. Several case studies prove why faster response times are very important for business,
- Why Performance Matters
- Mobile Site Speed Performance,
- Google, Amazon, and many other big software companies confirm that an increase of even a few milliseconds (ms) in latency can significantly degrade user engagement.
Let's look at an example. GCP ping is a simple website that makes a ping request to servers located in various regions and shows the results ordered by latency. It looks like this,
Since I currently live in the Netherlands, I get the lowest latency when I connect to servers in the europe-west4 region; on the other hand,
Iowa is ~5x slower, Hong Kong ~15x slower, and Mumbai ~20x slower.
That is a huge difference, which is why many applications use CDN services to serve static content to users from the nearest region. But more dynamic calls, such as APIs, lose these benefits. Going multi-regional also makes your application more resilient, since losing a region won't stop your service.
Another reason to go multi-region can be a business requirement, since some businesses need to store data in specific regions.
Multi-regional is great for business, so what's stopping us? Going multi-regional is not an easy task and requires a lot of engineering effort. In this article, we will look at the challenges you face when using Google Cloud Platform as your cloud provider and want to go multi-regional.
The first thing we need to do is make our services available on servers in various regions. Let's see what options we have,
AppEngine is really great; I have used it for several years. It's simple and fast, and you don't have to worry about scaling. In fact, it was one of the earliest forms of serverless before it even became a buzzword.
But the problem with AppEngine is that it's bound to a single region, and the region is locked at the project level. So if you need multiple regions, you need multiple projects, which is bad since a project has many other dependent services like Datastore, Memcache, and Storage, and sharing those across projects is not feasible. In short, we can't achieve multi-regional with AppEngine.
Compute Engine is Infrastructure as a Service and fully supports multi-region setups, since the Google Load Balancer can route traffic to multiple Compute Engine servers (one created in each region). But this also means you have to manage the machines yourself, which sounds old-fashioned and is no longer a good approach.
GKE clusters are still regional, and it requires a lot of effort to make them multi-regional since support for this is still evolving; there are articles describing a temporary solution using kubemci, such as Setting up a multi-cluster Ingress. Also, GKE is overkill for many simple use cases.
So options for making application multi-regional are limited.
Google Cloud Run (Serverless)
Cloud Run is a managed compute platform that automatically scales your stateless containers. Cloud Run is serverless: it abstracts away all infrastructure management, so you can focus on what matters most — building great applications.
Cloud Run is simple, and it's container-based, which makes it more cloud-agnostic. If you haven't checked it out before, I highly recommend taking a look.
Let's look at a simple approach to going multi-regional using Cloud Run. One of the coolest features of Cloud Run is that, within a single project, you can create multiple services, each in a different region.
What this means is that you can deploy the same application Docker image into multiple regions within the same project.
Let's walk through an example; you can check out the complete code in the GitHub repo. Here I have a simple Node.js application,
It has two endpoints, one for a message and one for a ping. And here is the Dockerfile for building the image,
And I made a simple Makefile to build the Docker image, upload it to Google Container Registry, and then deploy the image to multiple Cloud Run services (each in a different region),
The deploy tasks (such as deploy-asia) call the gcloud beta run deploy command with different regions to deploy the services, and as a result your app now has servers in multiple regions.
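In spirit, the Makefile does something like the following sketch (the image name, service names, and region list are my assumptions):

```shell
# Build the image once and push it to Google Container Registry.
IMAGE=gcr.io/my-project/hello-app
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Deploy the same image as a separate Cloud Run service per region.
for REGION in europe-west1 us-central1 asia-east1; do
  gcloud beta run deploy "hello-$REGION" \
    --image "$IMAGE" \
    --region "$REGION" \
    --platform managed \
    --allow-unauthenticated
done
```

Each service gets its own URL, which is what the routing strategies later in the article work with.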
We are not done yet; we have only completed the first part, deploying the services in several regions. Let's move on.
Routing Traffic to the nearest region
This is the actual problem: we need to route each user's traffic to the regional server closest to them, e.g. a user from Europe should be routed, in our case, to europe-west1. Let's see our options for this,
The Global Load Balancer offers a single global IP address and can route traffic to the nearest region using routing rules, but unfortunately, as of this writing, it only supports Compute Engine VMs as backends. It would be a game-changer if it supported other backends like Cloud Run, but that is something for the future; let's move forward with the present.
Third-Party Proxy Service:
So there is no built-in solution from Google Cloud Platform for this. We need a proxy server that takes all requests and routes them to specific regions, e.g. HAProxy, Nginx, or Apigee.
Apigee is an API gateway that acts as a proxy and can do several things like authentication, routing, mapping, and monitoring. So we can set up a routing rule that routes traffic from a specific region to the corresponding Cloud Run service URL.
So, in general, we need a new URL (which points to the proxy server), which routes traffic to individual service URLs.
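To make the server-side routing idea concrete, here is a minimal sketch of the region-selection logic such a proxy would run. The URLs, the country-to-region table, and the assumption that a geo-IP lookup gives us a country code are all hypothetical, not part of the original setup:

```javascript
// Hypothetical Cloud Run service URLs, one per region.
const SERVICE_URLS = {
  europe: 'https://hello-europe-west1-xyz.a.run.app',
  us: 'https://hello-us-central1-xyz.a.run.app',
  asia: 'https://hello-asia-east1-xyz.a.run.app',
};

// Very rough country-to-region mapping, just for the demo.
const COUNTRY_TO_REGION = { NL: 'europe', DE: 'europe', US: 'us', IN: 'asia', HK: 'asia' };

// Given a country code (e.g. from a geo-IP lookup at the proxy),
// return the backend URL the request should be forwarded to.
function pickBackend(countryCode) {
  const region = COUNTRY_TO_REGION[countryCode] || 'us';
  return SERVICE_URLS[region];
}
```

A proxy like Apigee or Nginx expresses this same mapping as routing rules instead of code, but the decision it makes per request is exactly this one.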
These solutions are all about server-side routing, offering a single Endpoint for clients to consume.
As the name implies, this strategy involves the client application making one initial request to the service registry server, which determines the user's region using the IP address or locale and returns the appropriate service URL. From then on, the client can make requests directly to the service without routing through any proxy server.
In our example, we can set up an intermediate server that takes the user's IP address (or region information sent by the client) and returns the Cloud Run service URL for the matching region.
Client-side Service Discovery (using ping):
A slightly different & simpler version of the above solution is,
- The client makes a request to Service Registry
- Service Registry returns a set of Service URLs (eg: europe.server.com, us.server.com)
- The client makes a ping request to each individual server to find the one with the lowest latency
- From then on, the client uses the lowest-latency server as the default for all communication.
This is also a good idea, since the server doesn't have to do any complex work to determine the user's region; several gaming applications use this technique. Here is a sample of JS code to find the lowest-latency server,
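A minimal version of that sampling logic might look like the following (the ping function is injectable here purely so the selection logic can be exercised without real servers; it assumes each service exposes a /ping endpoint and a fetch-capable runtime such as a browser or Node 18+):

```javascript
// Measure the round-trip time (ms) to one server's /ping endpoint.
async function pingServer(url) {
  const start = Date.now();
  await fetch(`${url}/ping`);
  return Date.now() - start;
}

// Ping all candidate servers in parallel and return the URL of the
// one that answered fastest.
async function findFastest(urls, ping = pingServer) {
  const results = await Promise.all(
    urls.map(async (url) => ({ url, ms: await ping(url) }))
  );
  results.sort((a, b) => a.ms - b.ms);
  return results[0].url;
}
```

In practice you might ping each server a few times and average, since a single sample can be skewed by connection setup.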
And for our demo Cloud Run services, when tried from a location in the Netherlands, this is how the result looks, similar to gcping.com,
But when the same code is tried from another geographical location, like India, it gives a different result,
NOTE: the same approach can be used with GKE clusters created in multiple regions as well.
For software applications with a consumer base across multiple geographical locations, it is worth going multi-regional. In this article we have covered some approaches that should help you understand and decide which strategy fits your needs. Besides providing fast response times to users, the multi-region approach also offers,
- A Resilient application that continues to work even when a region goes down.
- The ability to perform A/B testing by deploying only to certain regions
- Supports business requirements