Deploying an app to a decentralized service mesh with Anthos and Istio

Automating cross-network service-to-service communication

Francisco Gonzalez
15 min read · May 20, 2020

There’s a lot going on in today’s infrastructure ecosystem. DevOps and the cloud are radically changing the way applications are deployed, and multi-cloud is one of the latest buzzwords everyone wants to jump in on, with several technologies promising to help you get there. In this article, we’re going to take a look at Anthos, Google’s entry into the world of hybrid and multi-cloud enablers.

Google calls Anthos a “modernization framework”, and even though that label may sound a little vague at first, it’s an accurate one. The framework part may be the most interesting piece: Anthos is not a single binary or a single tool, but rather a set of tools and practices that enables and facilitates the creation and management of hybrid and multi-cloud application architectures. Two of those tools are Istio and Anthos Config Management, and we’re going to take a look at both in this post.

In further posts we’re going to take a look at more of what that framework consists of, including GKE On-Prem, which allows you to run Google-certified and managed versions of Kubernetes in your own data centers; observability and management features like Anthos Service Mesh, which lets you monitor the topology and health of your mesh; and Connect, which enables you to manage all of your Kubernetes clusters from a single pane of glass using the Google Cloud Console’s Kubernetes Engine dashboard.

Note that Anthos is not open source, and an active entitlement from Google is required to use it.

We’ll go through the steps needed to deploy an application across an Istio distributed service mesh with replicated control planes using Anthos Config Management.

Istio provides two ways of creating distributed meshes: a single control (management) plane shared across clusters, or separate but replicated control planes per cluster. Both architectures have their pros and cons, but for the purpose of this post we’re going to use replicated control planes, since that avoids making Istio’s control plane a single point of failure in our design.

We’ll do this by connecting services running in two Kubernetes clusters on isolated networks, using Istio gateways to secure communication between them with mTLS. This will enable the services running in each network to establish a secure link without having to configure VPNs or any other means of interconnection.

Why would this be a good idea? Imagine you have certain workloads running in your on-prem datacenter that can’t be moved to the cloud for any given reason, like, say, a government restriction on where PII or customer data can be stored: creating a distributed mesh that spans both your cloud provider and your on-prem datacenter lets you leverage the best of both worlds. Modernization is another reason to do this: with a mesh spanning your on-site and cloud infrastructure, workloads can be modernized and moved to the cloud as they become ready, little by little. If you can come up with any scenario where a hybrid or multi-cloud approach could be beneficial, using Anthos and Istio to handle communications and routing will help you achieve your goals.

For this example, I’m going to be using clusters running in Google Kubernetes Engine on two separate VPCs, but since Anthos Config Management works on any Kubernetes cluster, it’s easy to see how this would enable centralized management of your clusters that run across different cloud providers and on-premises.

Anthos Config Management provides a single management interface and deployment pipeline for every cluster you enroll, by using a Git repository to hold your Kubernetes infrastructure’s code and distributing it according to policies defined for each cluster. You can apply the same set of configurations across all your clusters or use special selectors to create asymmetric deployments.

Think of it as a kind of Puppet that enforces a desired configuration state on the resources it manages, those resources being your Kubernetes clusters. Every object managed by ACM is kept in sync with the desired state held in the repository. You can use this to roll out security policies that need to be deployed on every cluster, or to make sure that some group in your organization (like your SREs) has the roles they need across your environment, but you can also use it to enforce that certain Deployments, StatefulSets, or any other Kubernetes objects are present with a specific configuration. ACM adds another layer of abstraction on top of Kubernetes’ configuration management by letting you declaratively state the configuration of any object and monitoring each of them to ensure they stay compliant.

Here’s a high level overview of Anthos Config Management’s architecture:

Anthos Config Management overview

If you want to dig deeper into Anthos Config Management’s architecture and inner workings, here’s a great post from Rafael Alvarez that covers that ground in more detail.

As you can see in the picture above, Anthos Config Management basically consists of a Git repository where your infrastructure code is kept, and an agent component that gets deployed to your Kubernetes clusters. These agent services are responsible for keeping your cluster state in sync with the definitions held in the repo.

The sync services authenticate against the repository and pull YAML (or JSON) files containing definitions for your Kubernetes resources from a previously configured branch and directory. There’s much more to Anthos than what has been stated here, but this quick introduction will be enough for the use case we’re going to explore.
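
For context, a cluster’s enrollment is driven by a config-management.yaml applied to that cluster. Here’s a minimal sketch; the repository URL and cluster name below are placeholders, not the values used later in this post:

#config-management.yaml (sketch; repo URL and cluster name are placeholders)
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  # This clusterName is what the Cluster entries in clusterregistry/ must match
  clusterName: my-cluster
  git:
    syncRepo: git@github.com:example-org/acm-repo.git
    syncBranch: master
    secretType: ssh
    policyDir: "."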

This article assumes you have some level of working knowledge of Kubernetes and Istio (though you’ll be able to follow along if not), an active Anthos entitlement from Google, and Anthos Config Management and Istio already deployed on your Kubernetes clusters. Istio needs to be installed in multicluster mode with replicated control planes.

If you need some help with the prerequisites stated above, the following links may be of help:

Introducing the use case

The use case we’re going to be exploring is the following:

Pinit is a simple web application that can be used to create a to-do list. The application is in its early stages and currently consists of only a single service. Because it’s so young, Pinit’s development cycles are pretty fast; new features are rolled out on a weekly basis, and these features need to be tested quickly and safely with live production traffic after appropriate QA testing.

To achieve this, Pinit Co. has separate clusters for each of its environments: dev, staging and prod. Each cluster is isolated from the rest in terms of network access. In order to let the Pinit team test new features with live traffic in a controlled manner, we’re going to create a distributed service mesh across the staging and prod clusters using Istio, and we’re going to enable canarying between prod and staging, sending 10% of live traffic from end users to the latest version of the app running in the staging cluster to gather performance metrics.

Istio is going to take care of appropriately routing requests and securing communications by enforcing mTLS on calls between clusters, ensuring all traffic is encrypted. Requests to the staging service are going to be served via an Istio ingress gateway located at the edge of the staging cluster’s network, so no VPN or any other means of interconnection will be needed. Pretty awesome, right?

With a little imagination, you can start to see how this pattern would apply for clusters across multiple cloud providers, on-prem facilities, isolated projects, and whatever alternative you can come up with, so that you can build geographic and logically distributed meshes with secure communications for your service calls.

This will be our target architecture:

The dotted line represents the logical connection between meshes

Getting to work

For the sake of demonstrating the usage of Anthos CM’s Cluster Selectors, we’re going to make both of our clusters pull from the same directory in our Git repo. In a real production scenario, it would actually be better to create a different directory within the repository for each environment and have all the clusters belonging to each environment pull from those directories.

Here’s the Git repo for this post:

And the Docker Hub repo for the PinIt image:

Initializing the Anthos repository

The first thing we’re going to do is initialize our Anthos Config Management repository. In order to do that, we need to navigate to the target directory of our repo and run the following command:

nomos init

Nomos is Anthos Config Management’s CLI, and it will help us to take a look at the status of our enrolled clusters, check our repo for errors, and more. Access to the Nomos binary file is provided when you get an Anthos entitlement from Google. nomos init will create the basic structure for our config management repository, including the needed system/, cluster/, and namespaces/ directories.
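
Before pushing changes to the repo, it’s also worth running nomos vet from the repository root; it validates the repo’s structure and flags malformed configs before they ever reach your clusters:

nomos vet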

For more information on the Nomos CLI, you can follow this link:

After running it, this is how our base directory structure will look:

base_dir/
├── cluster/
├── clusterregistry/
├── namespaces/
├── system/
│   ├── README.md
│   └── repo.yaml
└── README.md

Populating the clusterregistry folder

With our repository base structure in place, we’re going to start by creating entries for our Kubernetes clusters in our clusterregistry/ folder.

...
clusterregistry/
├── pinit-staging.yaml
└── pinit-prod.yaml
...

These files are going to label our clusters so that Anthos CM is able to identify them using Cluster Selectors (more on those in a bit). We’re going to populate the new files with the following content:

#pinit-staging.yaml
kind: Cluster
apiVersion: clusterregistry.k8s.io/v1alpha1
metadata:
  name: pinit-staging
  labels:
    env: staging

We’re just creating a Cluster resource and passing it some metadata:

  • The name of the cluster as it was declared in Anthos’ config-management.yaml file during the cluster’s enrollment. This may not be the same as the actual cluster name; it is how Anthos CM will identify the cluster.
  • Some labels we’re going to use to match the cluster against a selector condition. In this case, we’ll just identify it by environment.

We need to create the same Cluster resource for pinit-prod.
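
Its content mirrors the staging entry; assuming we label the prod cluster with env: prod (which is what our prod selector will match on), it looks like this:

#pinit-prod.yaml (assumes the prod cluster is labeled env: prod)
kind: Cluster
apiVersion: clusterregistry.k8s.io/v1alpha1
metadata:
  name: pinit-prod
  labels:
    env: prod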

Next, we’re going to create the Cluster Selectors we just mentioned by creating this set of files:

...
clusterregistry/
├── pinit-prod.yaml
├── pinit-staging.yaml
├── prod-selector.yaml
└── staging-selector.yaml
...

A Cluster Selector enables us to tell Anthos that some resource needs to go only to a subset of clusters. This is how we’re going to deploy separate configurations to our different environments. Let’s add this content to our new files:

#staging-selector.yaml
kind: ClusterSelector
apiVersion: configmanagement.gke.io/v1
metadata:
  name: staging-selector
spec:
  selector:
    matchLabels:
      env: staging

Here we’re creating the ClusterSelector resource and we’re passing a name in our metadata block, and a selector in our spec block.

The selector is going to match all Cluster resources that have the labels defined within the matchLabels block. In this case, it’s going to match all the clusters that have the env label with staging as its value, so, it’s going to match our pinit-staging cluster.

We need to create the same ClusterSelector resource for prod-selector matching the labels of our prod cluster.
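
Again mirroring the staging version, and assuming the env: prod label from the registry entry above, prod-selector.yaml would look like this:

#prod-selector.yaml (assumes the env: prod label on the prod cluster)
kind: ClusterSelector
apiVersion: configmanagement.gke.io/v1
metadata:
  name: prod-selector
spec:
  selector:
    matchLabels:
      env: prod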

If at this stage we push the changes to our repo we should see our clusters synced to our latest commit. First we push:

git push origin master
...
b6aa001..2fdf48f master -> master

And now we can run the nomos status command to check the status of our enrolled clusters:

nomos status
Connecting to clusters...
  Current Context   Status   Last Synced Token   Sync Branch
  ---------------   ------   -----------------   -----------
* pinit-prod        SYNCED   2fdf48f             master
  pinit-staging     SYNCED   2fdf48f             master

As we can see, our two clusters are synced to our last commit in the master branch. This means our Cluster and ClusterSelector entries are already deployed.

Deploying the application

Let’s go ahead and deploy the application. We’re going to deploy it to a new namespace called pinit-app in each cluster. To do this, we’re going to create the pinit-app folder under the namespaces/ directory, and inside that, we’re going to put a namespace.yaml file, like this:

...
namespaces/
└── pinit-app/
    └── namespace.yaml
...

And we’re going to put this content in that file:

#namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: pinit-app
  labels:
    istio-injection: enabled

We’re adding the istio-injection: enabled label so that Istio automatically injects Envoy sidecar proxies into every pod we deploy.
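
Once ACM has synced this config, you can confirm the label is in place with a quick check like the following:

kubectl get namespace pinit-app --show-labels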

Next, we’ll create our service account and service resource for our Pinit application. We'll place the necessary files in the same pinit-app/ folder:

...
namespaces/
└── pinit-app/
    ├── namespace.yaml
    ├── service-account.yaml
    └── service.yaml
...

Nothing special here, they’re just ordinary Kubernetes resources.
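
In case you’re not following along with the repo, here’s a rough sketch of what they might contain. The ServiceAccount name and most Service details below are assumptions, except that the Service must be called pinit-front and expose port 80, since that’s what the Istio resources later in this post reference:

#service-account.yaml (sketch; the account name is an assumption)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pinit-front
  namespace: pinit-app

#service.yaml (sketch; name and port 80 match what the Istio resources expect)
apiVersion: v1
kind: Service
metadata:
  name: pinit-front
  namespace: pinit-app
  labels:
    app: pinit
spec:
  selector:
    app: pinit
  ports:
  - name: http
    port: 80
    targetPort: 80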

Now let’s take a look at our deployments. We’re going to create two deployment files, one for each of our prod and staging clusters:

...
namespaces/
└── pinit-app/
    ├── namespace.yaml
    ├── prod-deployment.yaml
    ├── staging-deployment.yaml
    ├── service-account.yaml
    └── service.yaml
...

Now let’s dive into our prod-deployment.yaml file’s contents. It’s a regular Kubernetes Deployment declaration, but note the following section:

#prod-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pinit-front
  namespace: pinit-app
  labels:
    app: pinit
    version: v1
  annotations:
    configmanagement.gke.io/cluster-selector: prod-selector
...

We’re passing a configmanagement.gke.io/cluster-selector annotation with prod-selector as its value within the annotations block. This uses the Cluster Selectors we created earlier: Anthos CM reads this annotation to know that the resource only needs to be deployed to the clusters that match our prod-selector's definition.

We need to set the configmanagement.gke.io/cluster-selector to staging-selector for our staging deployment.
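
For completeness, here’s a sketch of the staging deployment. Most of the pod template below is an assumption, but the container name and image match what we’ll see running in the staging cluster further down, and the version: v1 label keeps the pods in the v1 subset our Istio resources will reference:

#staging-deployment.yaml (sketch; pod template details are assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pinit-front
  namespace: pinit-app
  labels:
    app: pinit
    version: v1
  annotations:
    configmanagement.gke.io/cluster-selector: staging-selector
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pinit
  template:
    metadata:
      labels:
        app: pinit
        version: v1
    spec:
      containers:
      - name: pinit-nginx
        image: melkyah/pinit-sample:v2   # staging runs the newer version of the app
        ports:
        - containerPort: 80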

If we push our changes to the repo now, we would see that our application gets deployed to our clusters:

#pinit-staging
kubectl get pods -n pinit-app

NAMESPACE   NAME                           READY   STATUS    RESTARTS   AGE
pinit-app   pinit-front-5877696fb4-9fs7s   2/2     Running   0          12m
pinit-app   pinit-front-5877696fb4-gxj4m   2/2     Running   0          12m

And if we take a look at our pods we’re going to verify that each cluster got the right version:

#pinit-staging
kubectl describe pod pinit-front-5877696fb4-9fs7s -n pinit-app
...
Containers:
  pinit-nginx:
    Container ID: ...
    Image:        melkyah/pinit-sample:v2
...

You can do the same to check on the prod cluster image version.

Our Cluster Selectors are working! Using them, we’re able to deploy asymmetric configurations to our Kubernetes clusters enrolled in Anthos CM.

Now that our application is deployed, time to create our Istio resources to build our inter-mesh communication.

Communicating between meshes

Let’s start by creating a Service Entry in our prod cluster so that it knows where to find our staging service:

...
namespaces/
└── pinit-app/
    ├── namespace.yaml
    ├── prod-deployment.yaml
    ├── prod-serviceentry.yaml
    ├── staging-deployment.yaml
    ├── service-account.yaml
    └── service.yaml
...

Let’s take a look at some parts of its content:

#prod-serviceentry.yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: pinit-staging
  annotations:
    configmanagement.gke.io/cluster-selector: prod-selector
  namespace: pinit-app
spec:
  hosts:
  - pinit-front.pinit-app.global
  location: MESH_INTERNAL
  ports:
  - name: http
    number: 80
    protocol: HTTP
  resolution: DNS
  addresses:
  - 240.0.0.2
  endpoints:
  - address: 35.197.77.253
    labels:
      version: v1
    ports:
      http: 15443 # Do not change this port value
  • We add a cluster-selector annotation to send this resource only to our prod cluster.
  • In the hosts field, we’re going to pass the name of the remote Kubernetes service in the following format: <service-name>.<namespace>.global. The .global piece is the way Istio knows that this service resides in a remote mesh.
  • In endpoints.address we pass the IP address of the remote Istio ingress gateway. In this case, of the staging cluster.
  • endpoints.ports.http: 15443 is set so that the request is made to port 15443 of the remote gateway; this is a special port on the remote ingress gateway that forwards our calls to the appropriate pods running the remote service using mTLS.
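
If you need to look up that gateway address, it’s just the external IP of the staging cluster’s Istio ingress gateway service; something along these lines (run against the staging cluster’s context) will fetch it:

kubectl get svc istio-ingressgateway -n istio-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'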

With this in place, let’s create our Istio VirtualServices:

...
namespaces/
└── pinit-app/
    ├── namespace.yaml
    ├── pinit-gateway.yaml
    ├── prod-deployment.yaml
    ├── prod-serviceentry.yaml
    ├── prod-virtualservice.yaml
    ├── staging-deployment.yaml
    ├── staging-virtualservice.yaml
    ├── service-account.yaml
    └── service.yaml
...
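
Notice the pinit-gateway.yaml file in the tree above: it holds the Istio Gateway both VirtualServices bind to. Its exact contents live in the repo, but a minimal sketch, assuming a plain HTTP listener on port 80 bound to the default ingress gateway, would look something like this:

#pinit-gateway.yaml (sketch; assumes plain HTTP on port 80)
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: pinit-gateway
  namespace: pinit-app
spec:
  selector:
    istio: ingressgateway   # bind to the default Istio ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "pinit.com"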

First, our staging service. This one is going to direct all incoming traffic to the local version of pinit:

#staging-virtualservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: pinit-front
  annotations:
    configmanagement.gke.io/cluster-selector: staging-selector
spec:
  hosts:
  - pinit.com
  gateways:
  - pinit-gateway
  http:
  - route:
    - destination:
        host: pinit-front.pinit-app.svc.cluster.local
        subset: v1

We’re targeting our staging cluster by using a cluster-selector.

And now, its prod counterpart:

#prod-virtualservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: pinit-front
  annotations:
    configmanagement.gke.io/cluster-selector: prod-selector
spec:
  hosts:
  - pinit.com
  gateways:
  - pinit-gateway
  http:
  - route:
    - destination:
        host: pinit-front.pinit-app.global
        subset: v1
      weight: 10
    - destination:
        host: pinit-front.pinit-app.svc.cluster.local
        subset: v1
      weight: 90

This VirtualService will target our prod cluster. But notice there are two destination entries under spec.http.route: This will instruct the prod mesh to redirect 10% of incoming traffic targeting pinit.com to pinit-front.pinit-app.global. Since we created the service entry for that host before, that will resolve to the remote mesh’s gateway address and be routed accordingly. We have started to conjure some magic now! The remaining 90% of requests will be served as usual from our local prod service.

The last piece of this puzzle is our destination rules. Just to show a different approach, we’re going to put the rules targeted at staging and prod in the same file, using selectors. Here’s how our namespaces directory will look:

...
namespaces/
└── pinit-app/
    ├── namespace.yaml
    ├── pinit-destination-rules.yaml
    ├── pinit-gateway.yaml
    ├── prod-deployment.yaml
    ├── prod-serviceentry.yaml
    ├── prod-virtualservice.yaml
    ├── staging-deployment.yaml
    ├── staging-virtualservice.yaml
    ├── service-account.yaml
    └── service.yaml
...

And here’s what that file looks like:

#pinit-destination-rules.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: pinit-front
spec:
  host: pinit-front.pinit-app.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: pinit-front-remote
  annotations:
    configmanagement.gke.io/cluster-selector: prod-selector
spec:
  host: pinit-front.pinit-app.global
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1

We’re creating a DestinationRule pointing to pinit-front.pinit-app.svc.cluster.local in both clusters, since both need to be able to route requests to their respective local service. Below that, we’re creating a second DestinationRule, but using a selector so that it’s only created in our prod cluster. This one sets the client-side policies that tell our prod service how to connect to staging.

The most relevant piece here is its spec.trafficPolicy.tls.mode. Setting this to ISTIO_MUTUAL instructs every service call from prod to pinit-front.pinit-app.global to require mTLS and use Istio’s certificates. This is required in order to be able to communicate with port 15443 in an Istio ingress gateway.

Since every mesh in an Istio multicluster installation with replicated control planes shares a common root CA, both clusters will present certificates signed by the same authority to each other, and the communication will be allowed.
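
In practice, that usually means installing Istio in each cluster with the same plugged-in CA certificates, for example by creating the cacerts secret from a shared certificate bundle before installing Istio in each cluster. A sketch of that step (file names assume you’re using Istio’s sample certs or your own equivalents):

kubectl create namespace istio-system
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem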

With this in place we are ready to deploy it and test our services!

Let’s have one last look at the state of our Anthos CM code repository’s structure:

pinit/
├── cluster/
├── clusterregistry/
│   ├── pinit-prod.yaml
│   ├── pinit-staging.yaml
│   ├── prod-selector.yaml
│   └── staging-selector.yaml
├── namespaces/
│   └── pinit-app/
│       ├── namespace.yaml
│       ├── pinit-destination-rules.yaml
│       ├── pinit-gateway.yaml
│       ├── prod-deployment.yaml
│       ├── prod-serviceentry.yaml
│       ├── prod-virtualservice.yaml
│       ├── service-account.yaml
│       ├── service.yaml
│       ├── staging-deployment.yaml
│       └── staging-virtualservice.yaml
├── system/
│   ├── README.md
│   └── repo.yaml
└── README.md

Testing our deployment

Now, if we go to a browser and navigate to our prod service, we should see that 90% of the requests are served by our prod service, and 10% are seamlessly redirected to our staging service living in a completely different network, without our users noticing the difference. I’m going to hit the service by IP and inject the hostname via a header so that it’s clearer we’re hitting the same endpoint:

90% of requests get to our prod cluster

The request above hit our prod cluster.

10% will be redirected to our staging cluster

But this last request got redirected to our staging service! Not only that, the inter-cluster service-to-service communications are encrypted using mTLS to keep them secure.
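
If you’d rather test from the command line, a quick loop like this sketch makes the 90/10 split easy to observe; PROD_GATEWAY_IP is a placeholder for your prod ingress gateway’s address, and the grep marker depends on how your two versions identify themselves:

# PROD_GATEWAY_IP is a placeholder; the grep pattern is whatever distinguishes v1 from v2 in your build
for i in $(seq 1 20); do
  curl -s -H "Host: pinit.com" "http://${PROD_GATEWAY_IP}/" | grep -o "v[12]" | head -n 1
done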

We can check the status of authentication policies and destination rules from one of our prod pods to the staging service by running the following command:

istioctl authn tls-check pinit-front-c998bcf74-bz8dd pinit-front.pinit-app.global -n pinit-app

HOST:PORT                        STATUS  SERVER      CLIENT        AUTHN POLICY  DESTINATION RULE
pinit-front.pinit-app.global:80  OK      PERMISSIVE  ISTIO_MUTUAL  /default      pinit-app/pinit-front-remote

Here we can see that from this pod to the remote Pinit service, we’re enforcing ISTIO_MUTUAL authentication on the client side. The STATUS column shows us that there are no conflicts between the policies of our different resources.

So, to recap what we looked at today: we took a small application and deployed different versions of it across network-isolated service meshes using Anthos. Then we configured the mesh to canary a small amount of production traffic to the latest version running in staging, effectively letting us test new service versions in a controlled manner. We could go even further and redirect users to different versions using more sophisticated rules, like the presence of certain cookies or headers, enabling pretty interesting and useful flows, for example redirecting only users signed up for a beta testing program to the staging version.

I hope you found this quick run interesting! As always, I’m open to comments and suggestions. See you in the next one!
