Managing Microservices with Kubernetes and Istio
(Original author of the story: Peer Müller)
In a distributed microservice architecture, the network of services becomes harder to understand and manage as it grows in size and complexity. Monitoring, A/B testing, canary releases, access control and end-to-end authentication are common operational requirements. The term service mesh describes a network of microservices and the interactions between them.
This article gives a general overview of what a service mesh is and how it can be implemented. It then shows how to manage traffic, inject faults and monitor services with Istio on Kubernetes, using a simple example application.
Service Mesh Overview
A service mesh is a communication layer that rides on top of request/response, unlocking some patterns essential for healthy microservices:
- Zero-trust security that doesn’t assume a trusted perimeter.
- Tracing that shows you how and why every microservice talked to another microservice.
- Fault injection and tolerance that lets you experimentally verify the resilience of your application.
- Advanced routing that lets you do things like A/B testing, rapid versioning and deployment and request shadowing.
This communication layer can live in different locations:
- In a Library that your microservices applications import and use.
- In a Node Agent or daemon that services all of the containers on a particular node/machine.
- In a Sidecar container that runs alongside your application container.
This definition is taken from the SaaS platform Aspen Mesh.
Service Mesh implemented using Libraries
In the library approach each microservice application includes a library that implements service mesh features (Hystrix and Ribbon are examples).
An advantage of this approach is that resource allocation for the work performed on behalf of the microservice is handled by the OS, as the code actually runs inside the microservice. Another advantage is that it doesn’t require much cooperation from the underlying infrastructure, i.e. the container runner does not need to be aware that you are running a Hystrix-enhanced microservice.
A major disadvantage is that the libraries need to be ported to every language the microservices are written in, which means replicating the same behaviour over and over again.
Service Mesh implemented with Node Agents
In the node agent model a separate agent runs on every node and services all the different microservice tenants on that node. This works similarly to Kubernetes’ default kube-proxy, which serves all pods on a node.
As a result, this approach can service heterogeneous applications written in different languages, and sharing one agent per node allows efficient resource usage.
Contrary to the library approach, this deployment requires some cooperation from the infrastructure: applications need to delegate their network calls to the agent.
Service Mesh implemented with Sidecars
In a sidecar deployment, an adjacent container (the “sidecar”) is deployed next to every application container and handles all network traffic in and out of the application. This is the model used by Istio with Envoy Proxy. The approach is a compromise between the two previously discussed approaches. For instance, you can deploy a sidecar service mesh without having to run a new agent on every node (so you don’t need infrastructure-wide cooperation to deploy that shared agent), but you’ll be running multiple copies of an identical sidecar.
The disadvantage of slightly more overhead and resource consumption compared to the node agent approach is offset by two benefits: app-to-sidecar communication is easier to secure than app-to-agent communication, and sidecars can be adopted gradually in an existing cluster without central coordination.
Istio Architecture and Features
Istio provides the following core features across a network of services:
- Fine-grained control of traffic behaviour with routing rules, retries, failovers, and fault injection. Configuring service-level properties like circuit breakers, timeouts, and retries makes it easy to set up important tasks like A/B testing, canary rollouts, and staged rollouts with percentage-based traffic splits.
- Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.
- Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.
Istio currently supports deployment on Kubernetes, Consul and services running on individual virtual machines.
An Istio service mesh is logically split into a data plane and a control plane.
- The data plane is composed of a set of intelligent proxies (Envoy Proxy) deployed as sidecars. These proxies mediate and control all network communication between microservices along with Mixer, a general-purpose policy and telemetry hub.
- The control plane manages and configures the proxies to route traffic, enforce policies and collect telemetry.
Pilot provides service discovery for the Envoy sidecars, converts high-level routing rules into Envoy-specific configurations and propagates them to the sidecars.
Citadel provides service-to-service and end-user authentication.
Mixer is responsible for providing policy controls and telemetry collection.
Sample application running on Kubernetes with Istio
To get some hands-on experience with Istio, the sample Bookinfo application running on Kubernetes was used to try some of the traffic management and fault injection features. But first Istio needed to be downloaded
curl -L https://git.io/getLatestIstio | sh -
and, from within the extracted Istio directory, installed on Kubernetes (for simplicity without mutual TLS authentication):
kubectl apply -f install/kubernetes/helm/istio/templates/crds.yaml
kubectl apply -f install/kubernetes/istio-demo.yaml
This will install a number of services and pods in a new namespace called “istio-system”.
The Bookinfo application is broken into four separate microservices:
- The productpage microservice calls the details and reviews microservices to populate the page.
- The details microservice contains book information.
- The reviews microservice contains book reviews. It exists in three versions: v1 doesn’t call the ratings service, v2 calls it and displays each rating as black stars, and v3 calls it and displays each rating as red stars.
- The ratings microservice contains book ranking information that accompanies a book review.
Bringing up the application containers:
# Label the default namespace with istio-injection=enabled to allow automatic sidecar injection
kubectl label namespace default istio-injection=enabled
# Deploy the application
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
# Define destination rules so that Istio can route to the available versions
kubectl apply -f samples/bookinfo/networking/destination-rule-all.yaml
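To illustrate what such a destination rule contains, here is a minimal sketch of the rule for the reviews service. It assumes the networking.istio.io/v1alpha3 API and that the reviews pods carry the labels version: v1, v2 and v3; the file name reviews-destination-rule.yaml is hypothetical, and the file shipped with your Istio release remains the authoritative version.

```shell
# Write a DestinationRule for the reviews service to a local file.
# Assumption: the reviews pods are labelled version=v1, version=v2 and version=v3.
cat > reviews-destination-rule.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
EOF

# Show the generated manifest.
cat reviews-destination-rule.yaml
```

Each subset gives a name to the group of pods that share a version label. Virtual services later refer to these subset names when they route traffic to a specific version.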
When pointing the browser to the application URL, the application’s main page can be seen, and refreshing it reveals the different versions of the reviews service.
Main view of Bookinfo application without ratings (v1), with black star ratings (v2) and red star ratings (v3)
Configuring Request Routing
To route to one version only, a virtual service can set the default version for each microservice. To apply such virtual services, run the following command:
kubectl apply -f samples/bookinfo/networking/virtual-service-all-v1.yaml
Now the Bookinfo application displays reviews without star ratings, since all traffic is routed to version v1 only.
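As an illustration, the virtual service for the reviews service might look like the following sketch. It assumes the networking.istio.io/v1alpha3 API and a destination rule that defines a subset named v1; the file name reviews-v1.yaml is hypothetical.

```shell
# Write a VirtualService that sends all reviews traffic to subset v1.
# Assumption: a DestinationRule for "reviews" defines the subset "v1".
cat > reviews-v1.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
EOF

# Show the generated manifest.
cat reviews-v1.yaml
```

With a single route and no match conditions, every request addressed to the reviews host ends up at the pods of subset v1.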
Next, the route configuration can be changed so that all traffic from a specific user is routed to a specific version.
This example is enabled by the fact that the productpage service adds a custom end-user header to all outbound HTTP requests to the reviews service.
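A user-based rule of this kind could be sketched as follows. The sketch assumes the networking.istio.io/v1alpha3 API, the end-user header mentioned above and a user named jason; the file name reviews-jason-v2.yaml is hypothetical.

```shell
# Write a VirtualService that routes requests carrying the header
# "end-user: jason" to reviews v2 and all other requests to reviews v1.
# Assumption: subsets v1 and v2 are defined in a DestinationRule for "reviews".
cat > reviews-jason-v2.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
EOF

# Show the generated manifest.
cat reviews-jason-v2.yaml
```

The http rules are evaluated in order, so the header match has to come before the catch-all route to v1. The manifest can then be applied with

kubectl apply -f reviews-jason-v2.yaml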
When logging in as user “jason”, we see the black star ratings of reviews v2 next to each review.
Main view shows black stars for logged in user “jason”
To summarize, we configured Istio to route 100% of the traffic to version v1 of the Bookinfo services and then set a rule to selectively send traffic to v2 of the reviews service based on a custom end-user header.
To test microservices for resiliency, delays and aborts can be injected to simulate a faulty or overloaded service. In our case we inject a 7s delay for the user “jason”.
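Such a delay could be configured as in the following sketch. It assumes that the delay is injected into calls to the ratings service, as in Istio’s Bookinfo fault injection sample, and that the networking.istio.io/v1alpha3 API is used; the file name ratings-delay-jason.yaml is hypothetical. The percent field matches older Istio releases, while newer releases prefer a percentage block with a value field, so check the reference for your version.

```shell
# Write a VirtualService that delays ratings requests from user "jason" by 7s.
# Assumptions: the fault targets the ratings service; subset v1 of ratings exists.
cat > ratings-delay-jason.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    fault:
      delay:
        percent: 100       # newer releases: percentage: {value: 100.0}
        fixedDelay: 7s
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
EOF

# Show the generated manifest.
cat ratings-delay-jason.yaml
```

Only requests that carry the end-user header for jason are delayed. All other users keep using the ratings service without the fault, which is why the experiment does not affect them.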
We would expect the home page to load without errors in about 7 seconds. Instead, the reviews section displays an error message.
This is due to the productpage application’s own failure handling, which is still needed despite Istio’s out-of-the-box failure recovery: the productpage times out prematurely and throws an error.
In this case the fault injection helped us to reveal such an anomaly without affecting end users.
After fixing the bug and deploying the new version of the application, we want to shift the traffic to the fixed version.
At first we shift 50% of the traffic to v3 of the reviews service.
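A weighted split of this kind might be expressed as in the following sketch. It assumes that the remaining 50% stays on reviews v1 and that the networking.istio.io/v1alpha3 API is used; the file name reviews-50-v3.yaml is hypothetical.

```shell
# Write a VirtualService that splits reviews traffic evenly between v1 and v3.
# Assumption: subsets v1 and v3 are defined in a DestinationRule for "reviews".
cat > reviews-50-v3.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 50
    - destination:
        host: reviews
        subset: v3
      weight: 50
EOF

# Show the generated manifest.
cat reviews-50-v3.yaml
```

The weights of all destinations in a route should add up to 100. Moving all traffic to v3 later only requires replacing the two weighted destinations with a single destination for subset v3.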
After a while we may assume the service is stable and can route 100% of the traffic to v3.
Collecting and Visualizing Metrics
To visualize the data, an addon called Servicegraph can be used to show how the services are connected.
Servicegraph view of the Bookinfo application showing the services’ interdependencies
Grafana can visualize the metrics in several views and dashboards.
A Grafana dashboard that visualizes request latency and failure rate
Istio supports various means to authenticate services and end users, authorization (role-based access control) to control access to services in a service mesh, as well as auditing tools. A discussion of these topics would go beyond the scope of this article; please refer to the official documentation.
To wrap things up: in this article we discovered what a service mesh is and how it can be implemented. We also saw how to install Istio on Kubernetes and deploy a sample application. The Bookinfo sample application demonstrated how to manage traffic to different versions based on a user property and weights. Additionally, we learned how to inject faults to uncover potential flaws in the microservice interaction in order to increase system resiliency. Finally, the collection and visualization of metrics as well as security aspects were briefly introduced.
As the next step we want to apply Istio to a real-world application to see how it works in practice and whether it can live up to the expectations we have gathered while writing this article.
Credits for cover image go to: AMIS TECHNOLOGY BLOG