Kubernetes: Exploring Istio for event-driven architectures

Todd Kaplinger
IBM Cloud
Jan 3, 2018 · 10 min read

As 2017 wanes and we embark on the next set of challenges around cloud native architectures, we are seeing the major cloud providers all come to the same conclusion about containerization and the role that Kubernetes plays in this ecosystem.

The community has declared Kubernetes the orchestration technology that will allow developers to deploy seamlessly across the various cloud providers. I think we can all agree that Kubernetes is great, and pretty much any tech question can be answered by responding “Kubernetes.”

However, if we dig a bit deeper, we will find that our next challenge will be how to leverage Kubernetes to orchestrate the deployment of cloud native capabilities, at enterprise scale, in this container world following a “microservices” based architecture. This article discusses how you can modify existing Kubernetes-based microservices applications to use Istio to manage communication.

Microservices

First, let’s review where “microservices” came from and what it means.

The term “microservice” was discussed at a workshop of software architects near Venice in May 2011 to describe what the participants saw as a common architectural style that many of them had been recently exploring. In May 2012, the same group decided on “microservices” as the most appropriate name. James Lewis presented some of these ideas as a case study in March 2012 at 33rd Degree in Krakow in Microservices — Java, the Unix Way, as did Fred George about the same time. Adrian Cockcroft at Netflix, describing this approach as “fine-grained SOA,” was pioneering the style at web scale, as were many of the others mentioned in this article — Joe Walnes, Dan North, Evan Bottcher, and Graham Tackley.

I am a self-proclaimed early adopter of microservices-based architectures. Around late 2014, I led a team of technologists to build out a highly scalable cloud architecture that adhered to the principles of The Twelve-Factor App. Since the cloud was still in its infancy and Docker was just starting to emerge for production usage, we based our architecture on Cloud Foundry, which has its own concept of containerization and allowed us to move to a DevOps culture that promoted a release-early, release-often mindset.

We documented this project in the Transitioning to a cloud-centric architecture blog post to educate teams on how we approached this endeavor. Later, the team moved the project from Cloud Foundry to containers. This process is documented in the Migrate your Cloud Foundry applications to containers on Bluemix recipe.

In the spirit of these projects, I will outline how Istio could be used to further the evolution of this application. Everything I outline below is achievable with Istio's current feature set.

Our Application

Before we get deeper into the article, let’s review this application. In the blog post mentioned above, we covered what a Twelve-Factor App is and the main tenets of this class of application. The application we are going to dive into was a collection of Node.js microservices exposing a combination of HTTP and messaging endpoints. Each of these services served one purpose and was loosely coupled from the rest, enabling our developers to deliver new capabilities quickly without impacting the remaining architecture.

The use case for this application is IoT (Internet of Things), where the vast majority of API calls are the result of events from devices, the events representing the detection of mobile devices and their locations (indoor and outdoor device movements). This event-driven architecture supported a fire-and-forget execution model in which the client was not blocked waiting for a response from the server. To achieve cloud scale, we leveraged a messaging backbone to distribute messages across the system to the various services listening on the queue. The image below depicts the deployment architecture and the data flows through the system.

[Image: deployment architecture and data flows — https://cdn-images-1.medium.com/max/2000/0*Pm-8j5OUZgjAaIzy.png]

Exploring Service Mesh Capabilities

While the general architecture and approach of this application are quite solid, there is an opportunity to inject a Service Mesh technology into the topology that will really bring this application to 2018 levels. A Service Mesh is a dedicated infrastructure layer for handling service-to-service communication: it is responsible for the reliable delivery of requests through the complex topology of services that make up a cloud native application. By leveraging a Service Mesh, developers are no longer responsible for implementing complex concerns such as request routing and failover, which most Service Meshes handle natively. For my scenario, I’ve selected Istio as the Service Mesh platform of choice. Here’s how Istio describes itself on its About page:

Istio is an open platform that provides a uniform way to connect, manage, and secure microservices. Istio supports managing traffic flows between microservices, enforcing access policies, and aggregating telemetry data, all without requiring changes to the microservice code.

For deep detail about what Istio is, I suggest that you read the Istio Concepts section of its documentation. To me, Service Meshes such as Istio allow me to reuse common components and routing technologies seamlessly in my application without having to change my application code and logic, or introduce additional components such as HTTP proxies into my topology just to route API calls. This seamless injection is one of the key characteristics that I will outline in the sections below.

For a relatively new project, the documentation for Istio is quite good. In the following sections, I will touch upon a selection of key features and share links to additional sections of the documentation I find useful.

Synchronous APIs

The initial area where I felt Istio could be leveraged in the improved architecture was our synchronous API calls. While the majority of the app’s calls were initiated by IoT devices, there were instances where we supported API calls for things such as management and configuration. These calls retrieved data from persistent storage systems such as Cloudant, Elasticsearch, and Redis, but we did not want to support blocking HTTP calls in our messaging flows. Without Istio, I would have had to integrate a solution such as Netflix OSS, or write my own libraries, to handle request routing, load balancing, error handling, and a variety of other functions. Instead, I can use Istio for all service-to-service communication over HTTP via its Pilot component. Pilot exposes APIs for service discovery and dynamic updates to load-balancing pools, which simplifies my request flows to the various management and configuration APIs. By starting with Pilot, I can incrementally adopt the additional Istio features that I describe in the following sections.
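As a sketch of what this looks like, the rule below tells Pilot which load-balancing strategy Envoy should use when calling a configuration service, with no change to the calling code. It uses the config.istio.io/v1alpha2 API current as of this writing, and the `config-api` service name is a hypothetical stand-in for one of our management services:

```yaml
# Ask Pilot to round-robin requests across the healthy instances
# of the (hypothetical) config-api service. The calling service
# still just addresses http://config-api; Envoy applies the policy.
apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  name: config-api-lb
spec:
  destination:
    name: config-api
  loadBalancing:
    name: ROUND_ROBIN
```

Applying this with `kubectl apply -f` (or `istioctl create -f`) is enough; no application redeploy is required.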

Transparent Injection

Istio depends on Envoy (from Lyft) as a sidecar container for request routing and metrics collection. Traditionally, an application’s deployment YAML had to be run through `istioctl kube-inject` before deployment, which adds the Envoy sidecar to each pod, co-located with the application container. Istio also supports transparent injection, which means you don’t need to call `istioctl kube-inject` at all: you deploy a sidecar initializer once, and it injects the Envoy sidecar automatically. This makes managing microservices with Istio simpler, as there is no change to the application code or deployment artifacts. As a result, developers can deploy applications as usual, but with the added benefit of being managed by the Istio Service Mesh.
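A minimal sketch of opting a namespace in to automatic injection. The exact mechanism varies by Istio release (early versions use a Kubernetes initializer, later ones a mutating admission webhook keyed off this label), and the `iot-services` namespace name is hypothetical:

```yaml
# Any pod created in a namespace labeled istio-injection=enabled
# gets the Envoy sidecar added automatically at admission time,
# so deployment manifests stay untouched.
apiVersion: v1
kind: Namespace
metadata:
  name: iot-services
  labels:
    istio-injection: enabled
```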

Request Routing (Version Aware)

Another pain point around APIs is versioning, and the management of those versions. Since the microservice application is multi-tenant by design, I wanted to be able to release new functionality while maintaining backwards compatibility with previous APIs. By implementing version-aware routing, I can roll out new APIs for some of our services while maintaining the original API for consumers, such as client-side SDKs, that might not be able to update their code base as iteratively as our internal services. By applying routing rules that match on request headers, I can route traffic to the version appropriate for each client.

Istio provides a great overview of request routing rules in its traffic management documentation.
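A sketch of header-based version routing using the v1alpha2 route rule API. The `device-config` service name and `x-api-version` header are hypothetical; the idea is that clients sending the new header reach v2, while older SDKs fall through to v1:

```yaml
# Higher-precedence rule: requests carrying x-api-version: v2
# are routed to pods labeled version=v2.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: device-config-v2
spec:
  destination:
    name: device-config
  precedence: 2
  match:
    request:
      headers:
        x-api-version:
          exact: v2
  route:
  - labels:
      version: v2
---
# Default rule: everyone else stays on the original v1 API.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: device-config-default
spec:
  destination:
    name: device-config
  precedence: 1
  route:
  - labels:
      version: v1
```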

Canary Deployments

As the team deployed new versions of the code, we typically had a high level of confidence that what we were delivering was well tested, from both a performance and a regression perspective. However, I wanted a simple way to test new functionality in limited rollouts without impacting our larger customer base. Using canary deployments, I can incrementally roll out new services and, using tools such as New Relic, monitor current performance against historical benchmarks to ensure we maintain the appropriate SLAs for our services.

Istio provides a great overview of canary deployments in one of its featured blog posts.
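A weighted route rule is all a canary takes, independent of how many pods back each version. Again a sketch against the v1alpha2 API with a hypothetical `device-config` service; shifting more traffic to the canary is just an edit to the weights:

```yaml
# Send 10% of traffic to the v2 canary while 90% stays on v1.
# Envoy enforces the split regardless of replica counts.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: device-config-canary
spec:
  destination:
    name: device-config
  route:
  - labels:
      version: v1
    weight: 90
  - labels:
      version: v2
    weight: 10
```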

Distributed Tracing

When developing our microservices, we spent a good amount of time building out libraries that gave us visibility across the entire system. We instrumented our code using a combination of Node’s Express library, New Relic agents, and a unique request ID generated for each inbound request to correlate data across the system. Istio provides support for both Jaeger and Zipkin, which allows me to correlate our unique metrics with the tracing data generated by Istio and Kubernetes to provide a holistic view of requests when debugging problems. With distributed tracing, I can also easily locate performance issues in the call chain.

[Image: Jaeger trace view — https://istio.io/docs/tasks/telemetry/img/jaeger_trace.png]

Istio provides a great overview of distributed tracing using its BookInfo sample application, which highlights some of the tracing capabilities built into Istio.

Metrics and Logs

In the preceding sections, I talked about version-aware routing, canary deployments, and tracing of the overall system. For the overall operation of a healthy deployment, DevOps practitioners will be adamant about gaining insights into the live running system. Istio provides integrations with both Grafana and Prometheus, with predefined views of the key metrics most commonly used by operations teams. In the image below, the metrics range from response times to error counts, with drill-downs per service. These views are customizable, so you can add your own metrics. If I combine these capabilities with what we already have in New Relic, I can quickly gain insight into the health and performance of the entire ecosystem.

[Image: Grafana dashboard with live traffic — https://istio.io/docs/tasks/telemetry/img/dashboard-with-traffic.png]

Istio provides a great overview of visualizing metrics using its BookInfo sample application, which highlights some of the Grafana views built from metrics captured by Istio.

Service Graph

Besides distributed tracing, I can also use Istio’s Service Graph add-on, which generates a web-based view of the service mesh topology. This is a very important feature when you have a large number of microservices in your cluster.

[Image: example service graph — https://istio.io/docs/tasks/telemetry/img/servicegraph-example.png]

The service graph builds on the Metrics and Logs capabilities above: its view is generated from Prometheus queries.

Transport Security

Enterprise security requirements block many deployments, and one of the most common gaps is the lack of a secured transport for data in motion. Istio addresses this concern by providing mutual TLS between Istio endpoints out of the box. When mutual TLS is enabled and the Istio sidecar is deployed alongside our applications, all communication between two microservice endpoints is secured across pods, because the sidecar intercepts all inbound and outbound traffic for the Kubernetes pods that are leveraging Istio. This is an absolute lifesaver for developers who don’t want to mess with SSL certificates and manage the lifecycle of those certificates.
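In the Istio releases current as of this writing, mesh-wide mutual TLS is typically switched on at install time (the istio-auth variant of the install manifests). A service can then opt an individual port out of mTLS with a per-port annotation. This is a sketch, with a hypothetical `device-config` service, and assuming the `auth.istio.io/<port>` annotation these releases document:

```yaml
# Keep port 8080 of this service in plain text (e.g. for an external
# health checker that cannot present a client certificate), while the
# rest of the mesh stays on mutual TLS.
apiVersion: v1
kind: Service
metadata:
  name: device-config
  annotations:
    auth.istio.io/8080: NONE
spec:
  selector:
    app: device-config
  ports:
  - port: 8080
    name: http
```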

You can not only use Transport Layer Security for app to app communication, but you can also extend it to services such as Etcd, as described in the Medium article Istio is not just for microservices.

Messaging Flows

While Istio provides excellent support in the areas above, there are components of the architecture that I am not able to move to Istio: messaging, and the use of AMQP for publishing and subscribing to topics. We leverage many core capabilities of the AMQP protocol, such as wildcard pattern matching, exchange types like fanout, and concepts such as store-and-forward for when endpoints are not available. Istio has plans to support additional protocols such as AMQP in the future, but for now, moving from a message-based approach to an HTTP-centric approach just to adopt Istio seems a bit premature.

Conclusion

2017 is considered the year of Kubernetes by many. There is a lot of momentum behind the Service Mesh as the technology for 2018 that will continue to drive Kubernetes adoption among enterprise developers. The community is actively engaged around many Service Mesh projects, such as Istio, Envoy, Linkerd, and the newest entry into the space, Conduit. It will take some time before the market determines how these fit into the overall ecosystem, but based on the convergence of the various cloud providers around Kubernetes, I expect the community to settle on one or two ultimate winners. Based on my examination of how Istio can improve an existing Twelve-Factor App, I think it’s a strong contender in this race. I’m looking forward to seeing how 2018 evolves and what challenges lie ahead.

Acknowledgements

I wanted to take a moment to thank Guangya Liu and Kathryn Alexander for their contributions to this article. Guangya provided excellent content, especially around concepts such as Transparent Injection and Service Graph; he is one of my team members and has been driving a lot of the Istio initiatives inside IBM. Kathryn leads our Content Development team and works tirelessly to turn our technical jargon into human-readable prose. I could not have delivered this article without their guidance.

Originally published at https://www.linkedin.com on January 3, 2018.
