Control and Data Plane at Schiphol (Image source: KLM)

Managing Microservices with a Service Mesh: Data vs. Control Plane

Marcus Schiesser
Glasnostic

--

When it comes to operating a microservices deployment, there are two core questions to ask ourselves. First, how will we manage the actual communication between the services? Second, how will we manage the configuration and policies of that communication? If we are using a service mesh, we use a data plane for the former and a control plane for the latter.

In this post, we are going to compare and contrast the functionality available in the data and control planes of two popular service meshes: Istio and Linkerd. We will also look at the advantages of using a cloud traffic controller like Glasnostic, which approaches the control of service interactions from an operational perspective, irrespective of whether or not a service mesh is used. But before diving into the differences between data and control planes, let’s take a moment to define what a “service mesh” actually is, so we have some context.

What is a Service Mesh?

Service meshes consist of two key architectural components, a data plane and a control plane. Contrary to what the name suggests, a service mesh is not a “mesh of services.” It is a mesh of proxies that services can plug into to completely abstract the network away. Service meshes are designed to solve the many challenges developers face when dealing with the flood of remote procedure calls caused by the conversion from a monolith to a microservice architecture. Instead of calling services directly over the network, services call their local proxy, which in turn manages the request on the service’s behalf, thus encapsulating the complexities of the service-to-service exchange.

Although service meshes provide features to extract metrics and to control traffic, their usefulness is limited when it comes to addressing the larger, more complex and emergent behaviors that operators of microservices face at scale.

A typical service mesh architecture with data plane proxies deployed as sidecars and a separate control plane.

For an in-depth exploration of what a service mesh is (and isn’t), check out our “What is a Service Mesh?” post.

TL;DR: Both the data plane and the control plane are integral components of a service mesh. The data plane forwards traffic to other services via sidecars, while the control plane handles configuration, administration, security and monitoring.

What are Istio and Linkerd?

Both Istio and Linkerd are service meshes.

Istio is an open source service mesh initially developed by Google, IBM and Lyft. The project was announced in May 2017, with its 1.0 version released in July 2018. Istio is built on top of the Envoy proxy, which acts as its data plane. Although it is quite clearly the most popular service mesh available today, it is for all practical purposes only usable with Kubernetes. For more details on Istio, check out our post, “The Kubernetes Service Mesh: A Brief Introduction to Istio.”

Linkerd (rhymes with “chickadee”) is the original service mesh created by Buoyant, which coined the term in 2016. It is the official service mesh project supported by the Cloud Native Computing Foundation. Like Twitter’s Finagle, on which it was based, Linkerd was originally written in Scala and designed to be deployed on a per-host basis. Criticisms of its comparatively large memory footprint subsequently led to the development of Conduit, a lightweight service mesh specifically for Kubernetes, written in Rust and Go. The Conduit project has since been folded into Linkerd, which relaunched as Linkerd 2.0 in July of 2018. While Linkerd 2.x is currently specific to Kubernetes, Linkerd 1.x can be deployed on a per-node basis, which makes it a more flexible choice where a variety of environments need to be supported. For more details on Linkerd, check out our post, “A Brief Introduction to Linkerd.”

What is a Data Plane?

In a typical service mesh, service deployments are modified to include a dedicated “sidecar” proxy. Instead of calling services directly over the network, each service calls its local sidecar proxy, which in turn encapsulates the complexities of the service-to-service exchange. This interconnected set of proxies (or sidecars) in a service mesh represents its “data plane.”
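To make the call path concrete, here is a minimal, in-process sketch of the sidecar idea. It is a toy model under stated assumptions, not code from Envoy or Linkerd: the `SidecarProxy` class, the shared `registry` dictionary and the service names are all hypothetical, and plain function calls stand in for the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SidecarProxy:
    """Toy stand-in for a data plane proxy (hypothetical, for illustration only)."""

    owner: str                            # name of the service this sidecar belongs to
    registry: Dict[str, "SidecarProxy"]   # shared lookup: service name -> its sidecar
    handler: Callable[[str], str]         # the local service's request handler
    requests_out: int = 0
    requests_in: int = 0

    def __post_init__(self) -> None:
        # Joining the mesh means becoming reachable through the shared registry.
        self.registry[self.owner] = self

    def call(self, destination: str, payload: str) -> str:
        """Outbound path: the local service hands its request to its own sidecar."""
        self.requests_out += 1
        peer = self.registry[destination]  # the proxy, not the service, finds the peer
        return peer.receive(self.owner, payload)

    def receive(self, source: str, payload: str) -> str:
        """Inbound path: the destination's sidecar accepts the request first."""
        self.requests_in += 1
        return self.handler(payload)


registry: Dict[str, SidecarProxy] = {}
orders = SidecarProxy("orders", registry, handler=lambda p: f"orders got {p}")
billing = SidecarProxy("billing", registry, handler=lambda p: f"billing charged {p}")

# The orders service never addresses billing directly; it only talks to its sidecar.
reply = orders.call("billing", "order-42")
```

Because every request passes through two proxies, the proxies are the natural place to count requests, encrypt traffic or enforce policy without touching the services themselves.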

Istio’s Data Plane

Istio uses its own “Istio-Proxy” as its data plane. Istio-Proxy is a variant of the popular Envoy proxy and is therefore written in C++. However, because Istio is designed to be proxy-agnostic, other proxies such as Nginx may in theory be used in place of Envoy. Envoy and Istio-Proxy support HTTP/1.1, HTTP/2, gRPC and TCP communication between services via their sidecars.

Linkerd’s Data Plane

Linkerd’s data plane consists of lightweight proxies written in Rust. These proxies are deployed as sidecar containers alongside each instance of the service. A developer adds a service to the Linkerd service mesh by redeploying it so that each of its pods includes a data plane proxy. In turn, these sidecars intercept the communication between pods and enable features like instrumentation and mTLS encryption, as well as allowing or denying requests according to the policy that has been applied. Linkerd supports the HTTP, HTTP/2 and TCP protocols between services. The configuration of Linkerd’s proxies is handled by its control plane.

What is a Control Plane?

A control plane is, at its heart, a configuration server. It is used to control the proxies’ behavior across the mesh. The control plane is where users specify authentication policies, gather metrics and configure the data plane (i.e., the mesh of proxies) as a whole. The communication between data plane and control plane is defined via an API. For Envoy, this is referred to as the Data Plane API.
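The “configuration server” idea can be sketched in a few lines. The following is a simplified illustration only: the `ControlPlane` and `Proxy` classes and the setting names are hypothetical, and the in-process method calls stand in for the network API that a real mesh uses between control plane and proxies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proxy:
    """A data plane proxy that holds whatever configuration it was last sent."""

    name: str
    config: Dict[str, object] = field(default_factory=dict)
    version: int = 0

    def apply(self, version: int, config: Dict[str, object]) -> None:
        # Keep a private copy so proxies never share mutable state.
        self.version = version
        self.config = dict(config)


@dataclass
class ControlPlane:
    """Single source of truth for configuration across the mesh."""

    proxies: List[Proxy] = field(default_factory=list)
    config: Dict[str, object] = field(default_factory=dict)
    version: int = 0

    def register(self, proxy: Proxy) -> None:
        # A newly registered proxy immediately receives the current configuration.
        self.proxies.append(proxy)
        proxy.apply(self.version, self.config)

    def update(self, **changes: object) -> None:
        # Every change bumps the version and is pushed to all known proxies.
        self.config.update(changes)
        self.version += 1
        for proxy in self.proxies:
            proxy.apply(self.version, self.config)


control_plane = ControlPlane()
orders_proxy = Proxy("orders-sidecar")
billing_proxy = Proxy("billing-sidecar")
control_plane.register(orders_proxy)
control_plane.register(billing_proxy)

# One update at the control plane reconfigures the whole data plane.
control_plane.update(request_timeout_seconds=2.0, max_retries=3)
```

The point of the sketch is the direction of flow: operators change configuration in one place, and the proxies converge on it, rather than each proxy being configured by hand.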

Istio’s Control Plane

Istio’s control plane is written in Go and made up of the following components:

Configuration: Pilot is the component responsible for configuring the data plane, or more specifically the Envoy proxies. With Pilot, you specify the rules you want to use to route traffic between sidecars, as well as load balancing, timeouts, retries and circuit breakers. Finally, Pilot also maintains a canonical model of all the services participating in the mesh by making use of its service discovery feature.
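Of the traffic rules listed above, the circuit breaker is perhaps the least intuitive, so here is a generic sketch of the pattern. This is not Istio or Envoy code and does not reflect how Pilot expresses such rules; the `CircuitBreaker` class, its parameters and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a request is rejected without contacting the upstream."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_after = reset_after      # cool-down in seconds before a trial call
        self.clock = clock                  # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; upstream not called")
            # Cool-down has elapsed: fall through and allow one trial request.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In a mesh, logic of this kind lives in the sidecar and is driven by configuration, so a struggling upstream is shielded from further load without any change to the calling service.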

Abstraction and Intermediation: Mixer is the component that provides backend abstraction and intermediation. More specifically, it collects traffic metrics and can respond to various queries from the data plane such as authorization, access control or quota checks. Depending on which adapters are enabled, it can also interface with logging and monitoring systems like Prometheus, Datadog and AWS.

Certificate Management: Citadel is the component that allows developers to build zero-trust environments based on service identity rather than network controls. It is responsible for assigning certificates to each service and can also accept external certificate authority keys when needed. Citadel’s features are important in a microservices architecture, where “man-in-the-middle” attacks need to be defended against with encryption and where operators need to audit which activities took place, when, and by whom.

Linkerd’s Control Plane

Linkerd’s control plane is also written in Go. It is made up of a controller component, a web component, which serves up the administrative dashboard, and a metrics component, which consists of modified versions of Prometheus and Grafana.

Communications, Sidecar Injection and Debugging: Controller is a container that consists of multiple components. These components include public-api for external communications, proxy-api for communicating with the Linkerd data plane, proxy-injector for injecting the sidecar proxy into a deployment, tap for debugging single pods, and identity, which brokers certificates to each service for mutual TLS connections.

Administration and Monitoring: Web is the component that enables a basic administrative and monitoring dashboard for your Linkerd deployment. With the dashboard, you can get insight into metrics like requests per second and latency, plus visualize dependencies and investigate the health of service routes.

Metrics Storage: Prometheus is the component that stores all of the metrics exposed by Linkerd so they can be used to generate dashboards.

Metrics Visualization: Grafana is the component that renders metrics dashboards, which can be accessed from within the administrative console.

Service Meshes and Organic Architectures

Service meshes enable development teams building microservices to abstract away the complexities that distributing services brings about. Capabilities like encryption, “intelligent” routing and runtime observability are helpful, but quickly prove to be too limited as applications grow and become increasingly connected. While such organic architectures are immensely beneficial to the business, operators quickly find themselves losing control.

Organic architectures come into being when applications and services are composed in a continual and nimble manner. An organic architecture is able to adapt to the numerous and rapidly changing needs of an agile enterprise. Because it grows organically and in a federated way, it allows for rapid adaptation, which results in a fast time to market.

Operations teams need control over more than just service-to-service calls. They need to be able to apply operational patterns like backpressure, segmentation or bulkheads to arbitrary sets of interactions. They also need to be able to layer policies so they can be applied without affecting each other. Operations teams need to be able to control their service landscape in real-time, without having to manage hundreds of YAML descriptors. To do all that, they don’t need opinionated platforms, but instead solutions that integrate with existing tools and apply to the entire service landscape, without affecting any deployment.
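As an illustration of one of these operational patterns, here is a minimal bulkhead sketch. It shows the generic pattern inside a single process and is not how Glasnostic or any service mesh implements it; the `Bulkhead` class and its `max_concurrent` parameter are hypothetical names chosen for the example.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when the compartment has no free slot for another request."""


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust shared capacity."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn: Callable[[], T]) -> T:
        # Reject immediately instead of queueing, so callers fail fast
        # and pressure does not silently build up behind the limit.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("bulkhead full; request rejected")
        try:
            return fn()
        finally:
            self._slots.release()
```

A bulkhead with `max_concurrent=1` admits one call at a time: a second call made while the first is still in flight is rejected, and the slot is freed again as soon as the first call returns or fails. The limitation this sketch makes visible is that it lives inside one service; operators need the same control over arbitrary sets of interactions, without changing any deployment.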

Glasnostic is a cloud traffic controller that lets operations and security teams control the complex interactions and behaviors among federated microservice applications at scale.

Glasnostic is a control plane for organic architectures that helps operations and security teams control the complex interactions and behaviors among federated microservice applications at scale. This is in contrast to service meshes, which manage the service-to-service connections within an application. Glasnostic is an independent solution, not another platform. It requires no sidecars or agents and integrates cleanly into any existing environment.

By gaining control over service interactions, teams can control emergent behaviors, prevent cascading failures and avert security breaches.

Comparing Service Meshes to Glasnostic

Comparing Istio, Linkerd and Glasnostic control and data planes.

Summary

In this post, we looked at what a service mesh is and what makes up the data and control planes of two popular service meshes, Istio and Linkerd. We also examined how service mesh capabilities fail to address operator concerns. These concerns come into sharp relief when an organization’s microservices deployment evolves into a large, complex and unpredictable organic architecture. Finally, we compared and contrasted the capabilities of Istio, Linkerd and Glasnostic for addressing these concerns.

--
