Service meshes have become a popular way to deploy, manage and deliver micro-services. The variety of available implementations reflects that popularity. Initiatives like the Service Mesh Interface are an attempt at standardizing the mesh interface. With Google offering Istio to define and provide a service mesh, a group of other companies including Microsoft started this initiative to standardize how a mesh is controlled.
The visibility and benefits provided by a service mesh are limited to the micro-services environment. Extending them outside the mesh can provide substantial benefits, especially when running operations in cloud-native environments.
Here we touch upon some of the salient features of a service mesh, the providers of service mesh on Kubernetes, and some initiatives around it, and conclude with how YAStack is a key component in extending the service mesh.
Zero trust assumes there is no trust between entities. Phrased differently, if a process (or micro-service) runs in an environment with other processes, they do not trust each other. In the absence of implicit trust, there has to be a way for them to establish trust with each other.
Every entity in a service mesh needs an identity, and the mesh needs a way to verify that identity. Once identities are established, it needs a way to specify policy around who can talk to whom.
Zero trust environments provide identity-based and policy-based segmentation that can be effective against a breach. It can also be a way to enforce compliance. A breach typically lands through a vulnerability and moves laterally to spread itself. It also tries to communicate with external services (like a command-and-control server) to report on the environment. A zero trust environment limits this by ensuring the compromised entity cannot talk to unauthorized services. This limits your blast radius.
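The identity-plus-policy idea above can be sketched as a default-deny allow list. This is only an illustration: the `Rule` and `Policy` names and the service identities are hypothetical and are not the API of any particular mesh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # verified identity of the caller
    destination: str  # verified identity of the callee


class Policy:
    def __init__(self, rules):
        # Store the explicitly allowed (source, destination) pairs.
        self._allowed = {(r.source, r.destination) for r in rules}

    def is_allowed(self, source, destination):
        # Default deny: anything not listed is rejected.
        return (source, destination) in self._allowed


# Hypothetical identities, for illustration only.
policy = Policy([
    Rule("frontend", "orders"),
    Rule("orders", "payments"),
])

print(policy.is_allowed("frontend", "orders"))      # an allowed pair
print(policy.is_allowed("frontend", "payments"))    # lateral movement attempt
print(policy.is_allowed("orders", "c2.example.net"))  # outbound call to an unknown host
```

Because the check is default deny, a compromised `frontend` in this sketch can still reach `orders` but nothing else, which is what keeps the blast radius small.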
Service meshes provide telemetry to inspect all aspects of a system. This is critical: smooth operation of a service mesh requires that the administrator can quickly look at the telemetry and identify problems, if any.
Imagine an application that invokes several microservices to serve an external facing API. Imagine if there is a problem with one of the microservices. Deep telemetry that provides detailed information about each microservice can help quickly identify the problem. This is a critical aspect of running a service mesh.
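As a rough sketch of how per-service telemetry narrows down a problem, the snippet below ranks services by their 95th percentile latency. The service names and latency samples are made up for illustration; a real mesh would pull these numbers from its metrics backend.

```python
from statistics import quantiles


def p95(samples):
    # quantiles(n=20) returns 19 cut points; the last one estimates the 95th percentile.
    return quantiles(samples, n=20)[-1]


def slowest_service(latencies_ms):
    # Return the service whose tail latency is the highest.
    return max(latencies_ms, key=lambda name: p95(latencies_ms[name]))


# Hypothetical latency samples (milliseconds) per microservice.
latencies_ms = {
    "catalog": [12, 14, 13, 15, 12, 16],
    "cart": [20, 22, 19, 25, 21, 23],
    "pricing": [30, 28, 250, 31, 400, 29],
}

print(slowest_service(latencies_ms))
```

Tail latency is used here rather than the mean because an intermittent problem in one microservice often shows up only in a few slow requests.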
Istio: Realization of service mesh by Google/IBM/Lyft
Istio is a popular service mesh and uses the Envoy proxy as a sidecar to realize the critical aspects of the service mesh.
As described above, a service mesh also needs an identity provider. You need a way to establish identity, and then enforce it. Citadel, in Istio, provides identity in the form of an X.509 certificate. The enforcement piece (the sidecar Envoys) uses these certificates to ensure communication across pods only happens according to the specified policy.
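To make the certificate-based identity more concrete, here is a minimal sketch that parses a workload identity of the SPIFFE form `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`, which is the format Istio places in the certificate's URI subject alternative name. The `WorkloadIdentity` type and `parse_spiffe_id` helper are hypothetical names for this example, and extracting the URI from the certificate itself is assumed to have happened already.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class WorkloadIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri):
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    # Expect exactly: ns/<namespace>/sa/<service-account>
    well_formed = (
        parsed.scheme == "spiffe"
        and len(parts) == 4
        and parts[0] == "ns"
        and parts[2] == "sa"
    )
    if not well_formed:
        raise ValueError(f"not a recognised workload identity: {uri!r}")
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])


identity = parse_spiffe_id("spiffe://cluster.local/ns/default/sa/orders")
print(identity)
```

A policy engine can then match on the namespace and service account fields instead of on IP addresses, which change as pods are rescheduled.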
The sidecar Envoys that run alongside services in every pod also report telemetry to help observe these microservices. In addition to regular statistics like latency and throughput, these sidecars also facilitate distributed tracing using tools such as Jaeger, Zipkin and OpenTracing.
Linkerd (and Service Mesh Interface by Microsoft and others)
Another popular realization of the service mesh is Linkerd. While Istio runs Envoy as a sidecar proxy (in every pod), Linkerd has its own proxy. In Linkerd 2.x this proxy is written in Rust and, like Envoy in Istio, runs as a sidecar in every pod; the older Linkerd 1.x proxy was typically deployed once per node in your Kubernetes cluster (as a DaemonSet). It uses an identity provider to get a certificate and can encrypt traffic between proxies to secure communication.
The inline proxy also reports telemetry that can be used to observe micro-services.
The popularity of service mesh can be seen in the efforts towards standardizing it. One such effort at democratizing the service mesh is the Service Mesh Interface, an initiative led by Microsoft. At the time of writing (5/2019), it is in a very early phase. It attempts to define Kubernetes objects (an API) that standardize functions like access control, traffic characterization, traffic split and metrics. A standard API for these aspects of the mesh would give an administrator a uniform way to apply policy across a heterogeneous mesh environment.
Extending tracing outside the mesh
Disaster recovery for an application can be achieved by replicating the application across data centers. A DNS rule can distribute traffic across these replicas. However, if any form of L7 routing is desired across these application locations, then fronting them with a proxy is a good idea. For instance, in a lift-and-shift type of scenario, this applies when (1) a monolithic application in a private data center is broken up into cloud-native micro-services on a public cloud, or (2) traffic distribution is desired across public and private cloud in hybrid use cases.
Now, say for use-case (1), for a specific L7 path, we may build a micro-service and deploy it on the public cloud. One way to test this micro-service would be to replicate traffic for this specific L7 path. Alternatively, instead of replicating traffic, a small percentage of traffic can be sent to the new service.
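The percentage-based alternative can be sketched as a weighted choice between two backends. This is not how Envoy or YAStack implement weighted routing; it is a minimal illustration that assumes each request carries an ID, and the backend names `new-service` and `monolith` are hypothetical.

```python
import hashlib


def pick_backend(request_id, canary_percent):
    # Hash the request ID into a stable bucket in [0, 100) so that
    # the same request ID always maps to the same backend.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "new-service" if bucket < canary_percent else "monolith"


# Send roughly 5% of requests to the new micro-service.
choices = [pick_backend(f"req-{i}", 5) for i in range(10_000)]
share = choices.count("new-service") / len(choices)
print(f"share sent to new-service: {share:.1%}")
```

Hashing a request attribute rather than drawing a random number keeps the decision repeatable, which makes it easier to correlate a misbehaving request with the backend that served it.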
In any of the above use-cases, if YAStack (or vanilla envoy proxy) is run outside the mesh for L7 functions, it can provide end-to-end tracing information. The advantage of this is the ability to visualize the traffic latency end-to-end, right from the time the DNS sends it over to an edge proxy for L7 routing and processing, all the way to where it is served by a micro-service.
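For the span from the edge proxy and the spans from inside the mesh to join into a single trace, the trace context has to travel with the request at every hop. The sketch below assumes Zipkin-style B3 headers plus Envoy's `x-request-id`; the `propagate_trace_headers` helper is a hypothetical name, and a deployment using a different tracer would forward a different header set.

```python
# Headers that carry trace context between hops (B3 propagation plus x-request-id).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
)


def propagate_trace_headers(incoming):
    # HTTP header names are case-insensitive, so normalise before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    # Copy only the trace-related headers onto the outgoing request.
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


incoming = {
    "Host": "shop.example.com",
    "X-Request-Id": "a1b2c3",
    "X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "X-B3-Sampled": "1",
}
print(propagate_trace_headers(incoming))
```

A service that calls other services would attach the returned headers to each outgoing request, so the tracer can stitch every hop under the trace that started at the edge.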
In the trace below, the span labelled edge-yastack is the one generated by the edge proxy. It shows the time taken from the edge proxy until the request reaches the istio-ingress gateway, followed by the latency incurred in each of the subsequent steps.
YAStack is a performant Envoy: it is a drop-in replacement for vanilla Envoy that boosts performance. In the use case mentioned above, YAStack would provide the required performance in addition to extending the mesh.
We propose a way to provide deep end-to-end visibility while ensuring L7 routing for applications in multi-cloud and hybrid-cloud use cases. As applications get broken up into micro-services and the complexity of managing them increases, a high-performance proxy like YAStack provides the necessary visibility and mesh extension. Such proxies can be run anywhere, in a public cloud or a private data center, and do not need over-provisioning like legacy hardware-based appliances. The homogeneous environment provided by such an architecture simplifies operations.