Why is Service Mesh?

Siva
5 min read · Nov 5, 2018

Unless you’ve been living under a rock, you’ve heard about Kubernetes, which has become the norm for hyper-growth internet companies. Now there is also a buzz about service mesh, which these same companies use to solve a particular problem. So if you are wondering what a service mesh is, I will do you one better.

Evolution of Internet Applications

To understand why it is needed, let’s look at a brief history of internet applications, broken down into multiple phases.

Phase 0: Monolith

Image courtesy: blog.red-badger.com

Remember those times? The entire codebase was packaged as one executable and deployed. This still works well, depending on the use case.

But the problem is that some fast-growing companies had a hard time with scalability, deployment, ownership, etc.

Enter Phase 1: Micro Services

The idea is simple: break the monolith down into multiple pieces, each with its own SLA. This worked so well that it was widely adopted by many companies.

Now each team took the liberty of building its micro services with its favorite language, framework, etc. Then it started to look like this.

We used to joke on one of my projects that there was a micro service for every language out there :)

Though it solved a few problems from phase 0, companies now had some serious new ones.

  • Provisioning VMs with a spec for each micro service.
  • Maintaining system-level dependencies, OS versions, etc. with automation tools like Chef.
  • Monitoring each service.

This is a nightmare for the people who own build and deployment.

And let’s say they all share the same OS but need isolation, or they are packaged into separate VM images for portability reasons. This is what a typical setup looks like.

Spinning up a new VM for every service or replica is expensive!

Enter Phase 2: Containerization

By exploiting cgroups and namespaces in Linux, a new OS-level virtualization technique emerged that gives applications isolated environments while sharing the same host operating system. Docker is the most popular container runtime.

So an image is created for every micro service and published. Now applications are isolated, spinning up a new container is quick and cheap, and all of this is possible with one operating system!

This is what the setup looks like now.

Containerization solves the build and deployment problem. We don’t have a perfect solution for monitoring yet!

So that’s it? Do we have any other problems? Managing containers!

There are some critical things to take care of in order to run a reliable infrastructure with containers.

  • Availability of containers
  • Provisioning containers
  • Scaling up/down
  • Load balancing
  • Service discovery
  • Scheduling containers across multiple machines

Enter Phase 3: Container Orchestration

Kubernetes is the most popular container orchestrator, and it drastically changed how we look at infrastructure.

Kubernetes takes care of health checking, availability, load balancing, service discovery, scalability, scheduling containers across VMs, etc. Amazing!

So is that it?

Not really. Remember, we haven’t solved the monitoring/observability problem from the micro service phase yet. And that’s just the tip of the iceberg. Micro services are distributed, and managing them isn’t so simple.

There are some best practices we need to follow to run micro services smoothly.

  • Metrics (latency, success rates, etc.)
  • Distributed Tracing
  • Client side Load Balancing
  • Circuit breaking
  • Traffic shifting
  • Rate limiting
  • Access Logging
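To make one item from the list concrete, here is a minimal sketch of client-side load balancing: the caller itself picks which backend instance to talk to, instead of going through a central load balancer. The class name, method names and addresses are made up for illustration, and real libraries do much more (health probes, weights, zone awareness).

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical client-side balancer: the caller picks the backend itself."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._healthy = set(self._endpoints)
        self._cycle = itertools.cycle(self._endpoints)

    def mark_unhealthy(self, endpoint):
        # Stop routing to an instance, e.g. after a failed health check.
        self._healthy.discard(endpoint)

    def mark_healthy(self, endpoint):
        if endpoint in self._endpoints:
            self._healthy.add(endpoint)

    def next_endpoint(self):
        # Look at each endpoint at most once per call, skipping unhealthy ones.
        for _ in range(len(self._endpoints)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy endpoints available")


# Example addresses are placeholders, not taken from any real setup.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
balancer.mark_unhealthy("10.0.0.2:8080")
picks = [balancer.next_endpoint() for _ in range(4)]
```

Here `picks` alternates between the two healthy instances and never includes the one marked unhealthy.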

Companies like Netflix have come up with several tools and embraced those practices for running micro services.

  • Netflix Spectator (for metrics)
  • Netflix Ribbon (Client side LB / Service Discovery)
  • Netflix Hystrix (Circuit Breaking)
  • Netflix Zuul (edge router)
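If circuit breaking is new to you, the idea fits in a few lines. The sketch below is only an illustration of the pattern, not Hystrix’s API: the class, its parameters (`failure_threshold`, `reset_timeout`) and the exception name are assumptions I made up for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    """Hypothetical breaker: fail fast after repeated errors, retry later."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A service would wrap every outbound call in `breaker.call(...)`, so a struggling dependency gets breathing room instead of a pile of retries.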

Right now, the only way to satisfy those best practices is to use a client library in each micro service for each problem.

So now a service will look like this:

Micro service augmented with multiple libraries

But Service A is written in Java. What about the other services?

What if I can’t find equivalent libraries for other languages?

How do I get all the teams to use, maintain and upgrade the libraries?

My company has hundreds of services. Should I modify them all to use the above libraries?

You see the problem now?

This has been a problem ever since the dawn of micro services.

Enter Phase 4: Service Mesh

There are multiple proxies like Envoy, Linkerd and Nginx that provide a solution for the mesh. But this article focuses only on the Envoy mesh.

Envoy is a service proxy designed from the ground up with all these operational problems of micro services in mind.

Envoy can run alongside every application as a sidecar and abstract the network. When all service traffic in an infrastructure flows via an Envoy mesh, it becomes easy to visualize problem areas via consistent observability.

This is what it looks like after adding Envoy as a sidecar to every service. All ingress and egress traffic of a micro service goes through Envoy.
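The key point is that the service code stays untouched while the proxy in front of it collects metrics and access logs. Here is a toy sketch of that idea. To be clear, this is not how Envoy works internally: Envoy is a separate process that handles real network traffic, while this example fakes the "network" with a plain function call. All names (`SidecarProxy`, `service_b`) are hypothetical.

```python
import time
from collections import defaultdict


class SidecarProxy:
    """Toy stand-in for a sidecar: every call to the upstream passes through it."""

    def __init__(self, upstream):
        self._upstream = upstream      # callable standing in for the remote service
        self.stats = defaultdict(int)  # counters such as "requests" and "errors"
        self.access_log = []           # entries of (path, outcome, latency_seconds)

    def request(self, path):
        started = time.monotonic()
        self.stats["requests"] += 1
        outcome = "error"
        try:
            response = self._upstream(path)
            outcome = "ok"
            return response
        except Exception:
            self.stats["errors"] += 1
            raise
        finally:
            # Logged for both successes and failures.
            self.access_log.append((path, outcome, time.monotonic() - started))


def service_b(path):
    # The "service" knows nothing about metrics or logging.
    if path == "/broken":
        raise RuntimeError("boom")
    return f"service B handled {path}"


proxy = SidecarProxy(service_b)
reply = proxy.request("/orders")
```

Notice that `service_b` could be rewritten in any language without losing the metrics, because they live in the proxy and not in the service.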

Envoy comes with many built-in features that come in handy:

  • Supports HTTP/2, including gRPC
  • Health checking
  • Load balancing
  • Metrics
  • Tracing
  • Access logging
  • Circuit breaking
  • Retry policies
  • Timeout config
  • Rate limiting
  • Statsd/Prometheus support
  • Traffic shifting
  • Dynamic configuration with an xDS server

And many more!
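To give a feel for one of these features, here is a minimal token bucket, a common way to implement rate limiting. This is only a sketch of the general technique under my own assumptions; it does not show Envoy’s implementation or configuration, and the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical limiter: allows bursts up to `capacity`, then `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = float(rate_per_second)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Refill in proportion to the time elapsed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

With a proxy in front of every service, a limiter like this runs once in the proxy instead of being reimplemented in every language your teams use.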

So abstracting the entire network away from the service and forming a mesh with Envoy as its data plane allows us to control the capabilities listed above.

Also check out my control plane implementation for Envoy.

Do share your feedback! Thanks!
