Making legacy apps cloud-native using the service mesh approach
When migrating from a monolithic app to microservices, we face new problems. Today we will discuss one of the most painful problems that microservices bring — observability — and issues arising when trying to solve it.
When dealing with a single monolithic app, it’s usually easy to understand where the problem lies: in your app or in the database. But when you have dozens, hundreds, or even thousands of microservices, each with a dedicated database, locating the particular service causing the problem is a daunting task.
Distributed tracing is a common solution to this problem. But what if your apps don’t send tracing spans? Or worse, sometimes they do and sometimes they don’t. To identify the root cause, you still need a full picture of the system and an understanding of which service paths your business-critical requests follow.
This is where the service mesh approach comes into play. It lets you intercept all network traffic, then analyze and transform it on the fly.
The service mesh approach
Let’s briefly review this approach. The main idea is to inject a special sidecar container into each microservice in your system and route all traffic to the sidecar first instead of the microservice. Usually, the sidecar is a transparent proxy that forwards all traffic and performs some traffic analysis. It is also the place where we can do client-side load balancing and apply security policies and rate limiting.
Several service mesh implementations are available, such as Istio and linkerd2. They offer many features, which you can review on their websites. But this powerful feature set brings a major overhead to the infrastructure, and the larger your clusters are, the larger the overhead becomes. At Avito, we have thousands of instances of our microservices. With the current Istio architecture (Istio was our main service mesh solution), each sidecar instance required a lot of RAM (approximately 350 MB), even after all recommended optimizations. Worse, that was not the only problem for us. Istio also added a major latency overhead (up to 10 ms on each request).
Eventually, we reviewed the key features we expect from a service mesh solution and found that the main thing we need is transparent distributed tracing of our microservices network interaction.
And this was where we applied a new solution — Netramesh.
Netramesh is a lightweight service mesh for unlimited scalability.
The main goals of the new solution are a small footprint and high performance. We also wanted to be able to send distributed tracing spans to our Jaeger system.
Nowadays, the majority of cloud native technologies are implemented in Golang, and there is a reason for this. It offers a convenient interface for writing asynchronous multithreaded applications. Just as important, its performance is high enough for this problem. That’s why we chose Golang.
This is where we focus our efforts. We wanted a small RAM and CPU footprint for each sidecar, and the latency overhead should also be relatively small. This is what we have so far:
Netramesh consumes ~10 MB without traffic and 50 MB at most with a load of up to 10,000 RPS to one instance.
Istio’s Envoy proxy always consumes ~350 MB in our clusters with thousands of instances. That is too much for us, and we can’t scale it up to our cluster sizes.
With Netramesh, we get a roughly 10x reduction in RAM usage.
CPU usage is relatively equal under load. It depends on the load and number of requests per second to the sidecar. Performance at a 3000 RPS peak value:
But there is another problem with Istio. Envoy proxy CPU utilization without load is not zero due to control plane interaction:
And sometimes, it went up to cores:
We use HTTP/1 for interaction between microservices, and the latency overhead with the Istio sidecar injected was 5–10 ms. With Netramesh, the overhead is 0.5–2 ms.
The small footprint makes it possible to inject the sidecar into every instance of every microservice. But most common service mesh solutions have an additional component: a special control plane that provides service discovery, common settings, and timeouts to sidecars. Usually, it delivers all discovery information to every sidecar in the system (which can span one or more clusters). As a result, the sidecars grow fat and cannot scale up to big clusters.
In the first version of Netramesh, we decided not to create a separate control plane component, in order to support unlimited scalability. Our solution has no control plane and can be used with any orchestrator, whether Kubernetes or any other. In future versions, a control plane may appear as an option to support new features, such as security policies.
Netramesh doesn’t add any additional mechanism for service discovery. It transparently proxies all traffic through itself.
For now, Netramesh supports the HTTP/1 application-level protocol. As mentioned, it has no control plane, so it can’t collect any additional information from the orchestrator or other components. This was a key consideration when designing the Netramesh architecture. Currently, it determines the application-level protocol using port mapping. Usually, each application-level protocol runs on a fixed set of port numbers. For example, we use ports 80, 8890, and 8080 for HTTP (microservice interaction). You can parametrize it in Netramesh using the environment variable
If you use Kubernetes as an orchestrator and its service discovery mechanism to make requests from one service to another, the mechanism remains the same. First, the microservice resolves the virtual service IP address using kube-dns and then makes a request to it. But all TCP packets go first to the netra sidecar, and only then to the original destination. Pod IP translation also remains the same: NAT to the pod IP takes place on the host node.
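To make the port-mapping idea concrete, here is a minimal sketch of port-based protocol detection. It is not Netramesh’s actual code: the function names `parseHTTPPorts` and `protocolFor` are hypothetical, and the comma-separated list format is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHTTPPorts turns a comma-separated list such as "80,8890,8080"
// into a set of port numbers. The list format is an assumption.
func parseHTTPPorts(list string) (map[int]bool, error) {
	ports := make(map[int]bool)
	for _, field := range strings.Split(list, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue // tolerate empty items like a trailing comma
		}
		port, err := strconv.Atoi(field)
		if err != nil || port < 1 || port > 65535 {
			return nil, fmt.Errorf("invalid port %q", field)
		}
		ports[port] = true
	}
	return ports, nil
}

// protocolFor picks the handling mode for a connection by its destination port.
func protocolFor(httpPorts map[int]bool, dstPort int) string {
	if httpPorts[dstPort] {
		return "http"
	}
	return "tcp"
}

func main() {
	// The list would normally come from configuration; it is hard-coded here.
	ports, err := parseHTTPPorts("80, 8890, 8080")
	if err != nil {
		panic(err)
	}
	fmt.Println(protocolFor(ports, 8080), protocolFor(ports, 5432)) // prints http tcp
}
```

Anything that is not on a known HTTP port is treated as an opaque TCP stream, which keeps the sidecar independent of any control plane.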
Distributed tracing and context propagation
It’s not that easy to understand what is happening in your system, and distributed tracing helps solve this problem. Netramesh provides the functionality required to send tracing spans from HTTP interactions. It parses the HTTP protocol, measures latencies, and extracts information from HTTP headers. Ultimately, you can have all system traces in a single Jaeger tracing system. You can configure it using the simple environment variables that the Jaeger Go library provides.
But there is a problem. Unless your microservices generate and propagate the special uber-trace-id context header, you won’t see connected tracing spans in your system. Here Netramesh comes in handy again. It extracts headers from HTTP requests and generates an uber-trace-id header if it is missing. It also stores the context information to match inbound and outbound microservice requests. All you need to do in your microservices is propagate a request ID header, whose name is customizable using the environment variable NETRA_HTTP_REQUEST_ID_HEADER_NAME (defaults to X-Request-Id). To manage the storage, you can use the following environment variables: NETRA_TRACING_CONTEXT_EXPIRATION_MILLISECONDS (tracing context mapping cache expiration) and NETRA_TRACING_CONTEXT_CLEANUP_INTERVAL (tracing context cleanup interval).
Also, it is possible to combine several tracing routes by tagging them with a session marker. Netra allows setting HTTP_HEADER_TAG_MAP to convert HTTP headers into corresponding tracing span tags. It can be useful in testing, as you can send several requests with the same session ID and then query Jaeger to show all the requests generated in the session.
Traffic routing and the inner workings of Netramesh
Netramesh consists of two main components. The first one is netra-init, which sets up network routing rules. It uses iptables redirect rules to redirect all or some of the traffic to the netra sidecar, which is the second main component of Netramesh. You can configure which ports you want to intercept for inbound and outbound connections using these variables:
Also, it has an interesting feature: probabilistic routing. If you don’t need to trace all connections (for example, in a production environment), you can use the environment variables NETRA_OUTBOUND_PROBABILITY (from 0 to 1). The default value is 1 (intercepting all traffic).
After successful interception, the netra sidecar accepts the new connection and uses the SO_ORIGINAL_DST socket option to retrieve the original destination. Then it opens a new connection to the original destination and sets up bidirectional TCP streaming between the two sides. If the port has been identified as an HTTP port, it tries to parse and trace the traffic. If parsing fails, it gracefully falls back to TCP streaming.
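The following Linux-only sketch shows how a Go proxy can read SO_ORIGINAL_DST and stream bytes in both directions. It is an illustration under assumptions, not Netramesh’s source: the helper names are hypothetical, the option value 80 comes from the Linux netfilter headers, and `syscall.GetsockoptIPv6Mreq` is used only as a convenient way to obtain a 16-byte buffer holding a `sockaddr_in`.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"syscall"
)

// soOriginalDst is the Linux SO_ORIGINAL_DST option number (linux/netfilter_ipv4.h).
const soOriginalDst = 80

// parseSockaddrIn decodes a raw sockaddr_in: bytes 2-3 hold the port in
// network byte order and bytes 4-7 hold the IPv4 address.
func parseSockaddrIn(raw [16]byte) *net.TCPAddr {
	return &net.TCPAddr{
		IP:   net.IPv4(raw[4], raw[5], raw[6], raw[7]),
		Port: int(raw[2])<<8 | int(raw[3]),
	}
}

// originalDst asks the kernel where a redirected connection was headed.
func originalDst(conn *net.TCPConn) (*net.TCPAddr, error) {
	rc, err := conn.SyscallConn()
	if err != nil {
		return nil, err
	}
	var addr *net.TCPAddr
	var sockErr error
	ctrlErr := rc.Control(func(fd uintptr) {
		// IPv6Mreq serves purely as a buffer large enough for sockaddr_in.
		mreq, err := syscall.GetsockoptIPv6Mreq(int(fd), syscall.IPPROTO_IP, soOriginalDst)
		if err != nil {
			sockErr = err
			return
		}
		addr = parseSockaddrIn(mreq.Multiaddr)
	})
	if ctrlErr != nil {
		return nil, ctrlErr
	}
	return addr, sockErr
}

// pipe copies bytes in both directions and closes both sides once either ends.
func pipe(a, b net.Conn) {
	done := make(chan struct{}, 2)
	go func() { io.Copy(a, b); done <- struct{}{} }()
	go func() { io.Copy(b, a); done <- struct{}{} }()
	<-done
	a.Close()
	b.Close()
	<-done
}

func main() {
	// A sockaddr_in for 10.0.0.5:8080: family, port 0x1F90, then the address.
	raw := [16]byte{2, 0, 0x1F, 0x90, 10, 0, 0, 5}
	fmt.Println(parseSockaddrIn(raw)) // prints 10.0.0.5:8080
}
```

In an accept loop, a proxy would call `originalDst` on each accepted connection, dial the returned address, and hand both connections to `pipe`. HTTP parsing would sit on top of this stream, with plain copying as the fallback.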
After collecting lots of tracing information in the Jaeger system, you want to retrieve the entire system graph. But if you have billions of tracing spans daily, it’s not that easy to aggregate them promptly. The standard way to do this is spark-dependencies, but it can take hours and consume a lot of computing and network resources.
If you use Elasticsearch to store tracing spans, then you can use a simple tool implemented in Golang that can build the entire graph within minutes: jaeger-dependencies.
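The core of such an aggregation is small: for every span, look up the service of its parent span and count the resulting service-to-service edge. The in-memory sketch below shows only that idea. It is not how jaeger-dependencies or spark-dependencies is implemented, the `span` and `edge` types are hypothetical, and it assumes span IDs are unique within the batch being processed.

```go
package main

import "fmt"

// span holds only the fields needed to derive service dependencies.
type span struct {
	SpanID, ParentID, Service string
}

// edge is one "parent service calls child service" relation.
type edge struct{ Parent, Child string }

// dependencies counts calls between services across a batch of spans.
func dependencies(spans []span) map[edge]int {
	serviceOf := make(map[string]string, len(spans))
	for _, s := range spans {
		serviceOf[s.SpanID] = s.Service
	}
	calls := make(map[edge]int)
	for _, s := range spans {
		parent, ok := serviceOf[s.ParentID]
		if !ok || parent == s.Service {
			continue // root span, unknown parent, or a call within one service
		}
		calls[edge{parent, s.Service}]++
	}
	return calls
}

func main() {
	spans := []span{
		{SpanID: "a", Service: "gateway"},
		{SpanID: "b", ParentID: "a", Service: "orders"},
		{SpanID: "c", ParentID: "b", Service: "billing"},
		{SpanID: "d", ParentID: "a", Service: "orders"},
	}
	deps := dependencies(spans)
	fmt.Println(deps[edge{"gateway", "orders"}], deps[edge{"orders", "billing"}]) // prints 2 1
}
```

At billions of spans per day, the hard part is reading and grouping the data efficiently, not the counting itself, which is why the choice of storage and tool matters.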
How to use Netramesh
You can easily inject it into any of your services. Check out an example here.
For now, we don’t have an automatic injector for Kubernetes, but we are planning to implement this.
The future of Netramesh
The focus of Netramesh is achieving a small footprint and high performance while offering features supported by the service mesh approach. It will support application-level protocols other than HTTP/1 as well as L7 balancing (based on HTTP headers).
It will probably support the Kubernetes API so that it can easily collect additional system information (mostly for security policies and balancing). But it will always remain a lightweight and easy-to-use solution.