Service Mesh Performance Evaluation — Istio, Linkerd, Kuma and Consul
Authors: Dahn Youssefi, Florent Martin
Modern applications are often composed of numerous microservices that run in containers distributed on-premises and in the cloud. In this context, a service mesh is an infrastructure layer that tackles the challenges of security, connectivity and observability of these distributed microservices. But what is the latency impact of having an extra layer of components (the mesh infrastructure) intercepting all your traffic?
In fact, in real-world applications, an API call to a microservice can cascade into several other service calls, for database access for instance. Increased latency can be critical for the user experience and also for search engine optimization.
In this article, the performance of different Service Meshes has been measured and evaluated. The figure below illustrates a basic Service Mesh infrastructure, with proxies intercepting traffic and applying a set of configured features such as routing, access control, observability and load balancing. A control plane provides policies and configurations for all of the running proxies in the mesh and maintains a view of the services. It dynamically updates the proxies as the rules or the environment change.
Basic Service Mesh configurations have been compared while ensuring that the configurations are equivalent, so that the measurements are meaningful. A baseline without Service Mesh has also been used.
Then, additional Service Mesh features have been enabled and their impact on performance has been measured. Four Service Meshes have been evaluated: Istio, Linkerd, Kuma and Consul.
Code repository: https://github.com/ELCAIT/service_mesh_performance
Tested Service
For evaluation purposes, we developed a simple Java REST API microservice using Spring Boot: the counter-api.
The counter-api microservice is a simple counter: the first response is 0, and this number is incremented at each new request. The response is formatted in JSON.
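To make the tested behavior concrete, the sketch below mimics the counter-api contract. Note that the actual service is implemented in Java with Spring Boot; Flask and the /counter path are only illustrative assumptions here, not the real code.

```python
# Behavioral sketch only: the real counter-api is a Java/Spring Boot service.
# Flask and the /counter path are illustrative assumptions.
import threading
from flask import Flask, jsonify

app = Flask(__name__)
counter = 0
lock = threading.Lock()  # guard the counter against concurrent requests

@app.route("/counter")
def counter_endpoint():
    global counter
    with lock:
        value = counter   # first response is 0
        counter += 1      # incremented on each new request
    return jsonify({"counter": value})  # JSON-formatted response

if __name__ == "__main__":
    app.run(port=8080)
```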
Request generator
For load testing, we used Locust to generate a large number of requests.
Locust is an application written in Python able to generate requests at a fixed rate lambda, which represents the number of Requests sent Per Second (RPS). A Docker image is available to deploy and configure a Locust service inside the Service Mesh Kubernetes cluster.
Locust can be configured to send HTTP requests at a fixed rate to a service. An advantage of Locust is that multiple workers can be deployed in the cluster and send their measurements to the master instance, as shown in the figure below. This allows more throughput and elasticity in the generation of the requests.
The Locust master instance collects the metrics from the workers. The Locust workers generate requests to the targeted service and measure the latency.
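As an illustration, a minimal locustfile along these lines could drive the workers; the /counter path, the in-cluster host name and the per-user rate are assumptions, not the exact configuration used in the benchmark.

```python
# Minimal locustfile sketch (endpoint, host and rate are assumptions).
from locust import HttpUser, task, constant_throughput

class CounterUser(HttpUser):
    host = "http://counter-api.default.svc.cluster.local"  # hypothetical in-cluster service name

    # Each simulated user sends at most 10 requests per second; the aggregate
    # RPS is therefore bounded by (number of users) x 10.
    wait_time = constant_throughput(10)

    @task
    def get_counter(self):
        self.client.get("/counter")

# Master and workers are started separately, e.g.:
#   locust -f locustfile.py --master
#   locust -f locustfile.py --worker --master-host=<master-ip>
```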
Open-loop / Closed-loop testing
A load generator is open-loop when outgoing requests are sent on a schedule that is independent of previous request completions, typically following a Poisson distribution. With a closed-loop request generator, a new request is only triggered after the previous request has completed.
Each Locust worker instance can represent an independent user. A Locust worker instance is not open-loop: the rate lambda is bounded by the maximum response rate of the tested service, since no request is sent until the previous request has been answered.
Only a few independent Locust instances are used for the tests, hence the overall system is closed-loop.
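The sketch below contrasts the two models under simple assumptions (a hypothetical target URL, plain Python threads instead of Locust): the open-loop generator fires requests on a Poisson schedule regardless of completions, whereas the closed-loop generator waits for each response before sending the next request.

```python
# Illustrative sketch of the two request-generation models (URL is hypothetical;
# this is not the Locust implementation).
import random
import threading
import time

import requests

URL = "http://counter-api.default.svc.cluster.local/counter"  # assumed endpoint

def open_loop(rps: float, duration_s: float) -> None:
    """Open loop: requests are fired on a Poisson schedule (exponential
    inter-arrival times), regardless of previous request completions."""
    deadline = time.time() + duration_s
    while time.time() < deadline:
        time.sleep(random.expovariate(rps))
        threading.Thread(target=requests.get, args=(URL,), daemon=True).start()

def closed_loop(duration_s: float) -> None:
    """Closed loop: the next request is only sent after the previous response
    has been received, so the rate is bounded by the service response rate."""
    deadline = time.time() + duration_s
    while time.time() < deadline:
        requests.get(URL)  # blocks until the response arrives
```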
Metrics
To compare the different Service Mesh implementations, we defined the following metrics.
The round-trip latency
The delay between the Service call and the reception of the response in the application.
Latency unit: milliseconds [ms].
The RAM and CPU consumption
The tested service RAM and CPU consumption is measured. The sidecar and application are observed in order to visualize the resources used.
RAM unit: mebibytes [Mi].
CPU unit: milli CPU [mCPU].
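For reference, per-container CPU and RAM usage (application and sidecar) can be sampled through the Kubernetes metrics API. The snippet below is a sketch using the official Python client; it assumes metrics-server is installed in the cluster and that the counter-api pods carry an app=counter-api label.

```python
# Sketch of one way to sample per-container CPU/RAM (assumes metrics-server is
# installed; the namespace and label selector are assumptions).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
api = client.CustomObjectsApi()

metrics = api.list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1",
    namespace="default", plural="pods",
    label_selector="app=counter-api",
)

for pod in metrics["items"]:
    for container in pod["containers"]:
        # CPU is reported in cores/millicores (e.g. "12m"), memory in Ki/Mi.
        print(pod["metadata"]["name"], container["name"], container["usage"])
```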
As shown in the figure below, the traffic is intercepted four times by the sidecar proxies during a complete HTTP request: twice on the request path (client-side and server-side sidecars) and twice on the response path.
Service Mesh comparison
Several differences exist between the Service Meshes, and their default configurations are not directly comparable. A “close to equivalent” configuration must be used in order to compare the solutions fairly:
- Retries and timeouts have different default configurations in each Service Mesh. In order to be fair with the baseline without Service Mesh, those features have been disabled.
- Sidecar CPU and RAM requests differ among the Service Meshes. Usually, the Service Mesh sidecar proxy CPU request is between 50 and 150 milli CPU. However, the CPU limit (if set) can range from 325 milli CPU (Linkerd) to 2000 milli CPU (Istio), or no limit at all (Consul). For fairness, the CPU limit has been set to 200 milli CPU (configuration available here, here and here). During the benchmark, this limit has not been reached.
- Security and observability features are configured by default at different levels and granularities across the Service Meshes. In order to compare the baseline difference fairly, mTLS and observability features are disabled.
Service Mesh Feature Impact
Finally, different features have been enabled in order to measure their impact:
- Only mTLS enabled for all service-to-service communications
- mTLS + default observability features are enabled:
Default observability and metrics configurations are added. Those default configurations are usually not production-ready and are the ones presented in “Getting Started” guides. Due to time constraints, the default configurations have not been tuned to make the different Service Mesh configurations equivalent. Hence, only the impact of the observability feature can be measured; it would not be relevant to compare the Service Meshes with each other with the observability feature enabled.
- ISTIO AND KUMA ONLY: The JWT authentication feature has been enabled. This has been done in two different ways: using Istio JWT Authentication, and using a custom Envoy filter configured manually in Istio and with a ProxyTemplate in Kuma.
- ISTIO ONLY: Open Policy Agent (OPA) has been added to Istio as an External Authorization Provider and is also configured to perform JWT authentication. This configuration is valuable to evaluate since the sidecar proxy has to make an additional request on the local network to the OPA Policy Decision Point in order to evaluate each incoming request. This extra hop is expected to add latency to the requests.
Testing environment
For the tests, an AWS Elastic Kubernetes Service (EKS) cluster has been used.
- Number of nodes: 9
- Node details: an Amazon t3.medium EC2 instance is used for each node
- The tested microservice counter-api can only be deployed on nodes 1 and 2.
- The request generator Locust is only deployed on nodes 3 to 9. This ensures that Locust and the tested microservice do not compete with each other for resources.
Locust is given more resources (7 nodes) in order to ensure that the measured saturation is not caused by a Locust bottleneck. The tested service is so light and simple that multiple Locust instances are required to stress it.
Methodology
All requests are sent to the counter-api microservice.
- Number of counter-api pods: 4 instances
- CPU requested per pod: 150 milli CPU. CPU limit: 325 milli CPU.
- Memory (RAM) requested per pod: 128Mi. Memory limit: 256Mi
Step 1. Compute the saturation point
Computing the counter-api saturation point is required to understand what testing with “500 RPS” means. Is it close to the saturation of the service? Or is this only a light load for the service?
The model is closed-loop, and the request generation is limited by a maximum number of Requests Per Second (RPS). To measure the saturation point of the tested service, a first test is performed with each Service Mesh.
The request generator, Locust, first starts sending requests at a low rate (e.g. RPS = 100). Then, this number increases over time. Close to the saturation point, the RPS will not be able to increase further. It can even decrease, since the system can saturate and stop answering for a while. The RPS at time t [s] is measured by counting the number of responses received during that second.
A graph of the received RPS over time is generated to observe the saturation point. At first, the RPS should grow linearly; then, close to the saturation point, the RPS should stop increasing and may even oscillate.
This has been done for 70 seconds for each Service Mesh and the baseline.
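As a sketch, the received RPS over time can be derived from the raw response timestamps by counting responses per one-second bucket; the CSV format assumed below (a "timestamp" column in seconds) is illustrative, not the exact format produced by the benchmark.

```python
# Sketch: derive received RPS over time from raw response timestamps.
from collections import Counter
import csv

def rps_over_time(csv_path: str) -> dict[int, int]:
    """Count responses received during each one-second bucket."""
    buckets: Counter[int] = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            buckets[int(float(row["timestamp"]))] += 1
    start = min(buckets)
    # Re-index so the test starts at t = 0 [s].
    return {t - start: n for t, n in sorted(buckets.items())}
```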
Note. A first round of testing was performed with 4 counter-api instances and 2 locust-worker instances (only two nodes were used). The saturation point was measured close to 250 RPS. However, the median latencies measured at ~25 RPS and ~250 RPS were not significantly different, which showed that the bottleneck was probably the Locust generator: the load generator was saturating before the counter-api service! We had configured Locust to collect and log the latency of every single request. While this choice is questionable in terms of performance, we wanted all the raw data in order to generate relevant graphs and metrics. Furthermore, this was easy to overcome by using more Locust worker instances in parallel. A new round of testing has been performed with more locust-worker instances (20+).
To ensure that the RPS in the system is not bottlenecked by the request generator Locust, many Locust worker instances (pods) have been used.
The objective is to increase the number of Locust worker instances until the system saturation RPS does not increase anymore. This ensures that the RPS bottleneck is not Locust.
Step 2. Tests
The different solutions have been tested in runs of 70 seconds, each run with a different target RPS, increasing up to values close to the saturation point.
The latency of each request is measured and collected. However, the latencies of the first 5 seconds are dropped, to let the Service Mesh initialize and stabilize. Indeed, the first requests have been observed to be much slower than the others, probably because some components are initialized when the first requests are received.
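A possible post-processing step, assuming the raw samples are (timestamp, latency) pairs, is sketched below: the 5-second warm-up is discarded and the median, mean and tail percentiles are computed.

```python
# Sketch: drop the 5-second warm-up and compute summary statistics from the
# raw latencies (the list of (timestamp_s, latency_ms) tuples is an assumption).
import statistics

WARMUP_S = 5

def summarize(samples: list[tuple[float, float]]) -> dict[str, float]:
    t0 = min(t for t, _ in samples)
    lat = sorted(l for t, l in samples if t - t0 >= WARMUP_S)  # discard warm-up

    def pct(p: float) -> float:
        # Simple empirical percentile over the sorted latencies.
        return lat[min(len(lat) - 1, int(p * len(lat)))]

    return {
        "median_ms": statistics.median(lat),
        "mean_ms": statistics.mean(lat),
        "p99_ms": pct(0.99),
        "p999_ms": pct(0.999),
    }
```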
Finally, all benchmark tools, Service Meshes and applications are removed and reinstalled on the nodes between each test.
Results
1. Saturation point
As observed in the figure below, the RPS increases roughly linearly until ~1500 RPS, where the curve starts to oscillate. A saturation point close to 1800 RPS can be observed. With 20 Locust instances, the RPS stays below 1750; with 25 instances, it can reach 2000 RPS, and the same holds with 30 instances. Since the RPS does not go higher when the number of Locust instances is increased from 25 to 30, the bottleneck with 30 Locust instances is not Locust.
2. Service Mesh comparison (no mTLS, no observability)
The counter-api service has been tested under different RPS: 100, 200, 400, 700, 1000, 1200, 1500, 1800. The expected RPS was not always reached, particularly 1800 RPS, which is around the saturation point of some Service Meshes.
A highlight of the results is shown in the table below. Complete measurements are available in the GitHub repository. All the values are given in [ms].
Observations:
- The expected RPS is not always reached. This can be explained by saturation in the tested service, which responds at a lower rate for a while and reduces the average RPS of the test.
- At low load (~200 RPS), Service Mesh solutions add between 0.7[ms] and 1.5[ms] to the median latency of the baseline solution (without Service Mesh). Mean latency is not significantly impacted. CDF and CCDF graphs are shown in the figure below. The Complementary Cumulative Distribution Function (CCDF) is particularly useful to observe the tail of the distribution: the 90th, 99th, 99.9th and 99.99th percentiles. A sketch of how these curves can be computed from the raw latencies is shown after this list.
- With a medium load (~ 700 RPS), 5[ms] to 25[ms] are added to the baseline median latency. CDF and CCDF graphs are shown in the figure below.
- With a high load (~ 1200 RPS), 10[ms] to 50[ms] are added to the baseline median latency. CDF and CCDF graphs are shown in the figure below.
- The sidecar RAM and CPU consumption increases with the RPS, as shown in the annex results in the GitHub repository. However, the sidecar resource consumption remains low and is well below the consumption of the counter-api service itself. Istio and Linkerd are the two Service Meshes that use the most CPU, which might explain why their performance is better close to the saturation point.
- The mean latency is sometimes twice as large as the median. This is explained by the number of long-latency requests, which increases with the RPS as the tested service saturates.
- Even when the baseline performs better in terms of median latency, the mean latency is sometimes better with a Service Mesh. This could be explained by more efficient traffic management in the Service Mesh, and probably better load balancing that avoids long-latency requests. Indeed, a Service Mesh load balancer can select a service instance according to the least number of active requests, which reduces the risk of unnecessarily saturating an instance when the number of RPS is low. Without a Service Mesh, Kubernetes load balancing defaults to round robin, which might be less efficient in this benchmark.
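As referenced above, the CDF and CCDF curves can be built directly from the sorted raw latencies; the sketch below uses matplotlib, with a logarithmic y-axis for the CCDF so that the tail percentiles remain readable.

```python
# Sketch: build CDF and CCDF curves from a list of latencies [ms].
import matplotlib.pyplot as plt

def plot_cdf_ccdf(latencies_ms: list[float], label: str) -> None:
    lat = sorted(latencies_ms)
    n = len(lat)
    cdf = [(i + 1) / n for i in range(n)]   # P(latency <= x)
    ccdf = [1.0 - i / n for i in range(n)]  # P(latency >= x), tail of the distribution

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(lat, cdf, label=label)
    ax1.set(xlabel="latency [ms]", ylabel="CDF")
    ax2.plot(lat, ccdf, label=label)
    ax2.set(xlabel="latency [ms]", ylabel="CCDF")
    ax2.set_yscale("log")  # makes the 90th/99th/99.9th percentiles visible
    ax1.legend()
    ax2.legend()
    plt.show()
```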
3. Service Mesh comparison (with mTLS, no observability)
Similar results and latency differences are observed with mTLS enabled. This shows that the mTLS feature does not have a significant impact and that the observations are repeatable across different runs of the test. See the GitHub repository for the full report.
4. Service Mesh Feature Impact
The four Service Meshes have been tested with different configurations in order to evaluate the performance impact of some features. The results with Istio are shown in the table below. Results with the other Service Meshes are similar and available in the GitHub repository. All values are given in [ms].
It is clearly observed in the figures that adding features does not have a significant impact on performance: no clear pattern emerges, and the observed differences might be measurement noise. Only JWT token verification with Open Policy Agent (OPA) has a noteworthy impact on performance.
Conclusion
When the service is not under stress (RPS < 500), Linkerd is often measured as the fastest Service Mesh. This can be seen in the median latencies as well as in the 99.9th percentile. This result was expected since Linkerd is known in the market as a light and fast Service Mesh. Regarding Istio, which is known for not being light, it is surprising to observe very good latency results, particularly at high RPS. The drawback with Istio is its CPU consumption, which is on average two to three times higher than the others.
Regarding Consul, results are similar to the others up to ~200 RPS; beyond that, its latency measurements consistently fall behind.
Overall, the most interesting results are with low to medium load on the tested service. Indeed, since Kubernetes cloud clusters offer elasticity and scalability, an increase in the load should not get the service as saturated as in the benchmarks. It is observed that the Service Mesh adds from 0.5[ms] to 2[ms] to the median latency. However, the mean latency impact is less significant; with RPS < 200, the mean latency is sometimes even better with a Service Mesh. This result was not expected, but it could be explained by the efficient traffic management of the Service Mesh — particularly the ability to select a service instance according to the least number of active requests — which reduces the risk of unnecessarily saturating a Service when the number of RPS is low.
With different features activated, no pattern of latency impact has been observed. With this observation, a Service Mesh operator can freely add more features without expecting a huge impact on latencies. Only Open Policy Agent enabled with Istio has a significant impact on the latencies. This result was expected since the OPA Policy Decision Point adds an extra hop in the local network of the pod.
Finally, the resources used by the Service Mesh sidecars are acceptable. Considering the features brought by a Service Mesh, the CPU/RAM consumption is not significant. However, one might prefer not to choose Istio if CPU resources are restricted.
Further work
Measurements and methodology can be improved for further testing. First, it could be interesting to use a different load generator than Locust, preferably an open-loop generator written in a compiled language such as C. Indeed, the Python implementation might be less efficient and can add noise to the measurements. Moreover, more iterations of the tests could be performed, for instance to measure the variation of the median for the exact same test. The tendency should remain the same as in the current tests, but the measured metrics would be more accurate and scientific, with the noise (if any) precisely measured and the statistically significant differences between the Service Meshes identified.