Benchmarking Istio & Linkerd CPU

Background

Here at Shopify, we’re working on deploying Istio as our service mesh. We’re doing quite well, but are hitting a wall: Cost.

Installing the Service Meshes

First thing, I installed SuperGloo in the cluster:
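SuperGloo then drives the installation of both meshes. As a rough sketch only (v0.3-era SuperGloo CLI; flag names and defaults changed between releases, so treat these as illustrative rather than exact):

# Sketch only: v0.3-era SuperGloo CLI; flags are illustrative.
supergloo init                             # deploys the SuperGloo controller into the cluster
supergloo install istio --name istio       # then have SuperGloo install each mesh
supergloo install linkerd --name linkerd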

Set up Istio Auto Injection

To get Istio to install the Envoy sidecar, we use the sidecar injector, which is a MutatingAdmissionWebhook. It’s out of scope for this article, but in a nutshell, a controller watches all new pod admissions and dynamically adds the sidecar and the initContainer, which does the iptables magic.
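With a stock Istio install, a namespace opts into injection via a label; for example (the irs namespace name here is only an illustration, not the one used in the benchmark):

kubectl label namespace irs istio-injection=enabled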

Set up Linkerd Auto Injection

To set up Linkerd sidecar injection, we use an annotation on the pod template (which I added manually with kubectl edit):

metadata:
  annotations:
    linkerd.io/inject: enabled
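In context, that annotation lives on the Deployment’s pod template rather than on the Deployment’s own metadata. A trimmed-down example, where the name and image are placeholders rather than the real benchmark manifests:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: irs-client                     # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: irs-client
  template:
    metadata:
      labels:
        app: irs-client
      annotations:
        linkerd.io/inject: enabled     # this is what the proxy injector looks for
    spec:
      containers:
        - name: irs-client
          image: example/irs-client:latest   # placeholder image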

The Istio Resiliency Simulator (IRS)

We developed the Istio Resiliency Simulator to try out traffic scenarios that are unique to Shopify. Specifically, we wanted a tool that could create an arbitrary topology representing a specific portion of our service graph, and that could be dynamically configured to simulate specific workloads. For example, we wanted to be able to express directives like:

  • Start 10 clients, sending 100 RPS each to bar
  • Every 10 seconds, take down 1 server, monitoring 5xx levels at the client
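IRS itself is internal to Shopify, so the snippet below is purely a hypothetical illustration of how a directive like that could be written down declaratively; none of these field names come from the real tool.

# Hypothetical scenario definition; every field name is invented for illustration.
scenario: take-down-bar-servers
clients: 10
target: bar
ratePerClientRps: 100
chaos:
  every: 10s
  action: kill-one-server
observe:
  - client-5xx-rate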

IRS for Service Mesh Benchmarking

For this purpose, we set up some IRS workers as follows:

  • irs-client-loadgen: 3 replicas that generate load against irs-client
  • irs-client: 3 replicas that receive a request, wait 100ms, and forward it to irs-server
  • irs-server: 3 replicas that return 200 OK after 100ms
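The IRS workers themselves aren’t public, but the behaviour described above is simple to sketch. Here’s a minimal stand-in for the irs-client role in Go, assuming plain HTTP and an irs-server Service on port 8080 (both assumptions, not details from the real setup):

// Minimal sketch of the irs-client behaviour: accept a request, wait 100ms,
// then forward it to irs-server. Not the real IRS code.
package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(100 * time.Millisecond) // simulate in-process work

		resp, err := http.Get("http://irs-server:8080/") // assumed Service name/port
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()

		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body) // relay irs-server's response to the caller
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}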

The Results

Control Planes

First, we looked at the control plane CPU usage.

  • Linkerd control plane: ~22 mcores
  • Istio control plane: ~750 mcores
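If you want to reproduce a rough version of this comparison yourself, kubectl top (backed by metrics-server) is enough for a spot check, assuming the default istio-system and linkerd namespaces:

kubectl top pods -n istio-system
kubectl top pods -n linkerd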

Sidecar Proxies

Next, we looked at sidecar proxy CPU usage. This should scale roughly linearly with request rate, but there is some fixed overhead per sidecar that affects the shape of the curve.

  • irs-client: ~100 mcores with Linkerd, ~155 mcores with Istio/Envoy
  • irs-client-loadgen: ~50 mcores with Linkerd, ~75 mcores with Istio/Envoy
  • irs-server: ~50 mcores with Linkerd, ~80 mcores with Istio/Envoy
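Per-container figures like these can be spot-checked the same way; the --containers flag breaks CPU out per container, so the sidecar shows up separately from the application (the irs namespace name is illustrative):

kubectl top pods -n irs --containers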

Conclusion

Istio’s Envoy proxy uses more than 50% more CPU than Linkerd’s, for this synthetic workload. Linkerd’s control plane uses a tiny fraction of Istio’s, especially when considering the “core” components.
