To mesh or not to mesh
Service Mesh
I was told that a Service Mesh such as Linkerd, Consul or Istio adds a lot of overhead to a cluster. With that in mind, a Service Mesh is not suitable for a small deployment; instead, you should consider one when your client is big enough to deserve it.
But how big must a client be to deserve a Service Mesh?
And, more importantly, how much overhead does a Service Mesh add to a cluster?
The answer is: I don’t know.
That’s why I’m starting this POC: to answer these questions.
Resources
Files here: https://gitlab.com/post-repos/to-mesh-or-not-to-mesh
Requirements
To run this test you will need:
- a k8s cluster (we will use GCP)
- kubectl
- locust
- docker (or any container engine)
- git
We will use Linkerd, so you will need to download the CLI.
BFF
A simple Python app that exposes a simple API and calls the BACKEND. Once it gets the BACKEND’s response, it enriches it and sends it to the client.
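The BFF’s behavior can be sketched roughly like this (a minimal illustration, not the actual repo code; the backend URL and field names here are assumptions):

```python
import json
from urllib.request import urlopen

BACKEND_URL = "http://backend:8080"  # assumed in-cluster service address

def enrich(request_id, backend_response):
    # Wrap the BACKEND's plain answer with the request id before returning it.
    return {"id": request_id, "response": backend_response}

def handle(request_id):
    # Call the BACKEND, then enrich its response for the client.
    backend_response = urlopen(BACKEND_URL).read().decode()
    return json.dumps(enrich(request_id, backend_response))
```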
BACKEND
The BACKEND just answers the request with its version number.
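Its contract, sketched in Python for brevity (the real BACKEND is the Go app built later in this post; the message text is taken from the sample curl output below):

```python
VERSION = "1.0"  # each backend image bakes in its own version

def respond(request_id):
    # Echo the request id together with the running version.
    return {
        "id": request_id,
        "response": f"Congratulations! Version {VERSION} of your application is running on Kubernetes.",
    }
```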
Service Mesh
I will use Linkerd for this test.
What will we measure?
We will measure two items:
- WEB response times with Locust
- K8s resources usage
Then we’ll compare these metrics in two scenarios:
- using the Service Mesh
- using the raw k8s cluster.
Set up the environment
Cluster
This terraform template can be used: https://gitlab.com/templates14/terraform-templates/-/tree/master/gke
Then log in to your cluster, e.g. in this case:
gcloud container clusters get-credentials kungfoo-test --region us-central1 --project speedy-league-274700
The tests
We’ll have two tests: one with and one without the service mesh.
No Service Mesh
Set env
Create the namespace to deploy the app into:
kubectl create ns kungfootest
Set the app and run tests
Go to Set up the app and then to Run the tests. After that, come back here.
Clean up
Delete deployments:
kubectl delete -n kungfootest -f deploy-backend.yaml -f deploy-bff.yaml -f ingress.yaml
Delete the namespace so we’re clean:
kubectl delete ns kungfootest
Service Mesh
Linkerd
First, the Linkerd CLI must be installed on your system: download the binary and add it to your PATH.
Since we will be using GKE, we need to run these extra steps: https://linkerd.io/2/reference/cluster-configuration/#private-clusters
Check the cluster is ready for Linkerd:
linkerd check --pre
I got:
pre-kubernetes-capability
-------------------------
!! has NET_ADMIN capability
    found 1 PodSecurityPolicies, but none provide NET_ADMIN, proxy injection will fail if the PSP admission controller is running
    see https://linkerd.io/checks/#pre-k8s-cluster-net-admin for hints
!! has NET_RAW capability
    found 1 PodSecurityPolicies, but none provide NET_RAW, proxy injection will fail if the PSP admission controller is running
    see https://linkerd.io/checks/#pre-k8s-cluster-net-raw for hints
The cluster lacks these capabilities, but they will probably be provided when Linkerd itself is installed (https://github.com/linkerd/linkerd2/issues/3494).
Then install it:
linkerd install | kubectl apply -f -
…and wait until it’s installed:
linkerd check
Set env
Create the namespace to deploy the app into; this time we’ll need an annotation for Linkerd:
kubectl create ns kungfootest
kubectl edit ns kungfootest
and then add the annotation:
  annotations:
    linkerd.io/inject: enabled
This will allow Linkerd to automagically inject the proxy into the namespace’s pods.
Set the app and run tests
Now, go to Set up the app and then to Run the tests. After that, come back here.
Note this time the pods will have two containers, since Linkerd is injecting the proxy.
Compare the test results
For my tests:
No Mesh
Total average requested resources: 33% CPU, 8% memory.
Total average real usage: 12% CPU, 26% memory.
Avg response time: 204ms
Mesh
Total average requested resources: 35% CPU, 9% memory.
Total average real usage: 25% CPU, 38% memory.
Avg response time: 206ms
Conclusion
The mesh configuration we used is very basic, but it adds interesting services to our deployment with no need to modify code (e.g. secure internal connections, metrics…).
From the client’s point of view, response time was only 1% higher in the meshed version.
From the server side, we’re using roughly 100% more CPU and 46% more memory.
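Those server-side percentages come straight from the measured usage averages above:

```python
no_mesh = {"cpu": 12, "mem": 26}  # total average real usage, no mesh (%)
mesh    = {"cpu": 25, "mem": 38}  # total average real usage, with mesh (%)

# Relative increase of each resource once the mesh is in place.
cpu_extra = (mesh["cpu"] - no_mesh["cpu"]) / no_mesh["cpu"] * 100  # ~108%, i.e. roughly double
mem_extra = (mesh["mem"] - no_mesh["mem"]) / no_mesh["mem"] * 100  # ~46%
print(round(cpu_extra), round(mem_extra))  # 108 46
```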
Is it worth it?
As usual, it depends. Can you afford the increase in CPU and memory usage? Then you get all the Service Mesh pros at almost no cost on the client side. In any case, it deserves more testing if you are considering one.
But let me read your opinions on this: drop a message here.
Set up the app
Under the source directory there are two subdirs: one for the BFF and one for the BACKEND (inside the latter you will find two more dirs, versions 1 and 2, a.k.a. stable and canary; for now we will use just the stable version).
Build the app
Backend
In both cases you proceed the same way, varying only the version number.
In the source/backend directory you will see the Dockerfile and the two version directories.
CD into your source/backend directory and run:
cd source/backend/1.0 && \
GOOS=linux GOARCH=amd64 go build -tags netgo -o app && \
docker build -t backendapp:1.0 . && \
cd ..
…and:
cd source/backend/2.0 && \
GOOS=linux GOARCH=amd64 go build -tags netgo -o app && \
docker build -t backendapp:2.0 . && \
cd ..
Bff
CD into the source/bff directory and run:
docker build -t bffapp .
Push them all
Ok, now you have the images: push them all to a repository of your choice and note their names so we can set them in the k8s manifests.
Or use these already built images:
docker.io/juanmatias/canary-app:1.0
docker.io/juanmatias/canary-app:2.0
docker.io/juanmatias/canary-app:bff-1.0
Deploy the app
We will deploy all the elements into the kungfootest namespace.
CD into the root project directory and then:
cd manifests
Deploy the backend apps:
kubectl apply -f deploy-backend.yaml -n kungfootest
Deploy the bff:
kubectl apply -f deploy-bff.yaml -n kungfootest
Deploy the ingress:
kubectl apply -f ingress.yaml -n kungfootest
Test the app
Get the public IP:
kubectl get ing -n kungfootest
You can test your app with this command:
curl http://$PUBLICIP/kungfutest/mytest
You should have an output like this one:
{"id": "mytest", "response": "Congratulations! Version 1.0 of your application is running on Kubernetes."}
Run the tests
We’ll run two tests: Locust to measure response times, and a resources script to measure resource usage.
Locust
From the project root dir:
cd locust
If this is the first time, create a virtual env and install Locust:
pip install locust
Now, run the locust server:
locust -f kungfootest.py
This will start the Locust server listening on localhost:8089; open it with your browser.
There, you must set the host (e.g. http://$PUBLICIP), the max number of users and the user spawn rate. Then begin your tests.
I’ll test with 100 users and a spawn rate of 10, and let the test run for 2 minutes.
Resources
While the Locust test is running, run the script resources.sh. When it finishes, just type CTRL+C and it will show the average memory and CPU.
NOTE: keep in mind that this script gets the resources requested at node level, and the real usage only for the kungfootest namespace.
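If you prefer, the averaging that resources.sh does can be sketched in Python: sample `kubectl top nodes` periodically and average the CPU%/MEMORY% columns. This is a hypothetical equivalent, not the repo script; the column layout is assumed from kubectl’s default output.

```python
import re

def parse_top_nodes(output):
    """Return (avg_cpu_percent, avg_mem_percent) from `kubectl top nodes` text."""
    cpu, mem = [], []
    for line in output.splitlines()[1:]:  # skip the header row
        pcts = re.findall(r"(\d+)%", line)  # CPU% and MEMORY% columns
        if len(pcts) >= 2:
            cpu.append(int(pcts[0]))
            mem.append(int(pcts[1]))
    return sum(cpu) / len(cpu), sum(mem) / len(mem)

sample = """NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-a   120m         12%    806Mi           26%
node-b   260m         25%    1177Mi          38%"""
print(parse_top_nodes(sample))  # (18.5, 32.0)
```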
References
Monitoring Kubernetes cluster utilization and capacity (the poor man’s way) | Jeff Geerling
First version of this post was published in my blog here.