Efficiently Finding & Fixing Issues on Kubernetes Using the Linkerd 2.0 Service Sidecar
In this tutorial, we’ll walk you through how to get Linkerd 2.0 up and running on your Kubernetes cluster in just a few easy steps. We will also show how to use the Linkerd service sidecar to easily identify issues using the Linkerd Dashboard.
What is Linkerd & why/when should I use it?
Linkerd is a service sidecar and service mesh for Kubernetes clusters that provides a layer of telemetry, security, and control across some or all of the services in a cluster.
Linkerd changes the way we work with Kubernetes: it provides a simple, configuration-free dashboard and easy-to-use UNIX-style CLI tools for runtime debugging, diagnostics, and reliability.
It should be used when you want deeper insight into how your application is running on your cluster, which makes troubleshooting and issue mitigation much easier.
How does it work?
Linkerd works by installing ultralight proxies into each pod of a service in your cluster. These proxies report telemetry data to a control plane.
This means that getting started with Linkerd doesn’t require any code changes, and even better, it can be installed live on a running service.
Let’s get started with an example
For this, we will use the lovely sample app provided by the good people at Linkerd & Buoyant.
Step 1: Install the demo app
Before we install Linkerd, we will start by installing the demo application provided by Linkerd. It is a simple gRPC demo app called Emojivoto.
So let’s install Emojivoto on our Cluster:
curl https://run.linkerd.io/emojivoto.yml | kubectl apply -f -
With this, we download the manifest for the Emojivoto sample app and then use kubectl apply to apply it to our cluster. For this, I am using a free OKE cluster I set up on OCI.
We will now check that our deployments have been deployed correctly:
kubectl get -n emojivoto deployments
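If the deployments are still coming up, you can wait for each rollout to finish before moving on. This is a sketch using standard kubectl commands; the deployment names come from the Emojivoto manifest:

```shell
# Wait for each Emojivoto deployment to finish rolling out
kubectl rollout status -n emojivoto deploy/web
kubectl rollout status -n emojivoto deploy/emoji
kubectl rollout status -n emojivoto deploy/voting
```

Each command blocks until its deployment reports all replicas ready, so when they all return you know the app is fully up.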
You can also see where the app is exposed on our cluster by running:
kubectl get svc web-svc -n emojivoto
This will return the cluster IP and external IP of the “web-svc” service, showing that your app is running live.
Using the external IP, we can now test the app running live on our cluster.
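If your cluster doesn’t provision an external IP (for example, when no load balancer is available), a port-forward works just as well. This is a sketch assuming the service name and port from the Emojivoto manifest:

```shell
# Forward local port 8080 to web-svc inside the cluster,
# then open http://localhost:8080 in your browser
kubectl -n emojivoto port-forward svc/web-svc 8080:80
```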
The first impression of this sample app… Wow, much colour… Many emoji!
After playing around with the app for a few seconds, you will notice that parts of it aren’t working. That’s great for testing some of Linkerd’s features, such as telemetry.
Interestingly, the Kubernetes dashboard shows nothing wrong; it looks like the application is running perfectly. This is because Kubernetes only checks whether the pods are functioning correctly, not whether the application’s responses are correct.
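To make this concrete, Kubernetes judges pod health using probes like the one below, which say nothing about whether individual requests succeed. This is a sketch of a typical liveness probe; the path and port are hypothetical, not taken from the Emojivoto manifest:

```yaml
# A pod passes this check as long as /health returns a 2xx response,
# even if other endpoints in the app are returning errors.
livenessProbe:
  httpGet:
    path: /health   # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```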
Now let’s set up Linkerd so that we can look into the issues further.
Step 2: Install Linkerd’s CLI
Now let’s install Linkerd’s CLI on our machine. You can check the Linkerd releases page for available versions, or, easier still, use the command below to download the most recent stable version:
curl -sL https://run.linkerd.io/install | sh
Once the installation is finished, add the linkerd command to your PATH:
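The install script places the CLI in ~/.linkerd2/bin by default, so adding it to your PATH looks like this:

```shell
# The install script drops the linkerd binary in ~/.linkerd2/bin by default
export PATH=$PATH:$HOME/.linkerd2/bin
```

Add the same line to your shell profile (e.g. ~/.bashrc) to make it permanent.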
You should now be able to run the command linkerd version, which should display:
Client version: stable-2.5.0
Server version: unavailable
“Server version: unavailable” means that the Linkerd control plane is not yet installed on our cluster and needs to be added.
Check your cluster is ready:
Before we add Linkerd to our cluster, we must check that the cluster is ready for Linkerd by running:
linkerd check --pre
This handy command will check for and report any problems that would interfere with installing Linkerd.
Step 3: Install Linkerd’s control plane onto the cluster
Next, we will install the Linkerd control plane into its own namespace, “linkerd”, on your cluster. To do this, run:
linkerd install | kubectl apply -f -
This will generate a Kubernetes manifest and use
kubectl to apply it to your Kubernetes cluster.
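If you’d rather review what will be created before applying it, you can write the manifest to a file first. This is a sketch; the file name is my own choice:

```shell
# Generate the control-plane manifest, inspect it, then apply it
linkerd install > linkerd-control-plane.yaml
less linkerd-control-plane.yaml
kubectl apply -f linkerd-control-plane.yaml
```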
Once the manifests for the Linkerd control plane have been applied to your cluster, we can check that they have been applied correctly using:
linkerd check
If your cluster is still applying the control plane, the command will wait until it finishes.
Test the Dashboard
All going well, you will receive a message saying your status checks have passed (like below), and you will now have Linkerd running on your cluster.
Now that Linkerd is running on our cluster, we can access the Linkerd dashboard:
linkerd dashboard
If you see something like the below open in your browser, congratulations: Linkerd is now running correctly on your cluster.
Step 4: Add Linkerd to the web service
Now that Linkerd’s control plane is running on our cluster in the “linkerd” namespace, and our Emojivoto demo app is installed in the “emojivoto” namespace, we need to add Linkerd to the service we deployed earlier.
There are a couple of different ways to add Linkerd to our service. For the purpose of this example we will use a simple method:
kubectl get -n emojivoto deploy/web -o yaml | linkerd inject - | kubectl apply -f -
This command takes the manifest of the “web” deployment from Kubernetes, runs it through
linkerd inject, and then reapplies it to the Kubernetes cluster.
linkerd inject takes the manifest and edits it to include Linkerd’s data plane proxies.
Smart Roll Out?
Since the “web” manifest is a Deployment, Kubernetes is smart enough to roll out the change one pod at a time. This is great because it means the “web” deployment can keep serving live traffic while we add Linkerd to it, making the whole process seamless.
We now have the Linkerd service sidecar running on the “web” service deployment.
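As an aside, instead of piping manifests through linkerd inject by hand, Linkerd also supports automatic proxy injection via an annotation on the pod template. This is a sketch of the relevant fragment of a deployment spec:

```yaml
# With this annotation on the pod template, Linkerd's proxy is
# injected automatically whenever pods are (re)created.
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
```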
Step 5: Using Linkerd to debug the application issues.
You now have a full demo application running on your cluster with Linkerd installed on the “web” service deployment.
We can now use Linkerd to investigate some of the application issues we stumbled across earlier, such as some votes not working. This is useful because, as we saw earlier, the Kubernetes dashboard showed the application running fine.
Let’s check the Linkerd dashboard:
You should see all the services in the “emojivoto” namespace show up. As we have only installed the Linkerd sidecar on the “web” service, it will be the only deployment and pod displaying success rate, requests per second, and latency. You will also see a handy button that brings you to a Grafana dashboard for your Linkerd sidecar.
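The same per-deployment stats are also available from the terminal. This is a sketch assuming the control plane and sidecar are installed as above:

```shell
# Success rate, requests per second, and latency percentiles
# for meshed deployments in the emojivoto namespace
linkerd stat deployments -n emojivoto
```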
Why is the success rate inconsistent?
The first thing we notice: how awesome is this?! The second thing we notice is that our success rate is consistently below 100%.
What is causing this?
Let’s take a look at why and try to resolve it. To do this, we simply click on “web”. This will bring you to the deployment dashboard for “web”.
You will notice that there is mock traffic being generated by a “vote-bot”, a simple component of the application included in the Emojivoto manifest. It generates low levels of live traffic to two outgoing dependencies, emoji and voting.
Let's dive a little deeper
As we can see straight away, the emoji service is working 100% of the time, while the voting service is failing some of the time.
A failure in a dependent service may be exactly what’s causing the errors that web is returning which we saw earlier.
Let’s look further into this. As we scroll through the deployment page for “web”, we can see the live list of all the traffic endpoints for “web”.
Understanding the error rate:
Straight away we can see that there are two calls that are not at 100%: the first is vote-bot’s call to the “/api/vote” endpoint; the second is the “/emojivoto.v1.VotingService/VoteDoughnut” call from the web service to the voting service.
This is interesting as “/api/vote” is an incoming call, and “/VoteDoughnut” is an outgoing call.
This is a good indicator that the “/VoteDoughnut” call is the source of our issues.
“Tap”-ping into the issue
Let’s dive a little deeper to see if it is, in fact, the cause of our issues. To do this, we will click on the “tap” icon (the last icon in the right-hand column).
This will take us to the list of live requests that match this endpoint. Here we can confirm that the requests are failing, as they are returning the gRPC status Unknown.
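Tap is also available from the CLI if you prefer the terminal. This is a sketch assuming “web” is meshed as above:

```shell
# Stream live requests flowing through the web deployment
linkerd tap deploy/web -n emojivoto

# Narrow the stream to just the calls going to the voting service
linkerd tap deploy/web -n emojivoto --to deploy/voting
```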
Now that we have identified that the error is coming from the voting service, we can pinpoint it precisely and fix the voting service code.
As I mentioned already, Linkerd comes pre-configured with Grafana, making it really easy to start using Grafana dashboards for all the metrics that Linkerd makes available. This is very useful for ops teams who want up-to-date, time-series-based information on services and clusters.
Firstly, I want to thank the people over at Linkerd and Buoyant for making an awesome product and a really nice sample app! Great work!
What can I do now:
Now that you know how to install Linkerd and the Linkerd service sidecar, you can install it on your own deployments and troubleshoot or mitigate any future issues by using Grafana for ops and tap to inspect live calls for errors.