Efficiently Finding & Fixing Issues on Kubernetes using Linkerd 2.0 Sidecar

Brian Mathews
Sep 12, 2019 · 8 min read

In this tutorial, we’ll walk you through how to get Linkerd 2.0 up and running on your Kubernetes cluster in just a few easy steps. We will also show how to use the Linkerd service sidecar to easily identify issues using the Linkerd Dashboard.


What is Linkerd & why/when should I use it?

Linkerd is a service sidecar and service mesh for Kubernetes that adds a layer of telemetry, security, and control across some or all of the services in a cluster.

Running Linkerd changes the way we work with Kubernetes: it provides a simple, configuration-free dashboard and easy-to-use UNIX-style CLI tools for runtime debugging, diagnostics, and reliability.

Use it when you want more insight into how your application is actually behaving on your cluster. That insight makes troubleshooting and issue mitigation much easier.

Linkerd works by installing ultralight proxies into each pod of a service in your cluster. These proxies report telemetry data to a control plane.

This means that getting started with Linkerd doesn’t require any code changes and, even better, it can be installed live on a running service.

For this tutorial, we will use the lovely sample app provided by the good people at Linkerd & Buoyant.

Step 1: Install the demo app

Before we install Linkerd, we will start by installing the demo application provided by Linkerd. It is a simple gRPC demo app called Emojivoto.

So let’s install Emojivoto on our cluster:

curl -sL https://run.linkerd.io/emojivoto.yml | kubectl apply -f -

This downloads the manifest for the Emojivoto sample app and then uses kubectl apply to apply it to our cluster. For this tutorial, I am using a free OKE cluster I set up on OCI.

We will now check that our deployments have been created correctly:

kubectl get -n emojivoto deployments

[Image: Sample app deployment]
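If you would rather have the shell wait until everything is ready than re-run the command by hand, a small helper along these lines could work. It is only a sketch: wait_for_rollout is a name made up for this post, and the 120-second timeout is an arbitrary choice.

```shell
# Hypothetical helper (not part of kubectl or Linkerd): block until every
# deployment in the given namespace has finished rolling out.
wait_for_rollout() {
  ns="$1"
  # "-o name" prints one resource per line, e.g. deployment.apps/web
  for d in $(kubectl -n "$ns" get deployments -o name); do
    # The 120s timeout is an arbitrary choice for this sketch.
    kubectl -n "$ns" rollout status "$d" --timeout=120s || return 1
  done
}
```

Calling wait_for_rollout emojivoto after applying the manifest should return once all of the demo deployments report a successful rollout.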

You can also see the app running live on the cluster by looking up its service:

kubectl get svc web-svc -n emojivoto

This will return the cluster IP and external IP of “web-svc”, showing that your app is running live.
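If you only want the address itself, for example to paste into a browser or feed to a script, kubectl’s jsonpath output can pull out just that field. The helper below is a sketch; external_ip is a hypothetical name, and it assumes your load balancer reports an IP address (some providers report a hostname instead).

```shell
# Hypothetical helper: print only the external IP of a LoadBalancer service.
# Assumption: the provider populates .status.loadBalancer.ingress[0].ip
# (some clouds populate .hostname instead, in which case this prints nothing).
external_ip() {
  kubectl get svc "$1" -n "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

For example, curl "http://$(external_ip web-svc emojivoto)" should fetch the Emojivoto front page, assuming the service is exposed on port 80 as it was on my cluster.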


Using the external IP, we can now try out the app running live on our cluster.

[Image: Linkerd-provided sample app (credit: Linkerd)]

The first impression of this sample app… Wow, much colour… Many emoji!

After playing around with the app for a few seconds, you will notice that parts of it aren’t working. That’s great for testing Linkerd features such as telemetry.

It is also a useful illustration: the Kubernetes dashboard shows nothing wrong, and the application looks like it is running perfectly. That is because Kubernetes only checks whether the pods are functioning, not whether the application’s responses are correct.

Now let’s set up Linkerd so that we can look into the issues further.

Step 2: Install Linkerd’s CLI

Now let’s install Linkerd’s CLI on our machine. You can check the Linkerd releases page for specific release versions or, easier still, use the command below to download the most recent stable version:

curl -sL https://run.linkerd.io/install | sh

Once the installation is finished, add the linkerd command to your PATH:

export PATH=$PATH:$HOME/.linkerd2/bin

You should now be able to run the command linkerd version, which should display:

Client version: stable-2.5.0
Server version: unavailable

“Server version: unavailable” means that we don’t have the Linkerd control plane on our cluster and it needs to be added.
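If linkerd version fails with a “command not found” error instead, the PATH change probably has not taken effect in the current shell. A quick sanity check might look like the sketch below. The --client and --short flags belong to linkerd version; the function name is just something made up for this post.

```shell
# Hypothetical helper: confirm the linkerd binary is reachable and print
# only the client version, without contacting the cluster.
check_linkerd_cli() {
  if command -v linkerd >/dev/null 2>&1; then
    linkerd version --client --short
  else
    echo "linkerd not found on PATH" >&2
    return 1
  fi
}
```

Running check_linkerd_cli in a fresh terminal is a cheap way to confirm that the export line also needs to go into your shell profile.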

Before we add Linkerd to our cluster, we must check that the cluster is ready for it by running:

linkerd check --pre

This handy command checks for and reports any problems that would interfere with your ability to install Linkerd.

[Image: Pre-check passed]

Step 3: Install Linkerd’s control plane onto the cluster

Next, we will install the Linkerd control plane into its own namespace, “linkerd”, on the cluster. To do this, run:

linkerd install | kubectl apply -f -

This will generate a Kubernetes manifest and use kubectl to apply it to your Kubernetes cluster.
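Piping straight into kubectl is convenient, but it leaves no record of what was applied. If you want a copy of the generated manifest to read through or keep in version control, something like the following sketch would do. The function name and the default file name are my own choices, not anything Linkerd prescribes.

```shell
# Hypothetical helper: keep a copy of the generated control plane manifest
# on disk, then apply that exact file to the cluster.
render_then_apply() {
  manifest="${1:-linkerd-control-plane.yaml}"  # arbitrary default file name
  linkerd install > "$manifest" || return 1
  kubectl apply -f "$manifest"
}
```

The end result on the cluster should be the same as the one-line pipe; the only difference is that the manifest stays on disk for inspection.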

Once the manifests for the Linkerd control plane have been applied to your cluster, we can check that they were applied correctly using:

linkerd check

If your cluster is still applying the control plane, the command will wait until it finishes.

If all goes well, you will receive a message saying that the status check has passed. You now have Linkerd running on your cluster.

Now that Linkerd is running on our cluster, we can access the Linkerd dashboard:

linkerd dashboard

If you see something like the image below open in your browser, congratulations: Linkerd is now running correctly on your cluster.

[Image: Linkerd dashboard]

Step 4: Add Linkerd to the web service

Linkerd’s control plane is now running on our cluster in the “linkerd” namespace, and our Emojivoto demo app is installed in the “emojivoto” namespace. We now need to add Linkerd to the service we deployed earlier.

There are a couple of different ways to add Linkerd to our service. For the purpose of this example we will use a simple method:

kubectl get -n emojivoto deploy/web -o yaml | linkerd inject - | kubectl apply -f -

This command fetches the manifest of the “web” deployment from Kubernetes, runs it through linkerd inject, and then reapplies it to the Kubernetes cluster.

linkerd inject takes the manifest and edits it to include Linkerd’s data plane proxies.

Since “web” is a Deployment, Kubernetes is smart enough to roll the service one pod at a time. This is great, as it means that “web” can keep serving live traffic while we add Linkerd to it, so adding Linkerd to deployments is seamless.


We now have the Linkerd service sidecar running on the “web” service deployment.
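To double-check from the command line that the injection took effect, you can look for the proxy container, which Linkerd names linkerd-proxy, in the pod specs. The helper below is a sketch with a made-up name; it lists the pods in a namespace that carry that container.

```shell
# Hypothetical helper: print the names of pods in a namespace that include
# a container called "linkerd-proxy" (the name Linkerd gives its sidecar).
pods_with_proxy() {
  # One line per pod: "<pod-name> <container names separated by spaces>"
  kubectl -n "$1" get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' \
    | grep linkerd-proxy | cut -d' ' -f1
}
```

At this point pods_with_proxy emojivoto should list only the “web” pod or pods, since that is the only deployment we have injected.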

Step 5: Use Linkerd to debug the application issues

You now have a full demo application running on your cluster with Linkerd installed on the “web” service deployment.

We can now use Linkerd to investigate some of the application issues we stumbled across earlier, such as some of the voting not working. This is useful because, as we said earlier, the Kubernetes dashboard showed the application running fine.

Let’s check the Linkerd dashboard:

[Image: Linkerd dashboard with sidecar deployment]

You should see all the services in the “emojivoto” namespace show up. As we have only installed the Linkerd sidecar on the “web” service, it will be the only deployment and pod that displays success rate, requests per second, and latency percentiles. You will also see a handy button that brings you to a Grafana dashboard for your Linkerd sidecar.

The first thing we will notice is how awesome this is! The second is that our success rate is consistently below 100%.
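The same numbers are available from the CLI through linkerd stat. As a rough sketch, the helper below filters that output down to deployments whose success rate is not 100%. It assumes the plain-text table has NAME, MESHED, and SUCCESS as its first three columns, which is what I see with the 2.5 CLI; the function name is hypothetical, and in later Linkerd releases the stat command lives under linkerd viz.

```shell
# Hypothetical helper: list deployments whose success rate is below 100%.
# Assumption: the first three columns of "linkerd stat" output are
# NAME, MESHED and SUCCESS, with SUCCESS formatted like "91.67%".
failing_deployments() {
  linkerd stat deployments -n "$1" \
    | awk 'NR > 1 && $3 ~ /%$/ && $3 != "100.00%" { print $1, $3 }'
}
```

Deployments without a sidecar show a dash instead of a percentage, so they are skipped rather than reported as failing.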

What is causing this?

Let’s take a look at why and try to resolve it. To do this, simply click on “web”. This will bring you to the deployment dashboard for “web”.

[Image: “Web” service deployment dashboard]

You will notice that “web” is receiving mock traffic from “vote-bot”, a simple component included in the Emojivoto manifest that generates a low level of live traffic. “web” also has two outgoing dependencies, emoji and voting.

[Image: Vote bot and deployment dependencies]

We can see straight away that the emoji service is working 100% of the time, while the voting service is failing some of the time.

A failure in a dependent service may be exactly what’s causing the errors that web is returning, which we saw earlier.

Let’s look further into this. As we scroll through the deployment page for web, we can see a live list of all the traffic endpoints that “web” is receiving.

[Image: Live calls for web deployment]

Straight away we can see that there are two calls that are not at 100%. The first is vote-bot’s call to the “/api/vote” endpoint. The second is the “/emojivoto.v1.VotingService/VoteDoughnut” call from the web service to the voting service.

This is interesting as “/api/vote” is an incoming call, and “/VoteDoughnut” is an outgoing call.

This is a good indicator that the “/VoteDoughnut” call is the source of our issues: if the outgoing call fails, the incoming call that depends on it fails too.

Let’s dive a little deeper to see if it is, in fact, the cause of our issues. To do this, click on the “tap” icon (the last icon in the right-hand column).

This will take us to the list of live requests that match this endpoint. It lets us confirm that the requests are failing, as they are returning the gRPC status Unknown.

[Image: /VoteDoughnut live call request details]
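The tap view in the dashboard has a CLI counterpart, linkerd tap, which can be narrowed to the same call with its --to and --path flags. The wrapper below is only a sketch with a made-up function name; in later Linkerd releases the command lives under linkerd viz.

```shell
# Hypothetical wrapper: stream live requests from "web" to "voting" that hit
# the VoteDoughnut endpoint. The command keeps running until interrupted.
tap_vote_doughnut() {
  linkerd tap deploy/web --namespace emojivoto \
    --to deploy/voting \
    --path /emojivoto.v1.VotingService/VoteDoughnut
}
```

Run tap_vote_doughnut while vote-bot is generating traffic and stop it with Ctrl-C once you have seen a few requests.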

Now that we have identified that the error is coming from the voting service, we know where to look in the code so that we can fix it.

Pre-installed Grafana?

As I mentioned already, Linkerd comes pre-configured with Grafana, which makes it really easy to start using Grafana dashboards for all the metrics that Linkerd makes available. This is very useful for ops teams who want up-to-date, time-series-based information on services and clusters.

[Image: Grafana dashboard for web]

Conclusion:

Firstly I want to thank the people over at Linkerd and Buoyant for making an awesome product and a really nice sample app! Great work!

What can I do now:

Now that you know how to install Linkerd and the Linkerd service sidecar, you can install it on your own deployments and troubleshoot or mitigate any future issues, using Grafana for ops and tap to inspect live calls for errors.

Any questions:

For more information please feel free to reach out on LinkedIn or Twitter.
I hope that you enjoyed this blog!


The Startup

Medium's largest active publication, followed by +752K people. Follow to join our community.

Written by Brian Mathews

Technical Consultant and Evangelist with a focus on Serverless and DevOps. Why not give Oracle Cloud a try with $300 free credits! https://bitly.com/mathewsbrian
