Pavel Glukhikh
Jul 22

If you work with Kubernetes, or have worked with it, chances are you have come across some of the issues described here.

It happens to the best of us: you come into the office on a Monday, your NMS is all in the red, your Kubernetes cluster nodes are dead and your load balancer is broken.

I’m here to save you some time with the troubleshooting process by going over common issues that I have seen in my own deployments.

I run a lot of development and bleeding-edge code and always like to experiment with new ways of deploying Kubernetes nodes and various overlay networks. It is in these deployments that most of my Kubernetes issues occur.

Let’s dive right in.


No Response From Remote Kubectl

The connection to the server localhost:8080 was refused - did you specify the right host or port?

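Most Kubernetes admins have seen this error at some point. It occurs when either the Kubernetes API is not reachable at the specified URL or there is an issue with the Kubernetes API itself. If you never configured localhost:8080, kubectl has probably not found a kubeconfig at all: that is the address it falls back to by default.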

Note that a permissions issue with the Kubernetes API will most likely produce a different error.

First, let’s make sure that kubectl is installed and is the correct version:

kubectl version

The output should look something like this:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

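If you see any other errors (command not found, etc.):

  • Ensure that you are running kubectl as a user that has access to run kubectl commands.
  • Ensure that the kubectl binary is on your PATH and that your kubeconfig (~/.kube/config by default, or the file named in $KUBECONFIG) points at your cluster.
  • kubectl will not work with the “root” account by default: on kubeadm clusters, root has no ~/.kube/config unless you create one, so point KUBECONFIG at /etc/kubernetes/admin.conf before running kubectl as root.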
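The PATH and kubeconfig checks above can be scripted. What follows is a minimal POSIX sh sketch, not an official tool: `kubeconfig_paths` and `preflight` are hypothetical helper names, and it assumes a Linux host, where KUBECONFIG is a colon-separated list of files. It confirms that kubectl is on the PATH and that at least one kubeconfig file is readable, since without one kubectl falls back to localhost:8080 and produces exactly the error above.

```shell
#!/bin/sh
# Sketch: pre-flight checks before blaming the cluster (hypothetical helpers, not part of kubectl).
# Assumes a Linux host, where KUBECONFIG is a colon-separated list of files.

# Print each kubeconfig file kubectl would read, one per line.
kubeconfig_paths() {
  printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}" | tr ':' '\n'
}

# Return 0 if kubectl is on PATH and at least one kubeconfig file is readable.
# (Paths containing spaces are not handled by this simple loop.)
preflight() {
  local p
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  for p in $(kubeconfig_paths); do
    [ -r "$p" ] && return 0
  done
  echo "no readable kubeconfig; kubectl will fall back to localhost:8080" >&2
  return 1
}
```

Usage:

preflight && kubectl version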

If kubectl itself is working as it should, we can move on to troubleshooting the cluster.

SSH into one of your master nodes:

ssh lyokowarrior@192.168.2.52

One of the most common issues is the failure of the etcd service on one or more master nodes.

You can check the service status via:

service etcd status

If the service reports as inactive (dead), you have found your problem.

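One of the most common reasons for this failure (especially after a master node reboot) is that swap is enabled on the node. By default, the kubelet refuses to start while swap is on, and in kubeadm-managed clusters etcd and the API server run as static pods started by the kubelet, so they go down with it.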

As root, run:

swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab

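If the sed command did not comment out the swap entry (its pattern only matches " swap " surrounded by spaces, not tabs), disable swap at boot by hand:

nano /etc/fstab
# Comment out the line that mentions swap
# Save the file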
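To confirm that swap is really off and will stay off after a reboot, a small sketch like the following can help. `swap_active` and `fstab_swap_entries` are hypothetical helpers; the file paths are parameters (defaulting to /proc/swaps and /etc/fstab) so you can try them on copies first. Because awk splits fields on tabs as well as spaces, the fstab check also catches tab-separated entries.

```shell
#!/bin/sh
# Sketch: confirm swap is off now and will stay off after a reboot (hypothetical helpers).
# File paths are parameters so the checks can be tried on copies first.

# Return 0 if any swap area is currently active.
# /proc/swaps always starts with a header line; every line after it is an active swap area.
swap_active() {
  awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Print uncommented fstab entries of type "swap" (these would re-enable swap at boot).
# awk splits fields on runs of spaces or tabs, so tab-separated entries are caught too.
fstab_swap_entries() {
  awk '$1 !~ /^#/ && $3 == "swap"' "${1:-/etc/fstab}"
}
```

Usage (both should report nothing once swap is fully disabled):

swap_active && echo "swap is still active"
fstab_swap_entries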

Swap may also affect the kubelet service.

You can check the service status via:

service kubelet status

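You can also find the logs for both services in /var/log for further troubleshooting. On systemd-based hosts, they are often in the journal instead (for example, journalctl -u kubelet).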

Run these checks on the entire etcd cluster (all master nodes).
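Checking every master by hand gets tedious. Here is a rough loop, assuming passwordless SSH to each node and systemd-managed services; `check_masters`, `SSH`, and `UNITS` are illustrative names, not part of any Kubernetes tooling. On kubeadm clusters etcd runs as a static pod rather than a systemd unit, so there you would set UNITS to just kubelet.

```shell
#!/bin/sh
# Sketch: check service state on every master node in one pass (hypothetical helper).
# Assumes passwordless SSH and systemd-managed services. SSH can be overridden for testing,
# and UNITS sets which units to check (default: kubelet and etcd).
SSH="${SSH:-ssh}"

check_masters() {
  local host unit state rc=0
  for host in "$@"; do
    for unit in ${UNITS:-kubelet etcd}; do
      # "systemctl is-active" prints the unit state: active, inactive, failed, ...
      state=$($SSH "$host" systemctl is-active "$unit")
      echo "$host $unit: ${state:-unknown}"
      [ "$state" = "active" ] || rc=1
    done
  done
  return "$rc"
}
```

Usage (list the address of every master node):

check_masters lyokowarrior@192.168.2.52
UNITS=kubelet check_masters lyokowarrior@192.168.2.52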

You should now have a healthy cluster and kubectl should work.


Issues With Persistent Volumes

There are two parts to troubleshooting persistent volumes:

  • Troubleshooting volume claims.
  • Troubleshooting volume pod binding.

Volume Availability

The first step is to check whether the Kubernetes cluster can see the PV. The easiest way to look at PVs in the cluster is via the Kubernetes dashboard.

If you do not have a dashboard installed, you have a few options:

  • Get an iOS or Android app called Kuber and connect to your cluster.
  • Set up the dashboard. (Here’s a detailed guide).
  • Use kubectl.

To view PVs via kubectl, run:

lyokowarrior@kube-client:~$ kubectl get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                   STORAGECLASS   REASON   AGE
task-pv-volume   10Gi       RWO            Retain           Bound    default/task-pv-claim   manual                  20h

The above output lists one healthy, bound volume.

Common troubleshooting steps:

  • If you are using NFS, make sure the NFS shares are accessible by the cluster.
  • Make sure the volumes are healthy and in the “bound” status.
  • Review the example PV guide here:
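On a cluster with many volumes, it helps to filter the listing down to anything that is not Bound, and to check the claim side as well. This sketch assumes the default kubectl get column layout (STATUS is the fifth column for PVs and, with --all-namespaces, the third for PVCs); `unbound_pvs` and `unbound_pvcs` are hypothetical helpers that read --no-headers output on stdin.

```shell
#!/bin/sh
# Sketch: filter `kubectl get ... --no-headers` output down to anything not yet Bound.
# Assumes the default column layout (hypothetical helpers, read from stdin).

# PVs: NAME CAPACITY ACCESS-MODES RECLAIM-POLICY STATUS ... (STATUS is column 5).
unbound_pvs() {
  awk '$5 != "Bound" { print $1 " is " $5 }'
}

# PVCs with --all-namespaces: NAMESPACE NAME STATUS ... (STATUS is column 3).
unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2 " is " $3 }'
}
```

Usage:

kubectl get pv --no-headers | unbound_pvs
kubectl get pvc --all-namespaces --no-headers | unbound_pvcs

For anything reported, kubectl describe pvc <name> -n <namespace> lists the events explaining why the claim is stuck.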

A Service is Unreachable

A common problem on any cluster is that a deployed service is unreachable right after deployment, or becomes unreachable later.

For example, we have an HTTPD deployment running on port 3182 that is supposed to be exposed to the local LAN via a load balancer.

  • Scenario 1: The service is not reachable at all. (No reply from curl localhost:3182).
  • Scenario 2: The service is only reachable internally.

A common cause is that either the deployment was not exposed correctly or something happened to the service post-deployment.

Once again, the Kubernetes dashboard can help you understand the status of your services. If you do not have a dashboard, you can use kubectl.

To re-deploy the service, first delete it (replace the service name with your own):

kubectl delete service affected-service

Then recreate the exposed service (replace the names, IPs, and namespaces with your own):

kubectl expose deployment affected-deployment -n=default --type=LoadBalancer --name=affected-service --external-ip=192.168.2.54 --port=3182

Note: Check the pod configuration and ensure that the port the HTTPD server is listening on matches the exposed port.
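Two quick checks cover many of these cases. A Service whose ENDPOINTS column reads <none> usually has a selector that matches no ready pods, and a targetPort that differs from the port the container listens on sends traffic nowhere. This is a minimal sketch; `services_without_endpoints` and `target_port_matches` are hypothetical helpers, and the port check only handles numeric ports, not named ones.

```shell
#!/bin/sh
# Sketch: spot services with no backing pods and targetPort mismatches (hypothetical helpers).

# Reads `kubectl get endpoints --no-headers` output (NAME ENDPOINTS AGE) on stdin
# and prints services that have no endpoints.
services_without_endpoints() {
  awk '$2 == "<none>" { print $1 }'
}

# Usage: target_port_matches TARGET PORT...
# Succeeds if TARGET appears among the given container ports. Named target ports
# (e.g. "http") are reported as a mismatch. containerPort is informational, so an
# empty list only means no port was declared in the pod spec.
target_port_matches() {
  local target="$1" p
  shift
  for p in "$@"; do
    [ "$p" = "$target" ] && return 0
  done
  return 1
}
```

Usage:

kubectl get endpoints -n default --no-headers | services_without_endpoints
target=$(kubectl get service affected-service -n default -o jsonpath='{.spec.ports[0].targetPort}')
ports=$(kubectl get deployment affected-deployment -n default -o jsonpath='{.spec.template.spec.containers[*].ports[*].containerPort}')
target_port_matches "$target" $ports || echo "service targets $target, containers declare: $ports"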


Can’t Log In to the Dashboard

Dashboards on self-hosted Kubernetes clusters can be tricky. A common problem is not being able to log in to the dashboard.

Most installations require a token to log in, so the first step is to check that the token works and that you selected the correct one:

kubectl get secrets -n=kube-system | grep dashboard

This should bring up a list of all secrets relating to the dashboard. If nothing appears, something may be wrong with the dashboard deployment.

Have a look at my deployment guide or one of the links below for instructions on deploying a dashboard.

Now, let’s get the contents of the secret:

kubectl describe secret kubernetes-dashboard-token-xxx0x -n=kube-system

(Replace the name of the secret with the name on the list from the previous step).

Copy the token value from the output and paste it into the dashboard login page. You should now be able to log in.
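Rather than copying the token out of the describe output by hand, you can pull it straight from the secret. This sketch assumes, as in the steps above, that the dashboard lives in kube-system with a token secret named kubernetes-dashboard-token-<suffix>; the helper names are hypothetical. Anchoring on "-token-" avoids picking up the kubernetes-dashboard-certs and key-holder secrets that the grep also lists. Note that Kubernetes 1.24 and later no longer create these token secrets automatically; there, kubectl create token <service-account> -n <namespace> issues a token instead.

```shell
#!/bin/sh
# Sketch: fetch the dashboard token without copying it by hand (hypothetical helpers).
# Assumes the dashboard lives in kube-system and its token secret is named
# kubernetes-dashboard-token-<suffix>.

# Print the first dashboard token secret name from `kubectl get secrets --no-headers` output.
dashboard_token_secret() {
  awk '$1 ~ /^kubernetes-dashboard-token-/ { print $1; exit }'
}

# Secret data is base64-encoded; decode it to get the bearer token.
decode_token() {
  base64 -d
}
```

Usage:

secret=$(kubectl get secrets -n kube-system --no-headers | dashboard_token_secret)
kubectl get secret "$secret" -n kube-system -o jsonpath='{.data.token}' | decode_token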

Another common issue is an HTTPS certificate error when trying to reach the dashboard.

A workaround is to use Firefox, which is less strict than some other browsers about certificate validity. The proper fix is to re-deploy the dashboard in a supported deployment configuration.

Note: To completely remove all dashboard components:

kubectl delete deployment kubernetes-dashboard --namespace=kube-system 
kubectl delete service kubernetes-dashboard --namespace=kube-system
kubectl delete role kubernetes-dashboard-minimal --namespace=kube-system
kubectl delete rolebinding kubernetes-dashboard-minimal --namespace=kube-system
kubectl delete sa kubernetes-dashboard --namespace=kube-system
kubectl delete secret kubernetes-dashboard-certs --namespace=kube-system
kubectl delete secret kubernetes-dashboard-key-holder --namespace=kube-system
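The same cleanup can be written as a loop. This is a sketch rather than an official uninstall procedure; `remove_dashboard` and `KUBECTL` are illustrative names. The --ignore-not-found flag keeps re-runs from failing on objects that are already gone, and setting KUBECTL=echo prints the commands without touching the cluster.

```shell
#!/bin/sh
# Sketch: remove the dashboard objects listed above in one pass (hypothetical helper).
# Set KUBECTL=echo to preview the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

remove_dashboard() {
  local ns="${1:-kube-system}" obj
  for obj in \
    deployment/kubernetes-dashboard \
    service/kubernetes-dashboard \
    role/kubernetes-dashboard-minimal \
    rolebinding/kubernetes-dashboard-minimal \
    serviceaccount/kubernetes-dashboard \
    secret/kubernetes-dashboard-certs \
    secret/kubernetes-dashboard-key-holder
  do
    # --ignore-not-found: skip objects that no longer exist instead of erroring.
    $KUBECTL delete "$obj" --namespace="$ns" --ignore-not-found
  done
}
```

Usage:

KUBECTL=echo remove_dashboard
remove_dashboard kube-system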

Verify that all dashboard components have been removed:

kubectl get secret,sa,role,rolebinding,services,deployments --namespace=kube-system | grep dashboard

As always, I welcome comments and questions.


Better Programming

Advice for programmers.

Written by Pavel Glukhikh

I am a DevOps engineer and CEO of two tech startups. I enjoy all things tech, security, and physics. My background is in Cybersecurity and HPC.
