Ease Kubernetes Troubleshooting — Introducing kubectl-nurse

Wensy Chan (Wenya Chen)
Published in Box Tech Blog
Nov 8, 2021
Cartoon of “KN”. Illustrated by Navied Mahdavian / Art directed by Sarah Kislak & Christie Folsom

Background

At Box, we use Kubernetes (K8s) to deploy and run hundreds of different microservices. Service owners have access to their apps inside our K8s clusters so that they can diagnose their own Pods when those Pods become unhealthy. Debugging often involves running several kubectl commands to pin down why an app is unhealthy. For example, “kubectl get pod -n <namespace>” has to be run first to locate the problematic Pod in a specific Namespace, then “kubectl describe pod <pod> -n <namespace>” to find out which containers are failing, and the output still has to be parsed to determine the actual reason(s), e.g. by looking at container exit codes, readiness/liveness probe status, application logs, etc. These commands are lengthy to type, and their output contains far more than our users need, which makes debugging slower than it should be. Hence, we looked into kubectl plugins to ease this painful debugging experience.

Why a kubectl plugin?

A kubectl plugin is a standalone executable that adds a sub-command to kubectl, the native command-line interface for Kubernetes, extending it with customized capabilities. Given the problems stated above, a plugin that lets our engineers see all the important information at a glance, without running multiple kubectl commands, was the natural choice. In this blog post, we present a handy plugin, kubectl-nurse, that covers all our requirements for debugging Box’s K8s apps more efficiently!
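kubectl discovers plugins by name: an executable on the PATH called kubectl-<name> becomes the sub-command “kubectl <name>”, so installing a binary named kubectl-nurse is what makes “kubectl nurse” work. The Go sketch below approximates how kubectl turns the arguments of a command it does not recognize as built-in into candidate executable names, trying the longest match first. It is a simplification for illustration rather than kubectl's source, and the function name pluginCandidates is hypothetical.

```go
package main

import "strings"

// pluginCandidates returns the executable names kubectl would look up on the
// PATH, longest first, for the arguments that follow "kubectl".
// Simplified sketch: leading non-flag arguments are collected, dashes inside
// them become underscores, and candidates are built by joining them with "-"
// behind the "kubectl-" prefix.
func pluginCandidates(args []string) []string {
	var words []string
	for _, arg := range args {
		if strings.HasPrefix(arg, "-") {
			break // the first flag ends the plugin name; flags go to the plugin
		}
		words = append(words, strings.ReplaceAll(arg, "-", "_"))
	}
	var candidates []string
	for n := len(words); n > 0; n-- {
		candidates = append(candidates, "kubectl-"+strings.Join(words[:n], "-"))
	}
	return candidates
}
```

With this lookup, “kubectl nurse -n team-a” resolves to an executable named kubectl-nurse, and the remaining arguments are handed to it unchanged.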

What are the functionalities of kubectl-nurse?

The main functionality of the kubectl-nurse plugin is to display the not-ready Pods under a given Deployment, with detailed information about container status (e.g. exitCode and restartCount). It also shows all related events that could be keeping a Pod from becoming ready (e.g. missing required ConfigMaps).
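To make the container and event details concrete, here is a minimal Go sketch of the kind of digest such a view can produce. It is not kubectl-nurse's code: the struct types are simplified, hypothetical stand-ins for the k8s.io/api/core/v1 ContainerStatus and Event types (the real types have many more fields and nest the event's target under InvolvedObject), and summarizeContainer and warningEventsFor are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for core/v1 types; only the fields used below exist.
type ContainerStateWaiting struct{ Reason, Message string }

type ContainerStateTerminated struct {
	ExitCode int32
	Reason   string
}

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name                 string
	Ready                bool
	RestartCount         int32
	State                ContainerState
	LastTerminationState ContainerState
}

// Event keeps only what is needed to tie an event to a Pod.
type Event struct {
	InvolvedKind, InvolvedName string
	Type, Reason, Message      string
}

// summarizeContainer condenses a container status into one line, the kind of
// detail that otherwise has to be dug out of "kubectl describe pod".
func summarizeContainer(cs ContainerStatus) string {
	parts := []string{
		cs.Name,
		fmt.Sprintf("ready=%t", cs.Ready),
		fmt.Sprintf("restarts=%d", cs.RestartCount),
	}
	if w := cs.State.Waiting; w != nil {
		parts = append(parts, "waiting="+w.Reason)
	}
	if t := cs.State.Terminated; t != nil {
		parts = append(parts, fmt.Sprintf("exitCode=%d", t.ExitCode))
	}
	// The exit code of the previous run usually explains a crash loop.
	if t := cs.LastTerminationState.Terminated; t != nil {
		parts = append(parts, fmt.Sprintf("lastExitCode=%d", t.ExitCode))
	}
	return strings.Join(parts, " ")
}

// warningEventsFor keeps the Warning events whose involved object is the Pod.
func warningEventsFor(pod string, events []Event) []Event {
	var out []Event
	for _, e := range events {
		if e.InvolvedKind == "Pod" && e.InvolvedName == pod && e.Type == "Warning" {
			out = append(out, e)
		}
	}
	return out
}
```

Filtering to Warning events is a design choice in this sketch: Normal events such as scheduling or image pulls rarely explain why a Pod is stuck, while a Warning such as FailedMount usually does.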

When executing “kubectl nurse -n <namespace>”, the plugin collects all Deployments in the designated Namespace. For each collected Deployment, it returns every Pod that is not ready, along with information about the containers running in that Pod. It also returns significant events that can help the user deduce why Pods are not becoming ready. For instance, if a Pod cannot start because a Secret required by one of its volume mounts is missing, one would see the following output:

Sometimes, a Pod ends up not ready because specific containers are crash-looping. In this case, kubectl-nurse helps locate the failing container quickly and provides additional information about it:

Although it is convenient to view all problematic Pods in a single output, inspecting the healthy Pods and their containers as well lets users confirm that the app failures really do come from the not-ready Pods. To see both ready and not-ready Pods under all workloads in a Namespace, users can simply add the “--all” flag to the end of the command above. The output is similar to the example above, except that it also includes the healthy Pods and their container details.
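A Deployment finds its Pods through its label selector, and a Pod counts as ready when its “Ready” condition is “True”. Under those assumptions, the following sketch shows how Pods could be grouped under a Deployment and how a flag like “--all” could switch between showing only not-ready Pods and showing every Pod. The Pod type and the function names are hypothetical simplifications, and only matchLabels is handled.

```go
package main

// Pod is a simplified stand-in for a core/v1 Pod: its labels plus the status
// of its "Ready" condition ("True", "False" or "Unknown").
type Pod struct {
	Name        string
	Labels      map[string]string
	ReadyStatus string
}

// matches reports whether every key/value pair in matchLabels is present on
// the Pod. matchExpressions are ignored in this sketch.
func matches(matchLabels, podLabels map[string]string) bool {
	for k, v := range matchLabels {
		if got, ok := podLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// podsToShow returns the Pods selected by a Deployment's matchLabels. Unless
// showAll (the --all flag) is set, only Pods that are not ready are kept.
func podsToShow(matchLabels map[string]string, pods []Pod, showAll bool) []Pod {
	var out []Pod
	for _, p := range pods {
		if !matches(matchLabels, p.Labels) {
			continue
		}
		if showAll || p.ReadyStatus != "True" {
			out = append(out, p)
		}
	}
	return out
}
```

With client-go, one would more likely convert the Deployment's full selector with metav1.LabelSelectorAsSelector and pass its string form as the LabelSelector of the Pod List call, so that the API server does the filtering instead of the client.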

The kubectl-nurse plugin can also narrow the scope to a single Pod or a single Deployment during debugging, simply by passing Deployment or Pod names as arguments to the command. If a specific container is suspected of causing trouble, the “-c” flag limits the output to that container. In summary, these are the commands to target specific Deployments, Pods, and containers:

Besides the main use case, the plugin has two auxiliary functions that aid in debugging K8s apps. The first lists container names and their corresponding images when the “--list-containers” flag is used. This spares users from having to remember container names when they want detailed information about a specific container.

The other helper lets users review access restrictions. At Box, we use RBAC to authorize access to Namespaces based on individual users’ LDAP groups. Since the corresponding LDAP group names are also stored as Labels on each Namespace, kubectl-nurse reads these Labels and retrieves group membership information from the underlying Linux system to present an easy-to-read overview of who can access a given Namespace:

How is kubectl-nurse built?

The plugin is written in Go and built with client-go, Cobra, and cli-runtime. client-go makes interacting with K8s clusters a breeze, and the Cobra library provides everything needed to create CLI programs, such as automatic help text generation for commands and flags. Additionally, cli-runtime keeps the plugin's behavior compatible with kubectl, e.g. by inheriting the “--namespace” flag and the “pod/<pod name>” argument syntax from native kubectl commands.
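The sketch below shows the rough shape of the argument handling described in this post, using only Go's standard flag package as a stand-in for Cobra and cli-runtime. The flags mirror those mentioned above (-n/--namespace, --all, -c, --list-containers); parseTarget accepts kubectl-style “TYPE/NAME” arguments, and treating a bare name as a Deployment is an assumption made here, as are all the type and function names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// target is a workload the user narrowed the output to.
type target struct{ Kind, Name string }

// parseTarget accepts "TYPE/NAME" arguments such as "pod/web-1" or
// "deploy/web". Assumption: a bare name is treated as a Deployment.
func parseTarget(arg string) (target, error) {
	typ, name, found := strings.Cut(arg, "/")
	if !found {
		return target{Kind: "Deployment", Name: arg}, nil
	}
	if name == "" {
		return target{}, fmt.Errorf("missing name in %q", arg)
	}
	switch strings.ToLower(typ) {
	case "po", "pod", "pods":
		return target{Kind: "Pod", Name: name}, nil
	case "deploy", "deployment", "deployments":
		return target{Kind: "Deployment", Name: name}, nil
	}
	return target{}, fmt.Errorf("unsupported resource type %q", typ)
}

// options holds the flags mentioned in this post.
type options struct {
	namespace, container string
	all, listContainers  bool
}

func parseArgs(args []string) (options, []target, error) {
	var o options
	fs := flag.NewFlagSet("kubectl-nurse", flag.ContinueOnError)
	// Empty means "use the current context's namespace"; resolving that is
	// left to cli-runtime in a real plugin and is not done here.
	fs.StringVar(&o.namespace, "namespace", "", "namespace to inspect")
	fs.StringVar(&o.namespace, "n", "", "shorthand for --namespace")
	fs.StringVar(&o.container, "c", "", "only show this container")
	fs.BoolVar(&o.all, "all", false, "also show ready Pods")
	fs.BoolVar(&o.listContainers, "list-containers", false, "list container names and images")
	if err := fs.Parse(args); err != nil {
		return o, nil, err
	}
	var targets []target
	for _, a := range fs.Args() {
		t, err := parseTarget(a)
		if err != nil {
			return o, nil, err
		}
		targets = append(targets, t)
	}
	return o, targets, nil
}
```

Unlike Cobra, the standard flag package stops parsing at the first non-flag argument, so in this sketch flags must come before Pod or Deployment names; that limitation is one practical reason to reach for Cobra.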

Future Improvements

Although the current version of the kubectl-nurse plugin has greatly improved debugging efficiency for our users by reducing the interactions needed to find relevant information, it is limited to inspecting failing Pods and containers. As a next step, we plan to include common remediation steps for issues such as unattached Volumes, ImagePullBackOff errors, etc. Also, the current version only supports Deployments, and we aim to extend coverage to other workload types such as DaemonSets, Jobs/CronJobs, and StatefulSets.
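Remediation hints could be driven by the same reason strings the plugin already surfaces. A minimal sketch, assuming a hypothetical lookup table keyed by container waiting reasons (ImagePullBackOff, CrashLoopBackOff, ...) and Pod event reasons (FailedMount, FailedAttachVolume); the hints are generic first checks, not the remediation steps Box plans to ship.

```go
package main

import "fmt"

// remediationHints maps waiting reasons and event reasons to a first thing to
// check. Hypothetical table for illustration only.
var remediationHints = map[string]string{
	"ImagePullBackOff":           "check the image name and tag, registry reachability, and imagePullSecrets",
	"ErrImagePull":               "check the image name and tag, registry reachability, and imagePullSecrets",
	"CrashLoopBackOff":           "read the previous run's logs (kubectl logs --previous) and its exit code",
	"CreateContainerConfigError": "check that the ConfigMaps and Secrets the container references exist",
	"FailedMount":                "check that the volume's Secret, ConfigMap or PersistentVolumeClaim exists",
	"FailedAttachVolume":         "check that the volume exists and can attach to the Pod's node",
}

// hintFor returns a suggestion for a reason, or a generic fallback.
func hintFor(reason string) string {
	if h, ok := remediationHints[reason]; ok {
		return fmt.Sprintf("%s: %s", reason, h)
	}
	return fmt.Sprintf("%s: no known remediation; inspect the Pod's events", reason)
}
```

Keeping the hints in a plain table keyed by reason means new remediation steps can be added without touching the inspection logic.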

Summary

By presenting a straightforward, simplified view of the information needed to debug K8s apps, the kubectl-nurse plugin has made troubleshooting painless for our users, which ultimately gives our engineers more independence when operating on our Kubernetes landscape. The plugin is set to be open-sourced in the near future. Stay tuned, and feel free to follow us on GitHub for upcoming updates!

Interested in learning more about Box? Check out our careers page!
