Hidden Gems: A few things you might not know about Kubernetes

Seifeddine Rajhi
14 min read · Nov 29, 2023


Lesser-Known Aspects of Kubernetes

Introduction:

Kubernetes has revolutionized the way we manage containerized applications, but it’s packed with hidden features that even experienced users might not be aware of.

Let’s dive into a few of these hidden gems and discover the lesser-known capabilities of Kubernetes.

Sorting and Organizing Your Pods:

Ever wished you could organize your pod list in a more meaningful way? With Kubernetes, you can! Simply use the --sort-by flag along with the kubectl get pods command to sort your pods by various criteria, such as pod name or creation time.

Running kubectl get pods --sort-by=.metadata.name might just save you from endless scrolling through your pod list.

Let’s sort the pods in descending order, i.e., with the newest pods appearing first (note that tail -r reverses lines on BSD/macOS; on GNU/Linux use tac instead):

kubectl get pods --sort-by=.metadata.creationTimestamp --no-headers | tail -r
ubuntu-pod-3 2/2 Running 0 5m17s
ubuntu-pod-2 2/2 Running 0 13m7s
ubuntu-pod-1 2/2 Running 0 26m
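The --sort-by flag accepts any JSONPath expression, so you can sort by other fields too; for example, sorting by restart count quickly surfaces the most crash-prone pods:

kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'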

Listing All Object Types:

Did you know you can list all the object types that your cluster supports? Use the kubectl api-resources command:

kubectl api-resources

When we want a more encompassing list of all resources in a namespace, we can combine the kubectl api-resources command with kubectl get:

$ kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --ignore-not-found --show-kind -n <namespace>

kubectl api-resources --verbs=list --namespaced -o name retrieves all namespaced API resource types that support the list verb and outputs their names. Those names are then piped to xargs on standard input.

xargs -n 1 passes each of those names, one at a time, as an argument to kubectl get --ignore-not-found --show-kind -n <namespace>. kubectl get then returns the resources of each type that exist in the specified namespace.

Default Resources and Limits with LimitRange and ResourceQuotas:

In Kubernetes, namespaces provide a mechanism for isolating groups of resources within a single cluster. They are a way to divide cluster resources between multiple teams or users (via resource quotas), and each namespace hosts its own workloads, such as Pods and Deployments.

After creating a namespace for each team, consider what happens if one team (i.e., one namespace) consumes most of the cluster's CPU and memory: the other teams' workloads starve, because the cluster only has a limited amount of hardware available. This is the classic noisy-neighbour problem.

To avoid it, as an administrator you first create a namespace per team and then use ResourceQuota and LimitRange objects to assign resource quotas to namespaces and set limits for the containers running inside them.

Resource Quotas:

After creating namespaces, we can use the ResourceQuota object to limit the total amount of resources consumed within a namespace. A ResourceQuota can cap both the number of objects of each type that can be created in a namespace and the aggregate compute resources, such as CPU and memory, that those objects may use.

A ResourceQuota for setting quota on resources looks like this:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: teamx-resource-quota
  namespace: teamx
spec:
  hard:
    limits.cpu: 150m
    limits.memory: 600Mi
    requests.cpu: 150m
    requests.memory: 600Mi

  • limits.cpu is the maximum total CPU limit for all containers in the Namespace, i.e. the entire namespace.
  • limits.memory is the maximum total memory limit for all containers in the Namespace, i.e. the entire namespace.
  • requests.cpu is the maximum total CPU request for all containers in the Namespace. As per the above YAML, the total requested CPU in the namespace must not exceed 150m.
  • requests.memory is the maximum total memory request for all containers in the Namespace. As per the above YAML, the total requested memory in the namespace must not exceed 600Mi.
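To see the quota in action, apply the manifest and check current usage against the hard limits (the file name here is just an example):

kubectl apply -f teamx-resource-quota.yaml
kubectl describe resourcequota teamx-resource-quota -n teamx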

LimitRange for Containers:

We can create a LimitRange object in our namespace to set resource limits on the containers running within it. A LimitRange also provides default limit values for Pods that do not specify them themselves, helping distribute resources fairly within the namespace.

A LimitRange provides constraints that can:

  • Apply minimum and maximum CPU usage limits per Pod or Container in a namespace.
  • Apply minimum and maximum memory usage limits per Pod or Container in a namespace.
  • Apply minimum and maximum storage request limits per PersistentVolumeClaim in a namespace.
  • Set default requests/limits for resources within a namespace and automatically apply them to Containers at runtime, as in the example below:
apiVersion: v1
kind: LimitRange
metadata:
  name: teamx-limit-range
spec:
  limits:
  - default:
      memory: 200Mi
      cpu: 50m
    defaultRequest:
      memory: 200Mi
      cpu: 50m
    max:
      memory: 200Mi
      cpu: 50m
    min:
      memory: 200Mi
      cpu: 50m
    type: Container

The above YAML file has 4 sections, max, min, default, and defaultRequest.

The default section will set up the default limits for a container in a pod. Any container with no limits defined will get these values assigned as default.

The defaultRequest section will set up the default requests for a container in a pod. Any container with no requests defined will get these values assigned as default.

The max section will set up the maximum limits that a container in a Pod can set. The value specified in the default section cannot be higher than this value.

The min section will set up the minimum requests that a container in a Pod can set. The value specified in the defaultRequest section cannot be lower than this value.
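A quick way to confirm the defaults are injected is to create a Pod without a resources section and inspect what the API server filled in (the pod name is just an example, and this assumes the LimitRange above was created in the teamx namespace):

kubectl run limits-demo --image=nginx -n teamx
kubectl get pod limits-demo -n teamx -o jsonpath='{.spec.containers[0].resources}'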

kubectl debug:

One of the most forgotten but powerful kubectl commands is debug. It allows you to attach an ephemeral debugging container to a running Pod, copy a Pod to a new instance for debugging, and even access a Pod's or Node's filesystem.

Use the kubectl debug node command to deploy a Pod to a Node that you want to troubleshoot. This command is helpful in scenarios where you can't access your Node by using an SSH connection. When the Pod is created, the Pod opens an interactive shell on the Node. To create an interactive shell on a Node named mynode, run:

kubectl debug node/mynode -ti --image=ubuntu -- chroot /host bash

You can also use the kubectl debug command to add ephemeral containers to a running Pod for debugging.

First, create a pod for the example:

kubectl run ephemeral-demo --image=registry.k8s.io/pause:3.1 --restart=Never

The examples in this section use the pause container image because it does not contain debugging utilities, but this method works with all container images.

If you attempt to use kubectl exec to create a shell you will see an error because there is no shell in this container image.

kubectl exec -it ephemeral-demo -- sh
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown

You can instead add a debugging container using kubectl debug. If you specify the -i/--interactive argument, kubectl will automatically attach to the console of the Ephemeral Container.

kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
Defaulting debug container name to debugger-8xzrl.
If you don't see a command prompt, try pressing enter.
/ #

This command adds a new busybox container and attaches to it. The --target parameter targets the process namespace of another container. It's necessary here because kubectl run does not enable process namespace sharing in the pod it creates.
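kubectl debug can also copy a running Pod into a new one for interactive troubleshooting. A quick sketch, where myapp is a hypothetical Pod name; --share-processes lets the added container see the processes of the original containers:

kubectl debug myapp -it --image=ubuntu --share-processes --copy-to=myapp-debug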

Krew: The Plugin Marketplace:

There’s a massive marketplace of kubectl plugins that can extend its functionality and make your life easier. Meet Krew:

https://krew.sigs.k8s.io/docs/user-guide/setup/install/

Krew is the plugin manager for the kubectl command-line tool.

Krew helps you:

  • discover kubectl plugins,
  • install them on your machine,
  • and keep the installed plugins up-to-date.

At the time of writing, there are 225 kubectl plugins distributed on Krew.

Krew works across all major platforms, like macOS, Linux and Windows.
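Once Krew itself is installed, discovering and installing plugins looks like this (ctx, a context-switching plugin, is just one example):

kubectl krew search
kubectl krew install ctx
kubectl krew list
kubectl krew upgrade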

Prow: CI/CD for Kubernetes:

The Kubernetes project's CI/CD is powered by Prow, an open-source CI system that can scale to hundreds of thousands of jobs.

The Kubernetes Testing SIG describes Prow as “a CI/CD system built on Kubernetes for Kubernetes that executes jobs for building, testing, publishing and deploying.” That description, however, understates perhaps Prow's most important capability, the one at the heart of best-of-breed CI/CD tools: automation that starts with code commits. In Prow's case it starts with a scalable, stateless microservice called hook, which (among the other things it does via plugins) triggers native Kubernetes CI/CD jobs.

It is this GitHub automation capability that has been one of the key reasons why other Kubernetes projects have adopted Prow for their own CI/CD. But Prow is more than just GitHub webhook automation and CI/CD job execution: it also provides ChatOps-style slash commands, automatic PR merging via its Tide component, and a web dashboard for job results (Deck).

Extending Kubernetes API:

Did you know you can extend the Kubernetes API itself? Meet the Kubernetes API aggregation layer, a powerful mechanism for adding extra Kubernetes-style APIs behind the main API server, such as the custom metrics server.

The aggregation layer enables installing additional Kubernetes-style APIs in your cluster. These can either be pre-built, existing 3rd party solutions, such as service-catalog, or user-created APIs like apiserver-builder, which can get you started.
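Aggregated APIs are registered with the kube-apiserver via an APIService object. Here is a minimal sketch for a hypothetical custom-metrics adapter; the Service name and namespace are assumptions:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  service:
    # the in-cluster Service fronting the aggregated API server (assumed names)
    name: custom-metrics-apiserver
    namespace: monitoring
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true   # sketch only; use caBundle in production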

Auto-Provisioning Namespaces:

There’s an easy way to auto-provision namespaces without giving extra permissions to your users. Use the NamespaceAutoProvision Admission controller.

This admission controller examines all incoming requests on namespaced resources and checks if the referenced namespace does exist. It creates a namespace if it cannot be found. This admission controller is useful in deployments that do not want to restrict the creation of a namespace prior to its usage.
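NamespaceAutoProvision is not part of the default admission plugin set, so it has to be enabled on the kube-apiserver (or by your managed-Kubernetes provider) alongside your other API server flags, roughly like this:

kube-apiserver --enable-admission-plugins=NodeRestriction,NamespaceAutoProvision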

Enforcing Custom Rules:

Kubernetes offers a simple way to intercept and validate requests with ValidatingAdmissionWebhooks and MutatingAdmissionWebhooks.

A good way to get started is a minimal example webhook: a simple validating and mutating admission webhook with no controller logic, developed as a plain Go web service without any framework or boilerplate such as kubebuilder.

Such a project illustrates how to build a fully functioning admission webhook in the simplest way possible. Most examples found on the web rely on heavy machinery and powerful frameworks, yet fail to show how to implement a lightweight webhook that can do much-needed things such as rejecting a Pod for compliance reasons or injecting helpful environment variables.
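To wire such a webhook into the API server, you register it with a ValidatingWebhookConfiguration (a MutatingWebhookConfiguration looks almost identical). A minimal sketch; the Service name, namespace, path and CA bundle are assumptions:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
- name: pod-policy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      name: my-webhook        # assumed Service name
      namespace: webhooks     # assumed namespace
      path: /validate         # assumed handler path
    caBundle: "<base64-encoded CA certificate>"   # replace with your CA bundle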

Dynamic Resource Allocation:

Dynamic Resource Allocation (DRA) is a generalized API for requesting and sharing resources, such as specialized hardware, between Pods. Introduced as an alpha feature in Kubernetes v1.26, it uses ResourceClass and ResourceClaim objects, backed by third-party resource drivers, to extend what the cluster can offer beyond CPU and memory.

In contrast to native resources (such as CPU or RAM) and extended resources (managed by a device plugin, advertised by kubelet), the scheduler has no knowledge of what dynamic resources are available in a cluster or how they could be split up to satisfy the requirements of a specific ResourceClaim. Resource drivers are responsible for that. Drivers mark ResourceClaims as allocated once the resources for them are reserved. This also then tells the scheduler where in the cluster a claimed resource is actually available.

ResourceClaims can get resources allocated as soon as the ResourceClaim is created (immediate allocation), without considering which Pods will use the resource. The default (wait for first consumer) is to delay allocation until a Pod that relies on the ResourceClaim becomes eligible for scheduling. This design with two allocation options is similar to how Kubernetes handles storage provisioning with PersistentVolumes and PersistentVolumeClaims.
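As a rough sketch only: the DRA API is alpha and its shape has changed between releases, so the manifest below assumes the resource.k8s.io/v1alpha2 API from around Kubernetes 1.27, an enabled DynamicResourceAllocation feature gate, and a hypothetical ResourceClass named example-gpu-class served by a third-party driver:

apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  resourceClassName: example-gpu-class   # hypothetical class provided by a resource driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-demo
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: gpu-claim
  containers:
  - name: app
    image: busybox:1.28
    command: ["sh", "-c", "sleep 3600"]
    resources:
      claims:
      - name: gpu    # references the claim declared in spec.resourceClaims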

Managing requests in the Kubernetes API:

In Kubernetes, request queue management is handled by API Priority and Fairness (APF), which is enabled by default in Kubernetes 1.20 and beyond. The API server also provides two parameters, --max-requests-inflight (default 400) and --max-mutating-requests-inflight (default 200), for limiting the number of in-flight requests. When APF is enabled, the two values are summed up, and that sum defines the API server's total concurrency limit.

That said, there are some finer details to account for:

  • Long-running API requests (e.g., viewing logs or executing commands in a pod) are not subject to APF limits, and neither are WATCH requests.
  • There is also a special predefined priority level called exempt. Requests from this level are processed immediately.

This means you can fine-tune how the Kubernetes API server queues and handles requests, prioritizing essential traffic and managing latency effectively.
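APF itself is configured through two built-in API objects, FlowSchemas and PriorityLevelConfigurations, which you can list and inspect like any other resource:

kubectl get flowschemas
kubectl get prioritylevelconfigurations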

API Priority with kubectl:

You can explore how busy your Kubernetes API server is by examining the priority level queues.

With the APIPriorityAndFairness feature enabled, the kube-apiserver serves the following additional paths at its HTTP(S) ports.

You need to ensure you have permissions to access these endpoints; nothing extra is required if you are using an admin account. If needed, access can be granted by following the RBAC documentation and allowing /debug/api_priority_and_fairness/ under nonResourceURLs.

/debug/api_priority_and_fairness/dump_priority_levels - a listing of all the priority levels and the current state of each. You can fetch it like this:

kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
The output is in CSV format and looks similar to this:
PriorityLevelName, ActiveQueues, IsIdle, IsQuiescing, WaitingRequests, ExecutingRequests, DispatchedRequests, RejectedRequests, TimedoutRequests, CancelledRequests
catch-all, 0, true, false, 0, 0, 1, 0, 0, 0
exempt, 0, true, false, 0, 0, 0, 0, 0, 0
global-default, 0, true, false, 0, 0, 46, 0, 0, 0
leader-election, 0, true, false, 0, 0, 4, 0, 0, 0
node-high, 0, true, false, 0, 0, 34, 0, 0, 0
system, 0, true, false, 0, 0, 48, 0, 0, 0
workload-high, 0, true, false, 0, 0, 500, 0, 0, 0
workload-low, 0, true, false, 0, 0, 0, 0, 0, 0

Explanation for selected column names:

  • IsQuiescing indicates if this priority level will be removed when its queues have been drained.

/debug/api_priority_and_fairness/dump_queues - a listing of all the queues and their current state. You can fetch it like this:

kubectl get --raw /debug/api_priority_and_fairness/dump_queues
The output is in CSV format and looks similar to this:
PriorityLevelName, Index,  PendingRequests, ExecutingRequests, SeatsInUse, NextDispatchR,   InitialSeatsSum, MaxSeatsSum, TotalWorkSum
workload-low, 14, 27, 0, 0, 77.64342019ss, 270, 270, 0.81000000ss
workload-low, 74, 26, 0, 0, 76.95387841ss, 260, 260, 0.78000000ss
...
leader-election, 0, 0, 0, 0, 5088.87053833ss, 0, 0, 0.00000000ss
leader-election, 1, 0, 0, 0, 0.00000000ss, 0, 0, 0.00000000ss
...
workload-high, 0, 0, 0, 0, 0.00000000ss, 0, 0, 0.00000000ss
workload-high, 1, 0, 0, 0, 1119.44936475ss, 0, 0, 0.00000000ss

Explanation for selected column names:

  • NextDispatchR: The R progress meter reading, in units of seat-seconds, at which the next request will be dispatched.
  • InitialSeatsSum: The sum of InitialSeats associated with all requests in a given queue.
  • MaxSeatsSum: The sum of MaxSeats associated with all requests in a given queue.
  • TotalWorkSum: The sum of total work, in units of seat-seconds, of all waiting requests in a given queue.
  • Note: a seat-second (abbreviated as ss) is the unit of work in APF: one seat occupied for one second.

/debug/api_priority_and_fairness/dump_requests - a listing of all the requests, including requests waiting in a queue and requests being executed. You can fetch it like this:

kubectl get --raw /debug/api_priority_and_fairness/dump_requests
The output is in CSV format and looks similar to this:
PriorityLevelName, FlowSchemaName,   QueueIndex, RequestIndexInQueue, FlowDistinguisher,                        ArriveTime,                     InitialSeats, FinalSeats, AdditionalLatency, StartTime
exempt, exempt, -1, -1, , 2023-07-15T04:51:25.596404345Z, 1, 0, 0s, 2023-07-15T04:51:25.596404345Z
workload-low, service-accounts, 14, 0, system:serviceaccount:default:loadtest, 2023-07-18T00:12:51.386556253Z, 10, 0, 0s, 0001-01-01T00:00:00Z
workload-low, service-accounts, 14, 1, system:serviceaccount:default:loadtest, 2023-07-18T00:12:51.487092539Z, 10, 0, 0s, 0001-01-01T00:00:00Z

Explanation for selected column names:

  • QueueIndex: The index of the queue. It will be -1 for priority levels without queues.
  • RequestIndexInQueue: The index in the queue for a given request. It will be -1 for executing requests.
  • InitialSeats: The number of seats that will be occupied during the initial (normal) stage of execution of the request.
  • FinalSeats: The number of seats that will be occupied during the final stage of request execution, accounting for the associated WATCH notifications.
  • AdditionalLatency: The extra time taken during the final stage of request execution, during which FinalSeats remain occupied. It does not correspond to latency observed by the user.
  • StartTime: The time a request starts to execute. It will be 0001-01-01T00:00:00Z for queued requests.

Manually Triggering Pod Evictions:

A safer alternative to deleting pods is using evictions, because they respect pod disruption budgets and other termination policies. You can manually trigger a pod eviction using the Kubernetes eviction API.

Create a file called eviction.json with a content similar to this:

{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": {
    "name": "pod-name-here",
    "namespace": "default"
  }
}

And run this command:

curl -v -H 'Content-type: application/json' https://your-cluster-api-endpoint.example/api/v1/namespaces/default/pods/pod-name-here/eviction -d @eviction.json
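Note that the raw curl call also needs cluster credentials (a bearer token or client certificates). A simpler option should be to let kubectl reuse your kubeconfig and POST the same payload to the eviction subresource (same placeholder names as above):

kubectl create --raw /api/v1/namespaces/default/pods/pod-name-here/eviction -f eviction.json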

Pod Overhead:

When you run a Pod on a Node, the Pod itself takes an amount of system resources. These resources are additional to the resources needed to run the container(s) inside the Pod. In Kubernetes, Pod Overhead is a way to account for the resources consumed by the Pod infrastructure on top of the container requests & limits.

In Kubernetes, the Pod’s overhead is set at admission time according to the overhead associated with the Pod’s RuntimeClass.

A pod’s overhead is considered in addition to the sum of container resource requests when scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing the Pod cgroup, and when carrying out Pod eviction ranking.

To work with Pod overhead, you need a RuntimeClass that defines the overhead field. As an example, you could use the following RuntimeClass definition with a virtualization container runtime that uses around 120MiB per Pod for the virtual machine and the guest OS:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc
overhead:
  podFixed:
    memory: "120Mi"
    cpu: "250m"

Workloads that specify the kata-fc RuntimeClass handler will take these memory and CPU overheads into account for resource quota calculations, node scheduling, and Pod cgroup sizing.

Consider running the given example workload, test-pod:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  runtimeClassName: kata-fc
  containers:
  - name: busybox-ctr
    image: busybox:1.28
    stdin: true
    tty: true
    resources:
      limits:
        cpu: 500m
        memory: 100Mi
  - name: nginx-ctr
    image: nginx
    resources:
      limits:
        cpu: 1500m
        memory: 100Mi

At admission time the RuntimeClass admission controller updates the workload’s PodSpec to include the overhead as described in the RuntimeClass. If the PodSpec already has this field defined, the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod to include an overhead.

After the RuntimeClass admission controller has made modifications, you can check the updated Pod overhead value:

kubectl get pod test-pod -o jsonpath='{.spec.overhead}'

The output is:

map[cpu:250m memory:120Mi]

Future enhancements:

Did you know that all future enhancements to Kubernetes and its adjacent projects are tracked publicly in Git, as Kubernetes Enhancement Proposals (KEPs)?

You can find them in the kubernetes/enhancements repository: https://github.com/kubernetes/enhancements

If you have a good idea (and the resources to make it a reality), you can even submit your own!

Until next time 🎉 🇵🇸


I hope this post gave you a better understanding of some of Kubernetes’ lesser-known capabilities.

Thank you for Reading !! 🙌🏻😁📃, see you in the next blog.🤘🇵🇸

🚀 Thank you for sticking with it till the end. If you have any questions or feedback regarding this blog, feel free to connect with me:

♻️ 🇵🇸LinkedIn: https://www.linkedin.com/in/rajhi-saif/

♻️🇵🇸 Twitter : https://twitter.com/rajhisaifeddine

The end ✌🏻

🔰 Keep Learning !! Keep Sharing !! 🔰
