K8s: Volumes & Claims — Part 1

Sandeep Baldawa · Published in The Startup · 7 min read · Nov 1, 2020

Kubernetes (K8s) was originally developed as a platform for stateless applications, with the idea that persistent data would be stored separately. However, as the project matured, many organizations wanted to leverage K8s for their stateful applications too, so persistent volume management was added.

In this blog post, we will try to understand Persistent Volumes and Claims with examples. The flow we will follow:

- Understand the basics of volumes and how to use them
- Ephemeral volumes
- Persistent Volumes
- How to use them practically

Why do we need Volumes?

Pods are ephemeral; they come and go frequently. What if you have data that you must keep even if the Pod goes down? This requirement means we need to make the Pod and its associated data loosely coupled, allowing the data to exist independently of any Pod. Kubernetes Volumes provide this decoupling and help persist state across multiple Pods. A persistent volume is like an external hard drive: you can plug it in and save your data on it.

P.S. — The documentation on the official K8s site has more details.

The high-level idea of Kubernetes Volumes

Why another K8s Object?

The first time I tried understanding the concept, I was intrigued by why we needed another object. Why not use the storage provided by your local machine or by any of the cloud providers?

Applications that directly use specific storage systems like Amazon S3, Azure File, block storage, etc., create an unhealthy dependency. If you wanted to move to another cloud provider, it would be not only difficult but risky because of the tight coupling between the storage provider and the application.

Kubernetes is trying to change this by creating an abstraction called Persistent Volume, allowing cloud-native applications to connect to a wide variety of cloud storage systems without creating an explicit dependency on those systems. This can make the consumption of cloud storage much more seamless and eliminate integration costs. It can also make it much easier to migrate between clouds and adopt multi-cloud strategies.
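As a quick preview of that abstraction (we cover it in detail in the next part), a Pod consumes storage through a PersistentVolumeClaim rather than naming any provider directly. A minimal sketch, where the claim name and size are hypothetical:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# A Pod that mounts the claim; note it never mentions S3, EBS, Azure, etc.
apiVersion: v1
kind: Pod
metadata:
  name: test-app
spec:
  containers:
  - image: nginx
    name: test-app
    volumeMounts:
    - mountPath: /data
      name: storage
  volumes:
  - name: storage
    persistentVolumeClaim:
      claimName: my-claim

The provider-specific details live elsewhere (in the PersistentVolume or StorageClass), which is exactly what removes the tight coupling.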

Supported types of K8s Volumes

Broadly, volumes can be classified by their life-cycle into two categories.

  1. Ephemeral Volumes — These are tied to the Pod's or Node's lifetime (e.g., emptyDir or hostPath); their data does not survive beyond it.
  2. Persistent Volumes — These are meant for long-term storage and are independent of the Pod/Node life-cycle. They can be cloud volumes (like gcePersistentDisk, awsElasticBlockStore, azureFile, or azureDisk), NFS (Network File System) shares, or Persistent Volume Claims (a set of abstractions for connecting to the underlying cloud-provided storage volumes).

P.S. — All types of volumes have similar YAML files with minor differences. We will now try to understand what these different types of volumes are and when to use each.
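To see how small those differences are, here is a sketch of a Pod mounting an AWS EBS volume; compared with the emptyDir example later in this post, only the volumes: stanza changes. The volume ID is a placeholder, and the EBS volume must already exist in the node's availability zone:

apiVersion: v1
kind: Pod
metadata:
  name: test-ebs
spec:
  containers:
  - image: nginx
    name: test-ebs
    volumeMounts:
    - mountPath: /data
      name: ebs-volume
  volumes:
  - name: ebs-volume
    awsElasticBlockStore:
      volumeID: "<volume-id>"   # placeholder: ID of a pre-existing EBS volume
      fsType: ext4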

Let’s start by looking at Ephemeral volumes.

Ephemeral Volume type1: emptyDir

An emptyDir volume is first created when a Pod is assigned to a node and exists as long as that Pod is running on that node. As the name suggests, the volume is initially empty. All containers in the Pod can read and write the same files in the emptyDir volume. When a Pod is removed from a node for any reason, the data in the emptyDir is deleted permanently.

Sure, this is cool, but what storage medium is this really stored on?

Depending on your environment, emptyDir volumes are stored on whatever medium backs the node, such as disk, SSD, or network storage.

Let’s create one ourselves and see which storage medium is used. For all examples, we will be using minikube, with the below YAML file (similar to the one in the official documentation):

apiVersion: v1
kind: Pod
metadata:
  name: test-nginx
spec:
  containers:
  - image: nginx
    name: test-nginx
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}

Let’s apply the YAML file and get into the Pod:

Let's try K8 : kubectl apply -f emptydir.yml
pod/test-nginx created

Let's try K8 : kubectl get pods
NAME READY STATUS RESTARTS AGE
test-nginx 1/1 Running 0 7s

Let's try K8 : kubectl exec -it test-nginx -- /bin/bash
root@test-nginx:/# mount | grep -i cache
/dev/vda1 on /cache type ext4 (rw,relatime)

Looking at the storage medium backing the emptyDir mounted in the container we just created, it shows up as /dev/vda1, a paravirtualized disk.

Awesome, but do we have the freedom to choose what medium of storage is being used?

If you set the emptyDir.medium field to Memory, Kubernetes mounts a tmpfs (RAM-backed filesystem) for you instead. Let’s try that with a slightly modified YAML file, which now includes the medium:

apiVersion: v1
kind: Pod
metadata:
  name: test-nginx
spec:
  containers:
  - image: nginx
    name: test-nginx
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory

From the output below, we can clearly see that the mounted directory is now of type tmpfs. While tmpfs is very fast, be aware that, unlike disks, tmpfs is cleared on node reboot, and any files you write count against your container’s memory limit.

Let's try K8 : kubectl apply -f emptydir.yml
pod/test-nginx created
Let's try K8 : kubectl get pod
NAME READY STATUS RESTARTS AGE
test-nginx 1/1 Running 0 4s
Let's try K8 : kubectl exec -it test-nginx -- /bin/bash
root@test-nginx:/# mount | grep -i cache
tmpfs on /cache type tmpfs (rw,relatime)
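Because tmpfs usage counts against the container’s memory, it can be worth capping the volume with the emptyDir.sizeLimit field. A minimal sketch (the 64Mi cap is an arbitrary example value):

apiVersion: v1
kind: Pod
metadata:
  name: test-nginx
spec:
  containers:
  - image: nginx
    name: test-nginx
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory
      sizeLimit: 64Mi   # the Pod is evicted if the volume grows beyond this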

All this is good, but why in the world would anyone use storage in a Pod that is by nature ephemeral and no longer exists when the node goes down?

There are times when state does not matter beyond a certain period, and such cases are a good fit. The official documentation mentions the following use cases:

  1. Scratch space, such as for a disk-based merge sort.
  2. Checkpointing a long computation for recovery from crashes.
  3. Holding files that a content-manager container fetches while a webserver container serves the data.
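The third use case above can be sketched as two containers sharing one emptyDir; the content-fetcher sidecar below is a hypothetical stand-in (here just busybox writing a file) for whatever fetches the real content:

apiVersion: v1
kind: Pod
metadata:
  name: shared-content
spec:
  containers:
  - name: webserver
    image: nginx
    volumeMounts:
    - mountPath: /usr/share/nginx/html   # nginx serves files from this path
      name: content
  - name: content-fetcher                # hypothetical sidecar producing content
    image: busybox
    command: ["sh", "-c", "echo hello > /content/index.html && sleep 3600"]
    volumeMounts:
    - mountPath: /content
      name: content
  volumes:
  - name: content
    emptyDir: {}

Both containers see the same files because they mount the same volume, just at different paths.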

Ephemeral Volume type2: hostPath

A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.

The emptyDir volumes are analogous to the implicit, per-container storage strategy of Docker: they are sandboxes managed by the container runtime. hostPath volumes, on the other hand, expose part of the host node’s filesystem directly to the Pod. Let’s try it out using a sample YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /host_data
      # this field is optional
      type: Directory

Let’s do a kubectl apply and see if the Pod comes up; what we see is that it somehow does not. Why is that?

Let's try K8 : kubectl apply -f hostPath_notCreated.yml
pod/test-pd created
Let's try K8 : kubectl get po
NAME READY STATUS RESTARTS AGE
test-pd 0/1 ContainerCreating 0 2s

This is because the host does not have the directory we are trying to mount into the Pod:

Let's try K8 : kubectl describe po test-pd | grep Events: -A5
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned default/test-pd to minikube
Warning FailedMount 2s (x7 over 33s) kubelet, minikube MountVolume.SetUp failed for volume "test-volume" : hostPath type check failed: /host_data is not a directory

Another option is to use a hostPath type that creates the path on the host if it does not exist: DirectoryOrCreate for a directory, or FileOrCreate for a file. This can help us resolve the issue. Let’s try it:

apiVersion: v1
kind: Pod
metadata:
  name: test-webserver
spec:
  containers:
  - name: test-nginx-hostPath
    image: nginx
    volumeMounts:
    - mountPath: /var/local/aaa
      name: mydir
    - mountPath: /var/local/aaa/1.txt
      name: myfile
  volumes:
  - name: mydir
    hostPath:
      # Ensure the file directory is created.
      path: /var/local/aaa
      type: DirectoryOrCreate
  - name: myfile
    hostPath:
      path: /var/local/aaa/1.txt
      type: FileOrCreate

Let’s do a kubectl apply and see if the Pod comes up; this time it does.

Let's try K8 : kubectl apply -f hostPath.yml
pod/test-webserver created
Let's try K8 : kubectl get pods
NAME READY STATUS RESTARTS AGE
test-webserver 1/1 Running 0 8s

Also, if we log into the Pod, we can see the mount and the file:

Let's try K8 : kubectl exec -it test-webserver -- /bin/bash
root@test-webserver:/# ls /var/local/aaa
1.txt

So far, we have looked at the basics of volumes and some ephemeral volume examples. In the next blog post (a continuation of this one), we will look at Persistent Volumes. Till then, ciao.


whoami >> Slack, Prev — Springpath (Acquired by Cisco), VMware, Backend Engineer, Build & Release, Infra, Devops & Cybersecurity Enthusiast