Kubernetes From Scratch (Part 2)

Kubernetes without Minikube or MicroK8s

Randal Kamradt Sr
Apr 24 · 16 min read
Photo by Sven Mieke on Unsplash

In my first article in this series, “Kubernetes from Scratch,” I put together a minimal Kubernetes system. Now I’d like to build on that success by making it a more complete system.

If you get Kubernetes from a cloud provider, things like storage and Ingress are most likely provided. The core Kubernetes system doesn’t provide them itself, because they need to integrate closely with the cloud system it’s running on.

To follow along, you should have read “Kubernetes from Scratch” and built up the system described. The system we built is four nodes running in VMs on a bare-metal server. As long as you have a similar setup, you should be able to follow along with minor adjustments. The cluster nodes are named kube1, kube2, kube3, and kube4. The kube1 node is the master, and the rest are workers. The main host is called beast and is running Ubuntu 20.04, and the VMs are running Ubuntu 18.04.

Also required for the second half of this article is a storage server we built in my article “Build Your Own In-Home Cloud Storage.” That server is running Ubuntu 20.04 on bare metal and has GlusterFS installed.

This article is intended to be cloud-provider agnostic. It’s not intended to be a blueprint for a production system. If you’re trying to build out a production system, stick with the tools the cloud provider offers. This article is looking under the hood of Kubernetes by seeing the pieces that make up the system and taking away a bit of the magic. The better informed you are about how Kubernetes operates, the better decisions you’ll make when operating with your cloud provider.

Before we start adding onto our system, let’s test out the pieces of the system we have running now. My previous article only tested that the Kubernetes API was working and could be accessed by the kubectl command. We should see if we can actually deploy something. I have a simple “Hello, World!” image that should be suitable for testing that. So back on the main host, create a file named test.yaml, and add the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hellok8s-deployment
  labels:
    app: hellok8s
spec:
  selector:
    matchLabels:
      app: hellok8s
  template:
    metadata:
      labels:
        app: hellok8s
    spec:
      containers:
      - name: hellok8s
        image: docker.io/rlkamradt/hellok8s:latest
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hellok8s-service
spec:
  type: ClusterIP
  selector:
    app: hellok8s
  ports:
  - port: 8080
    targetPort: 8080

Now deploy it with the command kubectl create -f test.yaml. You should be able to see it with kubectl get all.

rkamradt@beast:~$ kubectl get all
NAME                                      READY   STATUS    RESTARTS   AGE
pod/hellok8s-deployment-85fdc9d4f-s5z4q   1/1     Running   0          13m

You may need to wait a few seconds until the pod status is Running. Now how can we test the service? Normally you’d expose it via a load balancer or Ingress service, but we don’t have those things yet. Fortunately, you can use the port-forward subcommand of kubectl.

rkamradt@beast:~$ kubectl port-forward service/hellok8s-service 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080

This command won’t return until you’re done forwarding, so you’ll have to open a new terminal to test the service.

rkamradt@beast:~$ curl http://localhost:8080
Hello World

Now go back to your original terminal window and Ctrl-C to end the port forwarding.

The Kubernetes cluster that was created in my previous article only had two nodes. Between that time and now, I repeated the instructions for the worker node to create two more, so the system I’m currently running has one control-plane node and three worker nodes.

Which node is our pod running on? We can use the describe subcommand of kubectl to find out. Run kubectl get pods to find the exact name of the pod, then describe it:

rkamradt@beast:~$ kubectl describe pod hellok8s-deployment-85fdc9d4f-s5z4q
Name:         hellok8s-deployment-85fdc9d4f-s5z4q
Namespace:    default
Priority:     0
Node:         kube3/192.168.122.223
...

My nodes are named kube1-4, so this one is running on the second worker node. You could use the nodeSelector property in the pod spec to always have it run on one node, especially if one node has unique capabilities like extra memory or SSD drives. But most of the time, it’s good to let the system decide how to spread the work across the nodes.
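If you did want to pin the pod to a node, here’s a sketch of what that might look like; the disktype label is made up for illustration, and you’d first apply it with something like kubectl label nodes kube4 disktype=ssd:

    spec:
      nodeSelector:
        disktype: ssd            # hypothetical label applied to the node beforehand
      containers:
      - name: hellok8s
        image: docker.io/rlkamradt/hellok8s:latest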

Let’s see if we can get all the nodes working. Scale up the replicas for the deployment with this command: kubectl scale --replicas=3 deployment hellok8s-deployment.

Now run kubectl get all until you see 3/3 in the ready column. Running describe on each of the pods should show the pods evenly distributed across the nodes. If you port forward the service, it should round-robin to each node. Unfortunately, my hellok8s app doesn’t log each request, so we can’t tell just by looking at the logs. I’m sure some Kubernetes master out there knows how we can tell via some other means, but we’ll assume it works and tackle that check later.
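A quicker way to check the spread than describing each pod one by one is the -o wide flag, which adds a NODE column to the listing:

kubectl get pods -o wide

With three replicas, you should see kube2, kube3, and kube4 each show up in the NODE column, give or take; the scheduler doesn’t strictly promise an even spread.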

The first problem I see is we don’t want to have to run port forward for each service we offer. There are a couple of ways to address this, and if you’re running with a cloud provider, they should be able to address this for you. But we’re running on bare-metal, so we’ll have to come up with our own solution.

The two methods of exposing a service permanently are load balancing (not to be confused with the load balancing Kubernetes services provide automatically for you) and Ingress. Ingress typically only works with HTTP/HTTPS but allows advanced features, such as virtual hosting or path-based routing and SSL termination. Load balancing works at the TCP level so it can expose things like databases and message queues.

In the past, I’ve used a load balancer and set up an nginx reverse proxy on the main host. This gives me the best of both worlds, although it means manually configuring nginx, which isn’t terribly hard. So to start with, let’s install a load balancer on our cluster. I have some familiarity with MetalLB, so let’s see if we can get that installed and working on our system.

First off, if you’re running this exercise on a cloud provider instead of bare metal, you’ll need to skip this section. MetalLB doesn’t work with most cloud providers — and for good reason: Cloud providers offer their own load balancer. Refer to the documentation with your cloud provider to create a load balancer with access to external IPs.

Installing MetalLB isn’t too complicated: Apply a few YAML files, and do some configuration.

But first, according to the MetalLB docs, “If you’re using kube-proxy in IPVS mode, since Kubernetes v1.14.2 you have to enable strict ARP mode.” I have no idea what that means, but I figure it can’t hurt to put your ARP in strict mode. So edit the config map for kube-proxy via kubectl edit configmap -n kube-system kube-proxy, and set ipvs.strictARP to true. Eventually, the change will work its way through the system.
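Here’s roughly the fragment of that ConfigMap you’re looking for once the editor opens, at least on a kubeadm-built cluster like ours (everything else stays untouched; the surrounding lines are trimmed):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |-
    ...
    ipvs:
      ...
      strictARP: true
    ...

Once that change is saved, install MetalLB: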

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

Then you can see all the new things that have been created with kubectl get all -n metallb-system. The first line above created a namespace metallb-system, so to access all the pieces, you’ll have to use the namespace with -n metallb-system. One thing of note is all the speaker pods created, one for each node. It creates those to advertise all the routes it creates to the network. This is part of the magic of modern networking.

What exactly are we doing with the above commands? We’re using the definition files that MetalLB has provided for us to configure itself as the load balancer for the system. Now any service that has a type of LoadBalancer will be given an IP address automatically that’s routed to the VM bridge that was created when we installed the original system (see my article “Playing With VMs and Kubernetes”). You can use wget on the above YAML files to see what’s going on.

Out of the box, MetalLB remains dormant until it’s configured. To configure it, we need a config map named config in the namespace metallb-system. Create a file called metallbconfig.yaml, and enter the following:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.122.240-192.168.122.250

Now apply the configuration with kubectl apply -f metallbconfig.yaml.

We’re using the layer2 configuration and giving it an address pool on the bridge network of 192.168.122.0/24, with the last octet ranging from 240 to 250. I’m not sure how KVM allocates from that network, so I’m playing a little dangerously. But until smoke starts coming out of my server, I’ll just hope for the best. If you were doing this in production, you’d probably want to investigate that. Actually, if you were doing this in production, I’d fire you for not using a cloud provider. Anyway, we have 11 IP addresses to use to expose services.

This is the second time we’ve used a ConfigMap, so it’s worth pausing on what they’re for: they let you separate the definition of an application from its configuration.

There are a couple of benefits that are easy to see: First, you can have different ConfigMaps in different namespaces, so if you have different environments (dev/test/prod) in different namespaces, you can have a configuration for each. It also means if you have similar configuration elements in separate microservices, you can share the ConfigMaps and make sure everyone is on the same configuration page.
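As a generic illustration (nothing MetalLB needs, and the names here are made up), here’s what an application-level ConfigMap might look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hellok8s-config        # hypothetical name; one copy per environment namespace
data:
  GREETING: Hello from dev

A container picks the values up by adding envFrom: with a configMapRef: naming hellok8s-config to its spec, which turns each key into an environment variable, so the same deployment manifest can run unchanged in dev, test, and prod namespaces with different values.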

To expose a service, you set its type to LoadBalancer. If your system has a load balancer provider (which ours does now), it’ll give the service a permanent IP address. Run kubectl edit service/hellok8s-service, edit the spec.type from ClusterIP to LoadBalancer, and then look at the output of the kubectl get services command:

rkamradt@beast:~$ kubectl get services
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)          AGE
hellok8s-service   LoadBalancer   10.99.129.137   192.168.122.240   8080:30210/TCP   10h
kubernetes         ClusterIP      10.96.0.1       <none>            443/TCP          5d15h

As you can see, the hellok8s-service now has an external IP, 192.168.122.240 (yours will be different). Hit it up with curl:

rkamradt@beast:~$ curl http://192.168.122.240:8080/
Hello World

Boom! Now you’ve configured a load balancer and up to 11 of your services can be automatically exposed. I’d make a joke about up to 11, but it’s beneath me.
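Incidentally, if you’d rather script that change than go through the interactive editor, kubectl patch does the same thing:

kubectl patch service hellok8s-service -p '{"spec":{"type":"LoadBalancer"}}'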

There’s still a problem: Only our main host can see the bridge it hosts. If we want the main network to see our service, we’d need to set up a route.

My home network router is pretty clunky, and trying to make it do anything fancy can be an exercise in futility. Maybe you’ll have better luck with your router, but there’s another option. I set up an nginx reverse proxy on my main host to make it act like an edge server. So now I can proxy HTTP from the main network to the main host, which sees the bridge network. Again, see my article “Playing With VMs and Kubernetes” for the details on that.
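To give a flavor of what that proxying looks like, here’s a minimal sketch of the sort of nginx server block I mean, assuming the 192.168.122.240 address MetalLB handed out above and a made-up hostname (your addresses and names will differ):

server {
    listen 80;
    server_name hellok8s.home.lan;                 # hypothetical name on the main network

    location / {
        proxy_pass http://192.168.122.240:8080;    # the MetalLB external IP on the bridge
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}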

What else is missing from our system? So far we’ve only exposed dummy services that just print out the same thing. Pretty boring. We can make things more exciting by adding a database into the mix. But databases need somewhere to store stuff, and out of the box, Kubernetes doesn’t come with reliable persistent storage. We need to provide it. And, of course, there are many providers to choose from.

If you’re running on a cloud provider, they’ll have persistent storage built-in. But I’m running on a box in the corner of the living room, so I’ll have to find a storage provider and install it, similar to how I installed the load balancer.

In a previous article, “Build Your Own In-Home Cloud Storage,” I set up a box with GlusterFS. GlusterFS is one of the storage providers Kubernetes works with. To be able to mount GlusterFS in your nodes, you’ll have to ssh to each node and install the GlusterFS client: sudo apt-get install glusterfs-client.
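Something along these lines does it for all three workers in one go, assuming you can ssh to the nodes by name and sudo without a password prompt (otherwise, just run the install on each node by hand):

for node in kube2 kube3 kube4; do
  ssh "$node" sudo apt-get install -y glusterfs-client
done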

Then we need to set up an Endpoints resource and hook it up to a headless service. Here is what the Endpoints resource looks like for my setup:

apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
- addresses:
  - ip: 192.168.0.104
  ports:
  - port: 1

My storage host is at 192.168.0.104, and the port can be any number from 1-64,000 (it has to be a legal value, but it’s not used except to match up with the service defined below). I only have a single node cluster, so it won’t be resilient or distributed — but it’ll work for what I need. Next is the headless service, which is defined like this:

apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster
spec:
  ports:
  - port: 1

The port will match it to the endpoints. Create these two files as endpoints.yaml and service.yaml, and then apply them:

kubectl apply -f endpoints.yaml
kubectl apply -f service.yaml

If you do a kubectl get all here, you’ll see a new service. To see the endpoints, you can run kubectl get ep and kubectl describe ep glusterfs-cluster.

Now update the test.yaml you created earlier, and put this volume in the pod spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hellok8s-deployment
  labels:
    app: hellok8s
spec:
  selector:
    matchLabels:
      app: hellok8s
  template:
    metadata:
      labels:
        app: hellok8s
    spec:
      containers:
      - name: hellok8s
        image: docker.io/rlkamradt/hellok8s:latest
        ports:
        - containerPort: 8080
        volumeMounts:
        - mountPath: "/mnt/glusterfs"
          name: glusterfsvol
      volumes:
      - name: glusterfsvol
        glusterfs:
          endpoints: glusterfs-cluster
          path: /gv0
          readOnly: true
---
apiVersion: v1
kind: Service
metadata:
  name: hellok8s-service
spec:
  type: ClusterIP
  selector:
    app: hellok8s
  ports:
  - port: 8080
    targetPort: 8080

The glusterfs: section tells Kubernetes to use GlusterFS and provides the configuration parameters. The path: /gv0 is the volume we created in the previous article. Now remove and recreate the deployment:

kubectl delete -f test.yaml
kubectl apply -f test.yaml

Now we can get inside the pod and see if the /mnt/glusterfs is there. But wait, there’s a problem. The pod isn’t starting up — it’s staying in ContainerCreating status — so something’s not right. Let’s do a little troubleshooting. Running kubectl describe pod <podname> will dump out the trouble mounting the volume.

Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/6aa61d4b-0673-4b77-a2b6-9e8c990986e9/volumes/kubernetes.io~glusterfs/glusterfsvol --scope -- mount -t glusterfs -o auto_unmount,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/glusterfsvol/hellok8s-deployment-666fcddc56-lsllq-glusterfs.log,log-level=ERROR,ro 192.168.0.104:/gv0 /var/lib/kubelet/pods/6aa61d4b-0673-4b77-a2b6-9e8c990986e9/volumes/kubernetes.io~glusterfs/glusterfsvol
Output: Running scope as unit: run-r20968ad0da8d4b5380bebfaaba201e23.scope
Mount failed. Please check the log file for more details.
, the following error information was pulled from the glusterfs log to help diagnose this issue:
[2020-04-21 22:54:12.196614] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
The message "E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv0-dht: dict is null" repeated 2 times between [2020-04-21 22:54:12.178581] and [2020-04-21 22:54:12.196594]
Warning FailedMount 20s kubelet, kube3 MountVolume.SetUp failed for volume "glusterfsvol" : mount failed: mount failed: exit status 1

The basic problem is resolution failed. But what does that mean? An extensive Google search on the entire error message turned up nothing, no matter how much or little of the error message I provided. I tried a few things such as installing the Gluster client package on each of the nodes. But nothing worked.

So I looked closer at the mount command and found log-level=ERROR. I figured if I could change that to log-level=DEBUG, I could have a better idea of what was wrong.

After a little more digging, I found out you can specify mountOptions in the specification of a volume. But the elitists at Kubernetes decided not to allow mountOptions in in-line volume definitions.

So we need to be more precise in our definitions. We need to create a PersistentVolume and a PersistentVolumeClaim and then attach the PersistentVolumeClaim to the pod spec. I suppose it’d be proper to be more definitive anyway, although for the sake of brevity in this article, I was hoping to keep it simple with an in-line volume definition.

Let’s create a PersistentVolume and PersistentVolumeClaim. First, create a file called pv.yaml, and add the following:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-pv
  labels:
    pv: gluster-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  mountOptions:
  - log-level=DEBUG
  glusterfs:
    path: /gv0
    endpoints: glusterfs-cluster

Then create a file called pvc.yaml, and enter the following:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-claim
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi

These are more explicit definitions of what’s available and what’s needed. The PersistentVolume says that 5 GB is available from GlusterFS, and it provides the path and parameters.

It also allows us to define the mountOptions so we can create a more verbose log. The PersistentVolumeClaim says we’ll want, for some purpose, 5 GB of storage. Because we only have one persistent volume, it should always match that one. Now update the volume in the test.yaml file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hellok8s-deployment
  labels:
    app: hellok8s
spec:
  selector:
    matchLabels:
      app: hellok8s
  template:
    metadata:
      labels:
        app: hellok8s
    spec:
      containers:
      - name: hellok8s
        image: docker.io/rlkamradt/hellok8s:latest
        ports:
        - containerPort: 8080
        volumeMounts:
        - mountPath: "/mnt/glusterfs"
          name: glusterfsvol
      volumes:
      - name: glusterfsvol
        persistentVolumeClaim:
          claimName: gluster-claim
---
apiVersion: v1
kind: Service
metadata:
  name: hellok8s-service
spec:
  type: ClusterIP
  selector:
    app: hellok8s
  ports:
  - port: 8080
    targetPort: 8080

Now apply everything:

kubectl apply -f pv.yaml
kubectl apply -f pvc.yaml
kubectl delete -f test.yaml
kubectl apply -f test.yaml
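One quick sanity check before chasing the pod itself: make sure the claim actually bound to the volume.

kubectl get pv
kubectl get pvc

Both gluster-pv and gluster-claim should report a STATUS of Bound; if the claim is stuck in Pending, the capacity or access modes in pv.yaml and pvc.yaml don’t match.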

Of course, it still doesn’t work, but now we can SSH into the node and see the GlusterFS log. To find out the specifics, describe the pod with kubectl describe pod <podname>. You should find a line similar to this one:

Normal   Scheduled         32s                   default-scheduler  Successfully assigned default/hellok8s-deployment-7667684f95-tfx9k to kube3
Warning FailedMount 32s kubelet, kube3 MountVolume.SetUp failed for volume "gluster-pv" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/2a54c709-efea-4af6-8aa8-8a4634f980ae/volumes/kubernetes.io~glusterfs/gluster-pv --scope -- mount -t glusterfs -o auto_unmount,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/gluster-pv/hellok8s-deployment-7667684f95-tfx9k-glusterfs.log,log-level=DEBUG 192.168.0.104:/gv0 /var/lib/kubelet/pods/2a54c709-efea-4af6-8aa8-8a4634f980ae/volumes/kubernetes.io~glusterfs/gluster-pv

We can see it deployed to the kube3 node and gives the log-file path. So SSH to the node, and cat the log. We should see a lot more log lines, which will give us some clues as to what went wrong.

Eventually, I found the line DNS resolution failed on host artful. I found this odd, as in the Kubernetes descriptions I’ve only been using the IP address — how did it know the name artful?

The answer is GlusterFS sends metadata back to the client about the volume that uses the hostname, not the IP address. To test this out, I edited the /etc/hosts on kube3 and added a line for artful. After a few minutes, Kubernetes automatically tried again to mount the volume, and it worked!
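For the record, the line I added is nothing more than the storage host’s address and name from my setup:

# appended to /etc/hosts on kube3 (and, later, the other workers)
192.168.0.104   artful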

Now we can take a look inside the pod with kubectl exec -it <podname> -- /bin/sh, which will bring up a shell running inside the pod. You can then ls /mnt/glusterfs and see the files that were created when we were testing out the storage in “Build Your Own In-Home Cloud Storage.”

That’s a little sample of what you have to do to troubleshoot Kubernetes systems. SSHing into your nodes, bringing up a shell inside your pods, and using the describe command when a pod is stuck in a nonready state are your main tools for finding out what’s wrong.

Of course, editing /etc/hosts on all your nodes isn’t the right answer — there’s probably a better way. Perhaps in the configuration of the GlusterFS, we should only use IP addresses. But I’m going to just edit /etc/hosts on all the nodes because I only have three worker nodes (pods aren’t normally run on the master node), and I’m not planning on adding any more. Once I do that, we can see if we can find a purpose for our solution (think database).

We’ll make a MongoDB database — who doesn’t love Mongo? This will introduce a few features I haven’t gone over before. Mongo will run in pods created by a StatefulSet, which is like a ReplicaSet except that it has very specific pod-creation restrictions.

First of all, a StatefulSet names pods with sequential numbers instead of the random suffix you normally see at the end of a pod name. It also starts them up one at a time so pods aren’t stepping on one another as they start up. For our purposes, none of that’ll matter, though, because I’m just going to start up one pod. You can try scaling up to see what happens, but to keep things simple at first, I’m sticking with the single pod.

Another thing I’m going to do is add a username/password secret so we don’t have to have that explicitly set in description files. So let’s start with that. There are a lot of ways to create Kubernetes secrets, but the easiest way is using literal values on the command line:

kubectl create secret generic mongo-secret \
--from-literal=username=mongo \
--from-literal=password=ognom

This isn’t exactly super-secret, but in a real environment, you’d probably get the values from a file or somewhere else. Since Kubernetes 1.14, you can also use Kustomize and its secretGenerator to build secrets from files or literal values. But this is just a demonstration of how to use the secrets once they’re created. Once created, you can use kubectl describe secret mongo-secret to see information about the secret. Obviously, the actual values are hidden.
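As an aside, a kustomization.yaml secretGenerator for the same throwaway values would look roughly like this (Kustomize appends a content hash to the generated Secret’s name, so anything referencing it needs to go through Kustomize as well):

# kustomization.yaml -- a sketch, same throwaway credentials as above
secretGenerator:
- name: mongo-secret
  literals:
  - username=mongo
  - password=ognom

You’d apply it with kubectl apply -k . instead of the create secret command.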

Now we can create our StatefulSet. Create a file called mongo.yaml, and enter the following:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: database
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
        selector: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo:4.0.8
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: username
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: password
        volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
      volumes:
      - name: mongodb-data
        persistentVolumeClaim:
          claimName: gluster-claim

Now run kubectl apply -f mongo.yaml, and wait for it to start up. Eventually, there should be a pod called mongodb-0, and we can start up a shell inside it with kubectl exec -it mongodb-0 -- /bin/sh. From there, you can access the MongoDB shell: mongo mongodb://localhost:27017. Bam! You’re in.
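One caveat: because we set MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD, the official mongo image starts the database with authentication enabled, so to do anything beyond connecting you’ll want to log in with the credentials from the secret, something like:

mongo mongodb://mongo:ognom@localhost:27017/admin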

You’ll have to start up a service to access the database from other pods. Maybe in my next article, I’ll do that and write up a little REST service that accesses the database. For now, if you want to prove to yourself it actually worked, you can go back to your storage host (artful, in my case) and take a look in the directory that we set up as a brick. There should be a bunch of database files in it now.
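For what it’s worth, the StatefulSet above already names its governing service (serviceName: database), so the missing piece is roughly a headless Service like this sketch:

apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  clusterIP: None          # headless: gives the pod a stable DNS name such as mongodb-0.database
  selector:
    app: database
  ports:
  - port: 27017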

I hope you’ve had as much fun as I’ve had getting this all working, including a little troubleshooting. There have been times when it’s been frustrating, but that just increases the sense of accomplishment when things actually work. All of the scripts for this article can be found here in their final form.
