Setting Up an On-premise Kubernetes Cluster from Scratch

Qua Zi Xian
10 min read · Oct 19, 2021


Want to set up a Kubernetes environment to start learning how to deploy and manage applications, but not too willing to pay cloud providers for various reasons (e.g. much procrastination during the learning journey, or a company that does not set aside budget for cloud-based solutions)? This post walks you through setting up a bare-minimal, yet scalable, Kubernetes cluster using PCs, servers or VMs. All you need is for each machine (a.k.a. node for the rest of this post) to fulfill the system requirements.

In this post, we will use kubeadm to bootstrap the cluster, so be sure that each node fulfills the requirements of kubeadm as well.

Machine(s) Setup

  • Machine Type: VirtualBox VMs (this should be inconsequential, as long as hardware requirements are met)
  • Operating System: Ubuntu Server 20.04
  • Network: VirtualBox host-only network (use your own office private network address space if you have one), with a separate NAT adapter for installing packages
  • Cluster Nodes: 1 Kubernetes API load balancer (192.168.56.9/24, with virtual address 192.168.56.11/24), 1 control plane node (192.168.56.12/24), and 2 worker nodes (192.168.56.14/24 and 192.168.56.15/24) that run the actual application pods/containers
  • For control plane and worker nodes, it is recommended to have a dedicated hard disk (500GB is good enough) for Containerd's persistent data, in case your root filesystem partition is so small that it eventually fills up and the whole system becomes unable to operate.

Why 2 worker nodes? This is to verify that the Linux/network firewall does not prevent 2 pods that are supposed to communicate with each other, but are scheduled onto 2 different nodes, from doing so.

Basic Setup

Do these for each node.

  • Set up SSH access (things will be MUCH easier if you can copy and paste commands via SSH clients like PuTTY)
  • Set the correct timezone for your region
  • Set up NTP to sync time across all machines (a quick example follows this list)
  • Set up restrictive firewall access, allowing only the ports used by Kubernetes and the other services installed in this post
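
As a quick illustration of the timezone and NTP items above (assuming a systemd-based Ubuntu Server install, and Asia/Singapore purely as an example timezone), the commands on each node could look like this.

# Set the timezone for your region (replace with your own region/city)
sudo timedatectl set-timezone Asia/Singapore

# Enable NTP time synchronization (via systemd-timesyncd by default)
sudo timedatectl set-ntp true

# Verify that the clock is synchronized
timedatectl status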

Install Kubeadm on Each Node

Refer to the kubeadm installation guide. This section only covers specifics for some special use cases.

We will be using Containerd as the container runtime, as Kubernetes has announced that Docker (via dockershim) will no longer be supported as a runtime in future releases. Containerd is a fairly lightweight runtime that already runs underneath Docker in typical Docker installations. Use this installation guide to set up Containerd.

Do take note to follow the installation steps from the official Docker installation guide, except that you do not install the Docker-related packages, only containerd.io.

Note that it is recommended to set systemd as the Cgroup driver.
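
For reference, with the systemd Cgroup driver enabled, the relevant excerpt of /etc/containerd/config.toml looks roughly like this (the exact section path can differ between containerd versions; if unsure, generate a default config with containerd config default first).

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    # Use the systemd cgroup driver, matching the cgroupDriver setting given to Kubelet later on
    SystemdCgroup = true

Remember to restart Containerd (sudo systemctl restart containerd) after editing the file.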

Containerd may take up a lot of disk space when running many containers, especially if you are not using a dedicated external volume provider. You may want to add a dedicated hard disk of sizeable storage capacity to store Containerd's data. In /etc/containerd/config.toml, change the root value to the path where the new hard disk is mounted and restart Containerd.
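
For example, assuming the dedicated disk is mounted at /mnt/containerd-data (a hypothetical mount point used purely for illustration), the change in /etc/containerd/config.toml would look like this.

# /etc/containerd/config.toml (excerpt)
# Store Containerd's persistent data (images, container snapshots etc.) on the dedicated disk
root = "/mnt/containerd-data"

Then restart Containerd with sudo systemctl restart containerd to apply the change.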

If running behind a corporate proxy, create the file /etc/systemd/system/containerd.service.d/http-proxy.conf with the following contents.

[Service]
Environment="HTTP_PROXY=<proxy_url>"
Environment="HTTPS_PROXY=<proxy_url>"
Environment="NO_PROXY=<list_of_addresses_and_hosts>"
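
After creating the drop-in file, reload systemd and restart Containerd so that the proxy environment variables take effect.

sudo systemctl daemon-reload
sudo systemctl restart containerd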

(Optional) Setting the Internal IP Address of Each Node’s Kubelet

Kubelet by default uses the IP address of the network interface with the default gateway. If your cluster nodes have 2 network interfaces, one for general internet access (for installing packages etc.) and one for the cluster-internal network, you will want your control plane nodes to access worker nodes via the private network IP address. Otherwise, depending on your network-level firewall policies, you may have trouble getting pod logs via the kubectl command.

Open /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (the actual file path may differ for non-Ubuntu systems) and edit the line containing Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml" into something like this.

Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml --node-ip=192.168.56.14"

Remember to set the node IP to the correct IP address of the node in question.

Restart the Kubelet to apply the settings.

systemctl daemon-reload
systemctl restart kubelet

Credits to this post.

Setting Up the Kubernetes API Load Balancer

This section mainly draws from the official documentation for setting up a highly-available cluster. The points here alter some of the steps to fit our use case or to make the setup more transparent for our understanding.

For high-availability load balancing of the Kubernetes API, we will be using Keepalived and HAProxy. Follow this setup guide on the load balancer node.
Note that ${APISERVER_VIP} is 192.168.56.11 and ${APISERVER_DEST_PORT} is 6443. Change the port number if you decide to expose the Kubernetes API via a different port number.
For HAProxy's ${HOST1_ADDRESS}, use the control plane node's IP address 192.168.56.12. Repeat this accordingly if you have more than 1 control plane node.
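
With the addresses above substituted in, the frontend/backend portion of haproxy.cfg from the guide ends up looking roughly like this (a sketch only; keep the health-check options recommended by the official guide for your HAProxy version).

# /etc/haproxy/haproxy.cfg (excerpt)
frontend apiserver
    # ${APISERVER_DEST_PORT}
    bind *:6443
    mode tcp
    option tcplog
    default_backend apiserverbackend

backend apiserverbackend
    mode tcp
    option ssl-hello-chk
    balance roundrobin
    # One "server" line per control plane node: ${HOST1_ID} ${HOST1_ADDRESS}:${APISERVER_SRC_PORT}
    server control-plane-1 192.168.56.12:6443 check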

Setting Up the Kubernetes Control Plane

On the control plane node (only 1 of the nodes, if you have multiple control plane nodes), run the following command to generate the configuration file for initializing the Kubernetes cluster with Kubeadm.

kubeadm config print init-defaults --component-configs=KubeletConfiguration > init.yaml

Open and edit init.yaml, filling in the fields shown below. ... is used to mask out fields that are not of concern here.

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.12
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: <whatever_identifier_you_want_for_this_node>
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: <any_cluster_name>
controllerManager: {}
controlPlaneEndpoint: "192.168.56.11:6443"
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: 1.22.0
networking:
  dnsDomain: cluster.local
  podSubnet: <reasonably_sized_private_subnet>
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: <same_as_dnsDomain_in_ClusterConfiguration>
...

Some notes:

  • advertiseAddress is that control plane node's own IP address.
  • controlPlaneEndpoint is the API server load balancer's virtual IP address in a multi-node control plane setup. It is the address that Kubernetes clients use to communicate with the API server.
  • clusterDomain matters when your containerized application has to connect to another containerized component; it should match the dnsDomain set in the ClusterConfiguration.
  • clusterDNS and serviceSubnet can be changed if you understand them; I don't really understand how they work, so the defaults are kept.

More details about the fields of the objects involved can be found here. Refer to the latest API version documentation in case this post becomes old at the time of reading.

Finish up the setup of the 1st control plane node. It will take some time to complete. If any error causes a failure, look out for the displayed error messages, or grab them from Kubelet's logs (via systemctl or journalctl) and try your luck on Google. The Containerd proxy settings were one such issue I encountered.

sudo kubeadm init --config init.yaml --upload-certs

Once the command succeeds, you will see final output containing several commands. These are pretty important when adding subsequent nodes (default values are used if no config file is specified), so do save them in a file for contingency purposes.
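
If you do lose that output, a fresh worker join command can be regenerated on the control plane node with the command below (note that this does not include the certificate key needed for joining additional control plane nodes, which is covered later).

# Prints a new "kubeadm join ..." command with a freshly created token and the CA certificate hash
kubeadm token create --print-join-command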

Setting Up the Worker Nodes

Similar to the control plane node, generate the join configuration file for better control over some of the minute details.

kubeadm config print join-defaults --component-configs=KubeletConfiguration > join.yaml

Edit the generated YAML file into something like this, based on our pre-defined setup context.

apiVersion: kubeadm.k8s.io/v1beta3
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: 192.168.56.11:6443
    caCertHashes:
    - sha256:<cluster_ca_cert_sha256_hash>
    token: <same_token_as_init.yaml>
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: <same_token_as_init.yaml>
kind: JoinConfiguration
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: <identifier_for_this_worker_node>
  taints: null
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- <same_as_init.yaml>
clusterDomain: <same_as_init.yaml>
...

To get the SHA256 hash of the cluster’s CA certificate, run this command on the control plane node. Information about this command can be found here.

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

In the above snippet for join.yaml, the caCertHashes entry looks something like sha256:xxxx, where xxxx is the SHA256 hash of the CA certificate generated by the above command.

The control plane's discovery token lasts for 24 hours. If adding a node to the cluster after 24 hours have passed, you will need to re-generate the discovery token on the control plane node. More information about tokens can be found here.

kubeadm token create --config init.yaml

This generates the same token defined in init.yaml.

You can then grab the token with this command.

kubeadm token list

Paste this token into the bootstrapToken section, as well as into tlsBootstrapToken.

Join the node to the cluster.

sudo kubeadm join --config join.yaml

(Optional) Set Up the Other Control Plane Node(s)

Similar to adding worker nodes, generate a join configuration.

kubeadm config print join-defaults --component-configs=KubeletConfiguration > join.yaml

For the configuration file, set things up in the same way as for worker nodes, except that in the JoinConfiguration object, you add the following field.

controlPlane:
  localAPIEndpoint:
    advertiseAddress: 192.168.56.13
    bindPort: 6443
  certificateKey: <control_plane_certificate_key>

For the advertise address, use the IP address of the new control plane node to be added, ensuring that this address is in the control plane network, accessible by the HAProxy load balancer.

If you saved the output of kubeadm init into a file, you can easily get the certificate key value to put into the above snippet. Otherwise, re-upload the certificates and get a new certificate key using this command on the already-initialized control plane node.

sudo kubeadm --config init.yaml init phase upload-certs --upload-certs

This will output the certificate key string on the final output line.

Based on this article, it seems that the certificate key Secret object expires after 2 hours, so you may want to save this string to a file and test again after 2 hours to verify this.

Once done with the configuration file, join the node to the cluster’s control plane.

sudo kubeadm --config=join.yaml join

On success, verify that the new control plane node has joined by running this on the first control plane node.

kubectl get nodes

Set Up the Network Plugin

A Kubernetes cluster requires a network plugin for pods to communicate with each other. Here, we will be using the Calico network plugin.

While looking up how to upgrade Calico, I found out that there are 2 ways of deploying Calico: using a manifest file, or using the Calico operator (the operator is actually called the Tigera operator). Note that for the Calico operator method, you should ignore the step for untainting the control plane node, as that is meant for a 1-node Kubernetes cluster.

In this post, we'll use the manifest installation, but with extra configuration settings to make it easier to transition to using the operator if you wish to.

On the control plane node, download the Calico manifest file here.

In the manifest file, uncomment the environment variable CALICO_IPV4POOL_CIDR and, to play safe, set it to a private IP address subnet that does not clash with the Kubernetes nodes' subnet.
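
Using the 192.168.72.0/21 pool from the next step as the example subnet, the uncommented environment variable in the calico-node DaemonSet looks like this.

# In calico.yaml, under the calico-node container's env section
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.72.0/21"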

Also, in the manifest file, navigate to the calico-config ConfigMap and, under cni_network_config's plugins ipam section, configure it to look like this.

"ipam": {
"type": "calico-ipam",
"assign_ipv4": "true",
"ipv4_pools": ["192.168.72.0/21"]
}

For ipv4_pools, set it to be the same as the CALICO_IPV4POOL_CIDR that was set in the previous step.

Calico uses TCP port 179 for BGP to allow pods on different nodes to communicate with each other. If a firewall is enabled on the nodes, ensure bidirectional traffic to/from TCP port 179 is allowed. If Calico is customized to use something other than BGP, check out this network requirements page for details on the exact ports to unblock.
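
For example, if the nodes use ufw (Ubuntu's default firewall frontend), allowing BGP traffic from the other cluster nodes could look like this, assuming 192.168.56.0/24 is the cluster-internal network.

# Allow BGP (TCP 179) from the cluster-internal network
sudo ufw allow from 192.168.56.0/24 to any port 179 proto tcp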

Once done customizing (make sure you know what you are changing or are following some setup guide), set up the network plugin.

kubectl apply -f calico.yaml

Once that is done, it will take a few minutes for the nodes to enter Ready status.

Set Up the Load Balancer

Next, we will need to set up a load balancer that can handle all incoming traffic to the application pods/containers that will be deployed on the Kubernetes cluster. On public cloud providers, this will automatically be provisioned when you define an Ingress resource. For this setup, MetalLB will be used as the load balancer of choice.

First, unblock port 7946 for both TCP and UDP, as stated in MetalLB’s system requirements.
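
Again assuming ufw and the 192.168.56.0/24 cluster-internal network, this could look like the following.

# memberlist traffic between MetalLB speakers (TCP and UDP port 7946)
sudo ufw allow from 192.168.56.0/24 to any port 7946 proto tcp
sudo ufw allow from 192.168.56.0/24 to any port 7946 proto udp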

Create the namespace needed to deploy MetalLB.

kubectl create namespace metallb-system

On the control plane node, download the manifest file. In the manifest file, before the DaemonSet and the Deployment objects at the bottom of the file, add the following ConfigMap object.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config
  namespace: metallb-system
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.56.24/29
      - 192.168.56.34-192.168.56.35
    - name: no-auto-assign
      protocol: layer2
      auto-assign: false
      addresses:
      - 192.168.56.33/32

You can name the address pools any way you like, and have any number of address pools for easier segregation and management. Feel free to set the address pool ranges to anything that suits your use case. If auto-assign is set to false, the addresses can only be assigned by explicitly requesting them via LoadBalancer Service objects, as shown in the sketch below.
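
As a sketch of what explicitly requesting an address from the no-auto-assign pool looks like (with the config-file-based MetalLB deployment used here), a LoadBalancer Service can name the pool via an annotation and pin the reserved address with loadBalancerIP. The Service name and selector below are hypothetical, for illustration only.

apiVersion: v1
kind: Service
metadata:
  name: example-lb
  annotations:
    # Ask MetalLB to allocate from the pool named "no-auto-assign"
    metallb.universe.tf/address-pool: no-auto-assign
spec:
  type: LoadBalancer
  # The address reserved in that pool
  loadBalancerIP: 192.168.56.33
  selector:
    app: example
  ports:
  - port: 80
    targetPort: 8080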

Once done, deploy the MetalLB load balancer.

kubectl apply -f metallb.yaml

On success, the controller and speaker pods should be up and running.

Set Up Ingress Controller

For application pods to receive requests from outside the cluster, ingress is needed. Ingress controllers take incoming (HTTP/HTTPS) requests and pass them on to the correct pods via their backend Services. For this setup, the Nginx Ingress Controller is used as it is, in my opinion, relatively easy to set up.

On the control plane node, download the manifest file here, saving it as nginx-ingress-controller.yaml. Here, the GKE version of the manifest is used, as it seemed the closest to this setup configuration to me. We need a LoadBalancer Service that receives an IP address from MetalLB in order for the ingress controller to serve requests coming from outside.

Create the ingress controller and all its related resources.

kubectl apply -f nginx-ingress-controller.yaml
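
Once the pods are up, verify that the ingress controller's LoadBalancer Service has been assigned an external IP by MetalLB (the namespace and Service name below match the upstream manifest at the time of writing, but may differ across versions).

kubectl get svc -n ingress-nginx ingress-nginx-controller
# The EXTERNAL-IP column should show an address from one of the MetalLB address pools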

Testing the Setup

Finally, we will need a simple web application deployed on the cluster to make sure that the cluster setup is complete and will serve all our deployed applications without any problems.

Create a sample application manifest test.yaml with the following content.

apiVersion: v1
kind: Service
metadata:
  name: test-app
spec:
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    name: test-app
    app: phpmyadmin
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-app
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/rewrite-target: "/"
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: test-app
            port:
              name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
spec:
  selector:
    matchLabels:
      name: test-app
      app: phpmyadmin
  template:
    metadata:
      labels:
        name: test-app
        app: phpmyadmin
    spec:
      containers:
      - name: phpmyadmin
        image: phpmyadmin/phpmyadmin:5.1.1
        env:
        - name: PMA_ARBITRARY
          value: "1"
        ports:
        - name: http
          containerPort: 80
---

This defines a basic phpMyAdmin web application. Deploy it with this command.

kubectl apply -f test.yaml

You can get the IP address at which to access the application by checking the IP address assigned to the Ingress object.

kubectl get ingress

Suppose the IP address 192.168.56.24 is assigned to the Ingress object. You can then access the application at http://192.168.56.24 .
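
As a quick sanity check from any machine on the same network (again assuming 192.168.56.24 is the assigned address), hit the ingress with curl and look for an HTTP response from phpMyAdmin.

curl -I http://192.168.56.24/
# Expect an HTTP 200 (or a redirect) served via the Nginx ingress controller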

Note: There used to be an SSL-related issue involving the Nginx Ingress Controller's admission validating webhook, where the fix was to delete the ValidatingWebhookConfiguration object for the Nginx Ingress Controller. At the point of writing this section, it seems this is no longer needed, but do keep a lookout just in case.

Wrapping Up

I hope this post covers all the aspects of setting up a cluster from scratch, whether you are doing it for the first time or adding new nodes to the cluster after a few months or years. Cluster upgrades, user/cluster certificate renewal, service accounts, RBAC etc. are not covered here, as I have yet to touch on those myself. Do leave a positive comment and help promote this post if it was helpful to you.
