Kubernetes with Data Engineering approach — (1) Create a Kubernetes cluster (with all its dependencies)

AHMET FURKAN DEMIR
27 min read · Dec 25, 2023


Guide to Creating a Perfect Kubernetes Cluster with Dependencies

Hello! In this exciting new series, we aim to build impressive architectures and projects by combining the powerful worlds of Data Engineering and Kubernetes. We take our first step with this article, “(1) Create a Kubernetes cluster (with all its dependencies)”.

Today, in a rapidly changing technology landscape, data processing and management are becoming increasingly complex. Kubernetes’ flexible, scalable, distributed-systems management capabilities are therefore critical for Data Engineers. In this article, you’ll learn the basics of building a Kubernetes cluster from scratch with our step-by-step guide.

Why Kubernetes? Because Kubernetes offers an open-source platform that makes it easy to deploy, scale, and manage container-based applications. Whether you’re working with microservices architectures or big data processing projects, Kubernetes may be the ideal solution for you.

In this article, we will cover in detail all the dependencies required to create a Kubernetes cluster. Thanks to our step-by-step guide, you will be able to simplify this complex process and quickly implement your projects.

Are you ready? Then, let’s start our Kubernetes journey together!

Our Roadmap:

  1. What is Kubernetes — (Reading)
  2. Cluster configuration
  3. Creating a Kubernetes cluster with Kubeadm
  4. Kubernetes building blocks — (Reading)
  5. Kubeadm, Kubectl and Kubelet commands — (Reading)
  6. Kubernetes Dashboard
  7. Kubernetes Storage configuration
  8. Helm and launching a sample application

What is Kubernetes

https://kubernetes.io/

Kubernetes is an open source container orchestration platform. Developed by Google and supported by the Cloud Native Computing Foundation (CNCF), it is designed to automatically deploy, scale and manage container-based applications. Containers are portable, isolated units that contain all the dependencies of an application. By bringing these containers together, Kubernetes ensures that applications run smoothly.

Key Features of Kubernetes

  1. Automatic Deployment: Kubernetes can automatically deploy applications according to their specified desired state. This makes tasks like releasing a new version or scaling out when demand increases much easier.
  2. Scalability: To quickly respond to the increasing demands of applications, Kubernetes can be scaled dynamically. In this way, it is possible to optimize system performance.
  3. Service Discovery & Load Balancing: Kubernetes facilitates communication between services and shares the load by automatically balancing demands.
  4. Self-Healing: If a container or a node fails, Kubernetes automatically detects this situation and moves the affected applications to another healthy node, allowing the system to continue without interruption.
  5. Security and Isolation: To provide isolation between containers, Kubernetes enforces predefined security policies and manages resources effectively.

Kubernetes Usage Scenarios

  1. Microservices Architectures: Kubernetes supports microservices architectures, making complex applications modular and scalable independently.
  2. DevOps Processes: With automated deployment, testing, and integration processes, Kubernetes optimizes DevOps processes by unifying software development and operations.
  3. Big Data Processing: By integrating with big data processing systems, Kubernetes can process and store data effectively.

In conclusion

Kubernetes is a revolutionary tool in modern application development and deployment processes. This platform, which facilitates the work of both developers and operations teams, will continue to have an important role in the technological landscape of the future. In this series of articles, we will focus on understanding the core concepts of Kubernetes, implementing them, and building powerful projects. Are you ready for this exciting journey?

Remember, Kubernetes is not just a tool; it is an engine of transformation. We will take steps together in this series to discover this transformation and put it into practice.

Cluster configuration

In this section, we will configure port communication between Ubuntu machines and the necessary settings for Kubernetes.

Machine configuration with Multipass

First, let’s create our Ubuntu machines, one master and two nodes, using Multipass.

If you wish, you can use VirtualBox, virtual machines from any cloud provider or direct physical servers instead of Multipass.

Use the following commands to create virtual machines using Multipass.

multipass launch --name master --cpus 4 --memory 5120M --disk 40G
multipass launch --name node1 --cpus 4 --memory 5120M --disk 40G
multipass launch --name node2 --cpus 4 --memory 5120M --disk 40G

# list your machines
multipass list

# open a shell on the master
multipass shell master
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install net-tools
sudo apt-get install neofetch

# open a shell on node1
multipass shell node1
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install net-tools
sudo apt-get install neofetch

# open a shell on node2
multipass shell node2
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install net-tools
sudo apt-get install neofetch

[Image: list of Ubuntu machines launched with Multipass]

Machines

  • master (172.20.82.86) | Ubuntu 22.04 LTS | 4 CPU, 5 GB RAM, 40 GB SSD
  • node1 (172.20.91.222) | Ubuntu 22.04 LTS | 4 CPU, 5 GB RAM, 40 GB SSD
  • node2 (172.20.87.1) | Ubuntu 22.04 LTS | 4 CPU, 5 GB RAM, 40 GB SSD
[Image: specs of the master machine]

After creating the machines, we will make the necessary configurations on the machines for Kubernetes Cluster installation.

Port configurations

First of all, we need to open the necessary ports for the communication of the machines in the Cluster. You can open the necessary ports via ufw with the following commands.

ufw (Uncomplicated Firewall) is a firewall management tool used on Ubuntu. Essentially, it is designed to simplify and make firewall settings more user-friendly. It is widely used in Ubuntu and Debian-based distributions.

You can open the necessary ports with the following commands:

Run for all machines.

sudo ufw allow 22
sudo ufw allow ssh

# API Server (kube-apiserver)
sudo ufw allow 6443

# etcd
sudo ufw allow 2379
sudo ufw allow 2380

# Kubelet API
sudo ufw allow 10250
sudo ufw allow 10248

# Kubelet cAdvisor (Container Advisor)
sudo ufw allow 10255

# Kube Proxy
sudo ufw allow 10256

# Controller Manager
sudo ufw allow 10252
sudo ufw allow 10257

# Scheduler
sudo ufw allow 10251
sudo ufw allow 10259

# DNS (CoreDNS)
sudo ufw allow 53/udp
sudo ufw allow 53/tcp

# Apache & Kubernetes Dashboard
sudo ufw allow 443
sudo ufw allow 8443
sudo ufw allow 80

# Calico (default network plugin)
sudo ufw allow 179
sudo ufw allow 5473


sudo ufw enable
sudo ufw reload
sudo ufw status

Hostname configurations

After opening the ports, we must assign hostnames separately for each machine.

# master
sudo hostnamectl set-hostname master

# node1
sudo hostnamectl set-hostname node1

# node2
sudo hostnamectl set-hostname node2

For example, when you run the hostnamectl hostname command, you should get output like the following; it will differ on each machine.

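A quick check on each machine (illustrative output, shown here for the master):

hostnamectl hostname
# master    <- prints node1 / node2 on the worker machines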

Activating kernel modules and disabling swap

Run the commands for all machines.

# load the kernel modules required by container networking
sudo modprobe overlay
sudo modprobe br_netfilter

# load them automatically on every boot
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

# sysctl settings required by Kubernetes networking
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

# disable swap (the kubelet requires it off) and keep it off across reboots
sudo swapoff -a
free -m
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

Containerd install

Run the Containerd installation commands on all machines.

Containerd is an open-source container runtime used for deploying and managing container-based applications. Originally developed at Docker, it was later donated to the Cloud Native Computing Foundation (CNCF) and implements the Open Container Initiative (OCI) specifications. Containerd provides core container functionality such as creating, running, stopping, and managing containers.

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install containerd.io
sudo systemctl daemon-reload
sudo systemctl enable --now containerd
sudo systemctl start containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/ SystemdCgroup = false/ SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
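
A quick sanity check: containerd should now be active.

sudo systemctl is-active containerd   # expect: active
containerd --version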

That completes the cluster configuration. Now we can start creating a Kubernetes cluster with Kubeadm.

Creating a Kubernetes cluster with Kubeadm

After completing all the necessary operations on the machines, it is time to create a Kubernetes cluster with Kubeadm. Let’s start.

https://kubernetes.io/docs/reference/setup-tools/kubeadm/

Kubeadm is a tool used to launch and manage a Kubernetes cluster. It is a command line tool designed to make Kubernetes easier to configure and launch manually. With kubeadm, users can create a Kubernetes cluster quickly and repeatably.

Here are the main features of Kubeadm:

  1. Cluster Launching and Management: kubeadm initializes and configures the key components (API server, etcd, controller manager, etc.) required to launch and manage a Kubernetes cluster.
  2. Configuration and Initial Setup: kubeadm allows users to generate the basic configuration files needed to quickly launch a Kubernetes cluster and perform the necessary installation steps.
  3. Reproducible Installations: kubeadm makes cluster installation repeatable. This makes it easy to create similar Kubernetes clusters in multiple environments using the same configuration.
  4. Cluster Expansion: Adding new nodes to an existing Kubernetes cluster is a relatively simple process with kubeadm. This is used to increase the capacity of the cluster and respond to demands.
  5. Node Removal: kubeadm can be used to remove a node from a Kubernetes cluster. This is useful when you want to remove a node from the cluster for maintenance or other reasons.

Using kubeadm generally involves the following steps:

  1. Kubernetes Cluster Startup: starting the first control node with the kubeadm init command.
  2. Setting Up Cluster Access: running the commands from the kubeadm init output (copying admin.conf) so that kubectl can reach the cluster.
  3. Joining Nodes to the Cluster: Joining other nodes to the cluster with the kubeadm join command.
  4. Adding a Network Plugin (CNI): enabling network communication between pods and nodes by installing one of the various Container Network Interface (CNI) plug-ins.

Beyond kubeadm, there are several other open-source tools for creating and managing Kubernetes clusters. Here are some notable ones:

  1. kops (Kubernetes Operations): kops is a command-line utility that facilitates the creation, upgrade, and management of production-grade Kubernetes clusters on cloud platforms such as AWS. It automates the provisioning of resources like instances and storage, making it suitable for production environments.
  2. kubespray: kubespray is an Ansible-based tool for deploying production-ready Kubernetes clusters. It supports multiple cloud providers and on-premises deployments. With its modular design, users can customize their clusters based on specific requirements.
  3. Rancher: Rancher is an open-source Kubernetes management platform that provides a complete set of infrastructure services for containerized applications. It simplifies the deployment and management of Kubernetes clusters across different environments, offering features such as centralized authentication, monitoring, and application catalog.
  4. k3s: k3s is a lightweight and easy-to-install Kubernetes distribution designed for resource-constrained environments, edge computing, and IoT devices. It includes all the necessary components for a fully functional Kubernetes cluster but with a smaller footprint.
  5. kind (Kubernetes IN Docker): kind is a tool for running local Kubernetes clusters using Docker container "nodes." It allows developers to create isolated, multi-node Kubernetes clusters on their local machines for testing and development purposes.
  6. Minikube: Minikube is a tool that enables users to run a single-node Kubernetes cluster locally on their development machine. It is suitable for testing and learning Kubernetes in a small-scale environment.
  7. K3d (Kubernetes in Docker): Similar to kind, k3d is a lightweight tool that runs Kubernetes clusters inside Docker containers. It is designed for local development and testing, providing an easy way to spin up multiple clusters on the same machine.
  8. Terraform: While not specifically a Kubernetes cluster creation tool, Terraform is widely used for infrastructure as code (IaC) and can be employed to provision and manage Kubernetes clusters on various cloud providers. It offers a high level of customization and flexibility.

These tools cater to different use cases and preferences, providing options for creating and managing Kubernetes clusters in diverse environments and scenarios.

Let’s start creating the Kubernetes Cluster with Kubeadm.

Installation of kubelet, kubeadm, and kubectl. Run the commands below on all machines.

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://dl.k8s.io/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Standing up Kubernetes with Kubeadm. Run this only on the master machine.

sudo kubeadm config images pull
# sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=<your-master-ip> --control-plane-endpoint=<your-master-ip>
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=172.20.82.86 --control-plane-endpoint=172.20.82.86

After all operations are completed successfully, you should get an output like the following.

[Image: output of a successful kubeadm init on the master]

Copy and save the custom Join command created for you.

sudo kubeadm join 172.20.82.86:6443 --token as25d4.zu8s6ao2llm3ssot \
--discovery-token-ca-cert-hash sha256:19b26a187a41e61a3bcb67d2009ce4e27cc1b48a7b96edd93d042971afb0aa00

Then run the following commands on the Master machine.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
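
Alternatively, if you are working as the root user, the kubeadm init output suggests simply pointing kubectl at the admin kubeconfig (not persistent across sessions):

export KUBECONFIG=/etc/kubernetes/admin.conf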

After these operations, run the join command that kubeadm generated for you on each node, one by one. This command registers the nodes with the master to form the cluster.

[Image: a node joining the cluster]

Take a look at your cluster with the kubectl get nodes command. Initially, the status will appear as NotReady; don’t worry, the nodes will become Ready once the network configuration is complete.

kubectl get nodes
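
Illustrative output at this stage (names, ages, and versions will differ in your setup):

NAME     STATUS     ROLES           AGE   VERSION
master   NotReady   control-plane   3m    v1.28.2
node1    NotReady   <none>          60s   v1.28.2
node2    NotReady   <none>          55s   v1.28.2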

Kubernetes network policy

Calico is a network policy and security solution for Kubernetes and other container orchestration systems. Calico is designed to provide network segmentation, microservice communication, security policy enforcement and inter-pod communication. Additionally, Calico offers a comprehensive networking solution that can run across multiple cloud environments and physical machines.

Calico installation

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.1/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.1/manifests/custom-resources.yaml
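
While you wait, you can watch the Calico pods come up (the Tigera operator creates them in the calico-system namespace):

kubectl get pods -n calico-system --watch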

Once the commands have run, wait a bit for the cluster to become ready, then list the nodes in the cluster.

kubectl get nodes

After the command runs, make sure that the status of the nodes in the cluster is Ready and their versions are the same.

[Image: kubectl get nodes showing all nodes Ready]

The output of the kubectl get pods -n kube-system command should look like the following.

[Image: output of kubectl get pods -n kube-system]

Let’s assign roles to the nodes. This step is optional; nothing breaks if you skip it. I’ll do it to make the output look nice :)

kubectl label node node1 node-role.kubernetes.io/worker=worker
kubectl label node node2 node-role.kubernetes.io/worker=worker
[Image: node roles after labeling]

A note on the “ClusterInformation: connection is unauthorized: Unauthorized” error that you may encounter later. It is caused by Calico’s IP autodetection; setting the environment variables below on the calico-node DaemonSet prevents it.

kubectl set env daemonset/calico-node -n calico-system IP=autodetect
# choose ONE autodetection method; the interface pattern below matches the Multipass VMs' NICs
kubectl set env daemonset/calico-node -n calico-system IP_AUTODETECTION_METHOD=interface=ens*
# or, alternatively:
# kubectl set env daemonset/calico-node -n calico-system IP_AUTODETECTION_METHOD=can-reach=www.google.com

And congratulations, you have successfully completed the cluster creation process. Now it’s time to move on to the next steps; all remaining steps in this article are performed only on the master.

Kubernetes building blocks

In this step, we will examine the building blocks of Kubernetes. You will need this information in later steps, so please don’t skip it.

Workloads:

  • Pods: Pods are the smallest deployable units in Kubernetes, representing a single instance of a running process. They encapsulate one or more containers, sharing network and storage resources. Pods enable the co-location and communication of tightly coupled applications.
  • ReplicaSets: ReplicaSets ensure a specified number of pod replicas are running at all times. If a pod fails or is deleted, the ReplicaSet automatically replaces it to maintain the desired replica count, ensuring high availability.
  • Replication Controllers: Replication Controllers are the predecessors of ReplicaSets and provide similar functionality, ensuring a specified number of pod replicas are running.
  • Deployments: Deployments offer declarative updates to applications, managing the deployment and scaling of pods. They enable rolling updates, rollbacks, and version management, enhancing application lifecycle management (a minimal manifest sketch follows this list).
  • StatefulSets: StatefulSets are designed for stateful applications, providing stable network identities and persistent storage for each pod. They are suitable for applications that require unique hostnames and ordered deployment.
  • Jobs: Jobs manage the execution of short-lived tasks to completion. They are useful for batch processing, data migration, or any task that needs to run to completion but doesn’t require continuous execution.
  • DaemonSets: DaemonSets ensure that a copy of a pod runs on every node in the cluster. This is particularly useful for system-level tasks such as log collection, monitoring, or node-specific functionality.
  • Cron Jobs: Cron Jobs enable the scheduling of recurring tasks in a Kubernetes cluster, similar to the cron jobs in traditional Unix/Linux systems. They are valuable for automating periodic processes.
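
To make these pieces concrete, here is a minimal, hypothetical Deployment manifest (the name and image are illustrative, not from this series). The Deployment manages a ReplicaSet, which in turn keeps two pod replicas running:

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-nginx
spec:
  replicas: 2                  # the ReplicaSet keeps two pod replicas alive
  selector:
    matchLabels:
      app: hello-nginx
  template:                    # the pod template
    metadata:
      labels:
        app: hello-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
EOF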

Services:

  • Services: Services expose a set of pods as a network service, providing a stable IP address and DNS name for clients to interact with. They facilitate load balancing and internal communication within the cluster.
  • Ingresses: Ingresses provide external access to services within the cluster. They manage external HTTP and HTTPS routing, allowing you to define rules for handling incoming traffic.
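
Continuing the hypothetical example above, the Deployment can be exposed inside the cluster as a Service with a stable ClusterIP:

kubectl expose deployment hello-nginx --name=hello-nginx-svc --port=80
kubectl get service hello-nginx-svc   # note the stable ClusterIP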

Config and Storage:

  • Config Maps: Config Maps store configuration data separately from application code, allowing for easier configuration changes without modifying the container image.
  • Persistent Volume (PV) and Persistent Volume Claim (PVC): PVs represent physical storage in the cluster, while PVCs are requests for storage by pods. PVCs abstract the underlying storage details from the pod, promoting flexibility and portability.
  • Secrets: Secrets store sensitive information such as passwords, API keys, and tokens. They are base64-encoded and can be mounted into pods or used as environment variables.
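
A quick, hypothetical illustration of ConfigMaps and Secrets from the CLI (names and values are made up):

kubectl create configmap app-config --from-literal=LOG_LEVEL=debug
kubectl create secret generic app-secret --from-literal=DB_PASSWORD='s3cret'
kubectl get configmap app-config -o yaml   # inspect the stored data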

Cluster:

  • Namespaces: Namespaces provide a way to divide cluster resources into virtual clusters, enabling multi-tenancy and resource isolation. They help organize and manage objects within a cluster.
  • Nodes: Nodes are the individual machines (physical or virtual) that constitute a Kubernetes cluster. They run pods and provide the runtime environment for containerized applications.
  • Roles: Roles define a set of permissions within a namespace. They are used in conjunction with role bindings to grant access to resources within the cluster.
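
As a small, hypothetical sketch of namespaces and roles working together, the commands below create a namespace, a Role that can read pods in it, and a binding that grants the role to the namespace’s default service account:

kubectl create namespace demo
kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods -n demo
kubectl create rolebinding read-pods --role=pod-reader --serviceaccount=demo:default -n demo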

Conclusion:

Understanding these fundamental components of Kubernetes is crucial for effectively deploying, managing, and scaling containerized applications. Whether you are orchestrating complex workloads or configuring networking and storage, a solid grasp of these building blocks empowers you to harness the full potential of Kubernetes in your containerized environment.

Kubeadm, Kubectl and Kubelet commands

In this step, we will explore the pillars of Kubernetes: the tools that keep it running and that we will use constantly.

Kubeadm: Kubeadm is a tool that assists in the deployment and initialization of Kubernetes clusters. It simplifies the process of setting up a basic, secure Kubernetes cluster by handling tasks such as bootstrapping the control plane, configuring the networking, and joining worker nodes to the cluster. Kubeadm is designed to be a fast and easy way to get a Kubernetes cluster up and running.

Kubectl: Kubectl is the command-line interface (CLI) for interacting with Kubernetes clusters. It allows users to perform various operations on Kubernetes resources, such as deploying applications, inspecting and managing cluster resources, and troubleshooting issues. Kubectl communicates with the Kubernetes API server to execute these commands and obtain information about the cluster’s state.

Kubelet: Kubelet is an essential component running on each node in a Kubernetes cluster. It is responsible for managing containers on the node, ensuring that the containers described in the pod manifest are running and healthy. Kubelet communicates with the Kubernetes API server to receive instructions about the desired state of the containers and reports the current state of the containers back to the API server.

Now, let’s dive into some examples of commands for each:

Some of the commands may not mean anything to you for now, but as you dive deeper you will begin to understand what each command does and why it is used.

Kubeadm:

# Initialize a Kubernetes cluster (on the master node)
sudo kubeadm init

# Join a worker node to the cluster (output from 'kubeadm init' command on master)
sudo kubeadm join <master-node-ip>:<master-node-port> --token <token> --discovery-token-ca-cert-hash <hash>

# Reset a node (useful for starting over)
sudo kubeadm reset

Kubectl:

  • Deploying YAML Manifests:
# Deploy an application using a YAML manifest
kubectl apply -f <path-to-manifest.yaml>
  • Pod Operations:
# Get a list of pods in a namespace
kubectl get pods

# Describe a pod (provides detailed information about a pod)
kubectl describe pod <pod-name>

# Exec into a running pod (opens a shell inside the pod)
kubectl exec -it <pod-name> -- /bin/bash

# Port-forwarding to a pod (access a pod's container port locally)
kubectl port-forward <pod-name> <local-port>:<pod-port>
  • Service Operations:
# Get a list of services in a namespace
kubectl get services

# Describe a service
kubectl describe service <service-name>

# Expose a deployment as a service
kubectl expose deployment <deployment-name> --type=NodePort --port=<port>
  • Deployment and ReplicaSet:
# Get a list of deployments
kubectl get deployments

# Scale a deployment
kubectl scale deployment <deployment-name> --replicas=<desired-replica-count>

# Rollout status of a deployment
kubectl rollout status deployment <deployment-name>
  • Config Maps and Secrets:
# Create or update a ConfigMap from a file
kubectl create configmap <configmap-name> --from-file=<path-to-file>

# Create or update a Secret from literal values
kubectl create secret generic <secret-name> --from-literal=<key>=<value>
  • Namespace Operations:
# Get a list of namespaces
kubectl get namespaces

# Create a new namespace
kubectl create namespace <namespace-name>
  • Context and Configuration:
# Display the current context
kubectl config current-context

# Switch context to a different cluster
kubectl config use-context <context-name>

# View or edit the kubeconfig file
kubectl config view

Kubelet:

# Check the status of the Kubelet service
sudo systemctl status kubelet

# View logs for the Kubelet
journalctl -u kubelet

# Restart the Kubelet service
sudo systemctl restart kubelet

In the last two steps, we picked up important background on Kubernetes. Now it’s time to return to our cluster; there is still plenty of work to do :)

Kubernetes Dashboard

[Image: the Kubernetes Dashboard]

We will install Kubernetes Dashboard in order to see our resources more closely and make the necessary configurations for our applications more easily.

Kubernetes Dashboard is a web-based user interface for managing and monitoring resources in a Kubernetes cluster. It serves as a graphical representation of the state of resources within the cluster, allowing users to interact with and manage various components such as nodes, running pods, services, and other Kubernetes entities. The dashboard provides a convenient way to visualize and control the deployment, scaling, and monitoring of applications deployed on a Kubernetes cluster through a web browser.

Kubernetes Dashboard installation. All operations will be done on the master machine.

# install dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

Creating an admin role to access the Dashboard.

# create admin
kubectl create clusterrolebinding dashaccess --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:kubernetes-dashboard

Next, create a token for the admin login. Keep the issued token somewhere safe and do not lose it.

kubectl -n kubernetes-dashboard create token kubernetes-dashboard --duration=999999h
# My token

eyJhbGciOiJSUzI1NiIsImtpZCI6IndQUFNNWnJWS1VXVmx3ZmVydm15MzF0UF9ZNWoxQWk5SWFBcXZpMEF3WDQifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjo1MzAzMTM3NjA4LCJpYXQiOjE3MDMxNDEyMDgsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInVpZCI6IjM5YTllZmMxLTM2ZjItNGYyNC05MDA3LTNkOTU3YTQyODE0YyJ9fSwibmJmIjoxNzAzMTQxMjA4LCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQifQ.txLl3XSfVEHasMGc8-NITEBGPXst9ffMPtuVm921pc6sB5tvx3y7AztRiJUy-G7e5lszNvPDAWt8pQROroYL8YV49jw0gXOcbVh2yHvTsFzOvD7m4UygvInnjEjmrWSJSdhKAL2uyEJUQgHWd1D5oO_iKma2rCOhSw_O4nH8-Sp0mSGSL8jNCw-dns14JEtrNN_4G9Yw8K1AWf2NTCmnXcMJ1_vyyDzWNHRdKeL2mPu5RtLUGuOkbMGGezPSlZRGx0Q69O9DODsCS9d51DQahagF6K1xxJf4l_lsx38YkzamMRjKMQ2djkkV3QIgXttrzmcWhJKj-sqozltmRJcYZw

Then, we add an externalIP to the kubernetes-dashboard service to reach the interface.

# kubectl patch svc kubernetes-dashboard -n kubernetes-dashboard -p '{"spec":{"externalIPs":["your-master-ip"]}}'

kubectl patch svc kubernetes-dashboard -n kubernetes-dashboard -p '{"spec":{"externalIPs":["172.20.82.86"]}}'

And now it’s time to open the Kubernetes interface. Go to the dashboard in your browser at https://master-ip/. In my case: https://172.20.82.86

Paste the special token generated for you and continue.

[Image: Kubernetes Dashboard login page at https://172.20.82.86]

Change the “Namespace” section from the top left to “All Namespaces” and make sure everything on the screen is green and not red :)

[Images: the dashboard with All Namespaces selected; the master machine]

Kubernetes Metrics API

We have successfully completed the Dashboard installation, but the cluster has no Metrics API yet, so we cannot view critical things such as the workload on the nodes or the resource usage of the pods. In this step, we will install the Kubernetes Metrics API so we can examine our cluster in more detail.

The Kubernetes Metrics API is used to collect and expose performance data and metrics for resources (e.g., nodes, pods, services) in a Kubernetes cluster. These metrics are used to monitor, analyze, and measure the health and performance of the cluster.

The Metrics API takes data typically collected by a metrics provider such as metrics-server and exposes it in a standard format. This lets monitoring tools and systems query the state of the cluster and trigger reactions such as autoscaling or alerts when necessary.

Many of the metrics are available in a format compatible with popular open-source monitoring tools such as Prometheus, and the Metrics API exposes them through various API endpoints.

Thanks to this API, it becomes possible to understand the performance and health of the cluster, detect problems, and manage resources more effectively.

First, create a file named kubernetes-metric-api.yaml and paste the long manifest below into it: sudo nano kubernetes-metric-api.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.3
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

After creating the file, run the following commands in order. (The final apply overwrites the upstream metrics-server Deployment with our version, which adds the --kubelet-insecure-tls flag needed for kubeadm’s self-signed kubelet certificates.)

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability.yaml

kubectl apply -f kubernetes-metric-api.yaml

After running the commands, wait about five minutes while the Metrics API starts collecting data.
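
You can verify the Metrics API from the command line once data starts flowing:

kubectl top nodes
kubectl top pods -A

After that, let’s go back to the Kubernetes Dashboard.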

As seen in the picture below, we can now see the RAM and CPU usage of Pods and Namespaces.

[Image: system usage of applications]

And as can be seen in the picture below, we can see all the CPU and RAM usage of our cluster.

[Image: system usage of the cluster]

We are now done with the Kubernetes Dashboard. Next we move on to the storage configuration phase required to deploy our first application on Kubernetes.

Kubernetes Storage configuration

Storage is an important area in Kubernetes. It persists the data of running applications on physical media, so data is not lost even if the application crashes, and accumulated data can be handed over to another service.

Kubernetes Storage refers to the concept of managing the storage needs of containerized applications orchestrated by Kubernetes. Kubernetes is an open-source container orchestration platform used to deploy and manage containerized applications. Containers are lightweight, portable units that enhance application portability and enable rapid deployment.

Kubernetes Storage addresses the storage requirements of applications running within containers and provides solutions for storing and sharing data. This includes storing application states, configuration information, database files, or other critical data types within containers. Kubernetes provides different storage classes and resources to fulfill these storage needs.

Key features of Kubernetes Storage include:

  1. Volume: A storage unit attached to a container. It can support various storage classes and can be shared among one or more containers.
  2. Persistent Volume: a cluster-level storage resource whose lifecycle is independent of any single pod. The data it holds can outlive the pod that wrote it and be reused by another pod.
  3. Storage Class: It defines the storage classes to be used in a Kubernetes cluster. Storage classes represent different storage types and levels (e.g., SSD or HDD).
  4. Dynamic Provisioning: This feature automatically creates a storage resource if a requested but not yet created storage resource exists. This makes storage management more flexible and automated.

Kubernetes Storage offers various options to store application data reliably, at scale, and in a manageable manner. This allows containerized applications to store data securely and scale according to their needs.

A kubeadm installation has no default StorageClass, so apply the YAML below; otherwise your applications’ volumes will have no StorageClass to bind against. (Note: the rancher.io/local-path provisioner named here only provisions volumes dynamically if Rancher’s local-path-provisioner is actually installed in the cluster; in this article we create the PersistentVolumes manually, so the class mainly serves as a default for claims to bind against.)

Create a file named StorageClass.yaml and put the following manifest in it.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-path
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Let’s apply the manifest to our cluster with kubectl apply.

kubectl apply -f StorageClass.yaml

As can be seen in the picture below, the StorageClass has now been created.

[Image: the local-path StorageClass in the dashboard]

For a trial run, let’s create a PV and a PVC on the StorageClass we just created.

Create two files named example-pv.yaml and example-pvc.yaml in your current directory and put the following manifests in them.

example-pv.yaml file

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-path
  hostPath:
    path: "/home/ubuntu/example-pv"

example-pvc.yaml file

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-path
  volumeName: example-pv
  resources:
    requests:
      storage: 1Gi

Run the commands:

kubectl apply -f example-pv.yaml
kubectl apply -f example-pvc.yaml

After the commands run, check the PV and PVC in the interface; they should be bound to each other without any errors.
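
You can also verify the binding from the CLI:

kubectl get pv example-pv
kubectl get pvc example-pvc
# both should report STATUS: Bound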

[Images: the PV and PVC shown as Bound in the dashboard]

Once you have confirmed that everything is bound without errors, it is time to move on to the final stage and launch a Kubernetes application.

Helm and launching a sample application

https://helm.sh/

In this last step, we will learn and install Helm and then launch an application via Helm Chart.

Helm is a package manager used to manage and deploy Kubernetes applications. Helm was developed to make complex application deployments and configurations in Kubernetes environments more manageable and reusable.

The key components of Helm include:

  1. Chart: It is the fundamental building block of Helm packages. A Chart consists of files that define, configure, and deploy a Kubernetes application. Typically, a Chart includes application definitions, dependencies, configuration files, and pre-configured values for specific application versions.
  2. Release: It represents a specific deployment of an application at a particular point in time. When you combine a Chart with a specific configuration and deploy it to a Kubernetes environment, that deployment is referred to as a “Release.”
  3. Repository: It is the place used to store and share Helm packages. Helm packages can be shared through official Helm repositories and user-created custom repositories.

Helm follows the following steps:

  • Chart Creation: Create a Helm Chart for an application, including application definitions, configuration information, and other necessary files.
  • Chart Deployment: Deploy the created Chart to the Kubernetes environment. This involves applying an application with a specific version or configuration to Kubernetes.
  • Release Update: Update the application’s version or configuration and apply these changes using Helm.

Helm stands out as a user-friendly tool for simplifying application management in Kubernetes. It is supported by a broad open-source community and has an extensive Chart repository.

Helm installation

Run the following commands on the master and complete the Helm installation.

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
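
Verify the installation:

helm version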

Below are some of Helm’s important commands. We will use some of them in this article and others later in the series; for now, just read through them to get familiar.

Helm commands

  • Create a Chart:
# Create a new Helm chart
helm create mychart
  • Install a Chart:
# Install a Helm chart into the Kubernetes cluster
helm install myrelease ./mychart
  • List Installed Charts:
# List installed Helm releases
helm list
  • Uninstall a Chart:
# Uninstall a Helm release
helm uninstall myrelease
  • Update a Chart:
# Update an installed Helm release
helm upgrade myrelease ./mychart
  • Add Update Parameters:
# Add new values when updating a Helm release
helm upgrade myrelease ./mychart --set key1=value1,key2=value2
  • Rollback to a Previous Version:
# Rollback a Helm release to a previous version
helm rollback myrelease 1
  • List Chart Repositories:
# List available Helm repositories
helm repo list
  • Update Chart Repositories:
# Update Helm repositories
helm repo update

These commands cover basic Helm usage. Helm has many more features and options, so refer to the Helm documentation for detailed information.

With Helm installed and the essentials covered, let’s install the Apache HTTP Server application from Artifact Hub.

Installing Apache HTTP Server with Helm

https://artifacthub.io/

ArtifactHub is a platform for discovering, sharing, and distributing open-source software packages and Kubernetes Helm charts. It serves as a central repository for application packages, making it easier for communities to create, share, and utilize these packages.

Key features of ArtifactHub include:

  1. Central Repository: ArtifactHub is a central repository containing a variety of open-source software packages and Helm charts. Users can search, discover, and use packages available in this repository.
  2. Search and Filtering: ArtifactHub enables users to quickly find desired packages using search and filtering options. It supports categorization, tagging, and keyword-based filtering.
  3. Community Contributions: Users can share their own software packages or Helm charts on ArtifactHub, fostering collaboration and knowledge sharing within communities.
  4. Version Control: ArtifactHub tracks different versions of packages, allowing users to select the specific version of a package they want to use.
  5. Integration and API: ArtifactHub provides APIs for integration with other tools or processes, facilitating automation and process improvements.

ArtifactHub is particularly valuable for users looking to discover and use packages, such as Helm charts, that are popular in the Kubernetes ecosystem.

Let’s search for Apache HTTP Server on Artifacthub and enter the Helm Chart published by Bitnami. Link: https://artifacthub.io/packages/helm/bitnami/apache

[Image: the Apache HTTP Server Helm chart on Artifact Hub]

Click on the Install button on the right as seen in the photo above.


Then copy those two commands and modify them as follows.

helm repo add bitnami https://charts.bitnami.com/bitnami

# add --create-namespace --namespace apache-http-server
helm install my-apache bitnami/apache --version 10.2.4 --create-namespace --namespace apache-http-server

With that, the Apache HTTP Server installation is complete. Now let’s go to our Kubernetes interface and review it.

[Image: Apache HTTP Server in the Kubernetes Dashboard]

As you can see in the picture above, Apache HTTP Server is running as a single pod on node1. Now, let’s scale the Deployment to two replicas from the dashboard, as shown below.

[Image: scaling the Apache Deployment in the dashboard]
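
Equivalently, you can scale from the CLI:

kubectl scale deployment my-apache -n apache-http-server --replicas=2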

As can be seen in the photo below, it now runs as two pods, one on node1 and one on node2, sharing the load equally.

[Image: the scaled Deployment]

If you wish, you can enter the pods, edit the yaml files or read the logs.

[Images: pod logs, exec, and edit views in the dashboard]

Now we need to assign an external IP to the Apache Web Server so it is accessible from outside. We will assign the external IP with the following command.

But before assigning the external IP, go to the Services section and delete the HTTPS (443) port from the my-apache service; otherwise Apache would also claim port 443 on the external IP and block access to our Kubernetes Dashboard.

[Image: deleting the 443 port from the my-apache service]
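
If you prefer the CLI over the dashboard, a JSON patch like the one below should also work. This is a hedged sketch: it assumes the https port is the second entry in the service’s port list, so confirm the order first.

kubectl get svc my-apache -n apache-http-server -o yaml   # confirm the port order first
kubectl patch svc my-apache -n apache-http-server --type=json \
  -p='[{"op":"remove","path":"/spec/ports/1"}]'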

After deleting the port, we can now proceed with the external IP assignment process. To do this, simply run the command below.

# kubectl patch svc my-apache -n apache-http-server -p '{"spec":{"externalIPs":["your-master-ip"]}}'

kubectl patch svc my-apache -n apache-http-server -p '{"spec":{"externalIPs":["172.20.82.86"]}}'
[Images: the kubectl patch command and the active service]

Go to http://your-ip:80 in the External Endpoints section. For my example, this address corresponds to http://172.20.82.86:80.

[Image: the Apache welcome page at http://172.20.82.86]

As you can see in the photo above, Apache Web Server is up and accessible, all incoming requests will be shared equally on node1 and node2.
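
A quick check from any machine that can reach the master (adjust the IP for your setup):

curl -I http://172.20.82.86
# expect an HTTP 200 response with an Apache Server header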

Final

Congratulations, you finished the article.

Now you have a complete Kubernetes cluster in every respect. In the next articles, we will build on this cluster for Data Engineering projects.

Take care of yourself and don’t forget to follow me :)

Ahmet Furkan Demir
