Adding observability to a Kubernetes cluster using Prometheus

Monitoring your services is vital and should be treated as part of the underlying infrastructure that supports them. You should put it in place before creating and deploying your services. In this article I look at how to deploy Prometheus to provide the observability you need to run your services.

Martin Hodges
14 min read · Jan 7, 2024
Adding observability to your Kubernetes cluster

This article follows my article on creating a Kubernetes cluster using Infrastructure as Code (IaC) tools such as Terraform and Ansible and assumes you have a cluster ready to use. It also requires persistent storage, which you can read about here.

What to do when things go wrong

Anyone who has developed software knows that things go wrong. Whether this is during development, testing or in production, unexpected things happen to break your services. These problems can be broadly categorised into:

  1. Resource depletion (eg: running out of memory)
  2. Logic errors (eg: defects in the code)
  3. Unexpected user behaviour (including cyber attacks)
  4. Operational activities (eg: defects created in the data)

When things do go wrong, the results for your users and for you could be catastrophic, frustrating or plain old embarrassing. It might be that your service fails completely, partially, slowly, or worse, without anyone noticing.

The last one is particularly problematic. Catastrophic failures have a highly visible impact that is felt quickly, and it is your users who provide the feedback. When a service is only starting to go wrong, and may even be making mistakes that no one notices, there is no user feedback. Your service may deteriorate further and the first you know about it is when it becomes a catastrophic failure. That is bad for your users, bad for your business and bad for you.

Hopefully you now understand why you need to ensure that you know about problems first and that you can fix things before they become catastrophes.

Whilst I will introduce the concepts around monitoring and alerting, there are many great articles and descriptions that describe these in detail.

Monitoring and Alerting

More often than not, enabling the operation and management of your services is not considered until it is too late. The focus tends to be on the functionality provided, not how it is to be operated.

I would recommend you always consider how you will operate and manage your services from the start of the project. Do not leave it until after your first major production outage.

With this in mind, I want to introduce monitoring and alerting.

Monitoring

Let’s say you have your Kubernetes cluster running a set of services that you have deployed. It needs to run 24 x 7. As mentioned earlier, there are many reasons why it may not, such as running out of resources, unexpected events, defects and cyber events.

To ensure your services are always available for your users to use, you want to be able to see how your services are performing at any given time.

You should consider adding a monitoring system to your deployment from the beginning. It will capture vital performance information from your services in a central location and will give you the ability to review and analyse that information, so you can spot when something is about to go wrong or work out what happened after something went wrong.

Another reason you should start your monitoring system early is that it can take time to optimise the information you are going to make available through the system. By starting early, you stand a better chance of having a functional, operational and beneficial monitoring system in place when you launch.

When it comes to the type of information you need your monitoring system to collect, it is important to understand the difference between logs and metrics:

  • A log is a time-ordered list of events that your service recorded and generally contains information that helps you work out what was happening at that point in time.
  • A metric is a measurement of resource consumption and/or a count of the events that have occurred over a short time period.

When monitoring a system, it is necessary to capture both logs and metrics. Metrics may be calculated from the logs and reported alongside the log entries themselves to allow you to get a full picture. Other metrics may be collected from the service itself.
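As a rough illustration (the service name, log entry and values here are invented), the same failure might show up as a log entry and as a metric sample in the Prometheus text format like this:

2024-01-07T10:15:32Z ERROR order-service: failed to reserve stock for order 1234 (timeout)

http_requests_total{service="order-service",status="500"} 17

The log entry tells you what happened to a specific request, while the metric tells you how often it is happening, which is what you typically alert on.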

Alerting

Ok, so you are monitoring your services (and the underlying infrastructure that supports them) but you cannot stay glued to your screen 24x7. Instead, you need to be told when something has gone wrong (or better still, that something is about to go wrong unless you act).

This is where alerting comes in.

When things do go wrong (or start to go wrong), you want to know quickly, regardless of where you might be. For this reason, you want your alert to be delivered through a channel that is likely to catch your attention no matter where you are. Channels such as email, Slack, text message, push notification are examples that are likely to be effective.

The alert is triggered from the information that is collected by your monitoring system based on a set of rules that you set up. These rules may be based on metrics, specific types of log entry or a combination of both.
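To give a flavour of what such a rule looks like, here is a minimal Prometheus-style alerting rule. It is only a sketch: we will not be configuring Prometheus alerting in this article, and the metric, threshold and names are illustrative.

groups:
  - name: example-rules
    rules:
      - alert: NodeLowMemory
        # fire if less than 10% of memory has been available for 5 minutes
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Node is running low on memory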

You should now be building a picture of how your service, your monitoring and alerting systems work together.

Monitoring and Alerting

It is important that your monitoring and alerting system is reliable and fault tolerant. In a failure case you do not want to find that your service monitoring failed to capture the required information or that it lost it. You do not want your alerts to fail to reach you.

We will now look at how Prometheus can help within a Kubernetes cluster by providing monitoring. Prometheus also provides alerting but I want to come back to that in another article.

Architecture

From my previous articles, I am assuming that you have a Kubernetes cluster that looks like this:

Kubernetes cluster

We will now install Prometheus onto this cluster, using a Persistent Volume backed by the nfs-server.

Monitoring solution

We start with a Kubernetes cluster backed by an NFS server and persistent storage. To this we will add:

  • A Persistent Volume (PV) for Prometheus
  • A Persistent Volume Claim (PVC) for Prometheus
  • Prometheus

All these components will be added to a Kubernetes namespace called monitoring.

We will configure Prometheus to scrape metrics from the cluster itself.

You will see that Prometheus is capable of providing sophisticated alerting via its Alertmanager module, but I have decided to use Grafana as it is more user friendly and lends itself more to ad hoc changes, allowing you to experiment with alerting rules and levels as you learn how your system behaves.

Setting up our PVC

You would not be very happy if, when your monitoring pods are restarted, you lost all your historic data and your configurations, requiring you to start again. I know this as I have been there.

We need a place for Prometheus to store its data safely. We do this using a Persistent Volume (PV), which the application claims through a Persistent Volume Claim (PVC). I have written about creating PVs and PVCs here.

Creating the PVs

I am assuming you have a Kubernetes cluster with access to an NFS server.

I would strongly suggest that you create a separate share for Prometheus. If you have followed my previous articles, you will need to set up this share. Log in to your nfs-server and modify this file as root (keep any other changes you may have made):

/etc/exports

/pv-share *(rw,async,no_subtree_check)
/pv-share/prometheus *(rw,async,no_subtree_check)

Before we load these into NFS, we have to create the subfolder:

sudo mkdir /pv-share/prometheus
sudo chmod 777 /pv-share/prometheus

Note that these file permissions are weak and should not be used for production. For this article I am showing an example to get you started.

Now load this share and ensure the service starts correctly.

sudo systemctl restart nfs-server
sudo systemctl status nfs-server

You can now use this share.
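If you want to double-check what is being exported before moving on, exportfs will list the active shares:

sudo exportfs -v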

Log in to your k8s-master and create the following file (I am assuming here that you are accessing your cluster via kubectl on your master node. If not, use whatever access you typically use to deploy to your cluster):

Remember to replace any fields between < and > with your own values.

prometheus-pv.yml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 10Gi
  storageClassName: prometheus-class
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /pv-share/prometheus
    server: <nfs-server IP address>
  persistentVolumeReclaimPolicy: Retain

Remember to change the path and the server IP address to those used by your cluster. You may also need to change the size of the PV, which I have set to 10Gi.

Now create the PV and check it has been created:

kubectl create -f prometheus-pv.yml
kubectl get pv

You should see your PV is now available to the cluster:

NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS       REASON   AGE
prometheus-pv   10Gi       RWO            Retain           Available           prometheus-class            30s

We now need to create a PVC for the PV. Before we do that, we need to create the namespace our monitoring components will live in:

kubectl create namespace monitoring

Now create this file:

prometheus-pvc.yml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitoring
spec:
  storageClassName: prometheus-class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Now create it and check it is bound to the PV:

kubectl create -f prometheus-pvc.yml
kubectl get pvc -n monitoring

This should immediately show the PVC bound to its PV:

NAMESPACE    NAME             STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS       AGE
monitoring   prometheus-pvc   Bound    prometheus-pv   10Gi       RWO            prometheus-class   21s

This claim can now be mounted in your Prometheus pods.
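If you want to prove the NFS plumbing works before installing anything, a throwaway pod can mount the claim and then be deleted. This is only a sketch: the pod name pvc-test and the busybox image are arbitrary choices, and it assumes your worker nodes already have the NFS client tools installed (they will if you have been mounting NFS PVs already).

apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
  namespace: monitoring
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: prometheus-data
          mountPath: /data
  volumes:
    - name: prometheus-data
      persistentVolumeClaim:
        claimName: prometheus-pvc

Once it reaches the Running state, delete it with kubectl delete pod pvc-test -n monitoring so it does not get in the way of the Prometheus deployment.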

Installing Helm

It is possible to create manifest files (deployment, service, secrets and configuration yaml files) for Prometheus and Grafana and then deploy them. This is a complex process to get right and it is better to use Helm charts instead, which package these applications together with the Kubernetes resources they need.

Helm is like a package manager for Kubernetes and manages the dependencies and configurations required for the application(s) you are loading. It installs all the manifests you need to make your system operational.

Helm is installed where you run kubectl, which in my case is on the k8s-master node.

The easiest way to deploy Helm is from the install script.

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod +x get_helm.sh
./get_helm.sh

You can check Helm is installed with:

helm version
helm

This will show you the version of Helm you have installed as well as a list of commands you can use with Helm.

Deploying Prometheus and Grafana

Now that we have Helm installed, we can use it to deploy Prometheus and Grafana into our cluster with all the configuration needed to monitor it.

There are many ways to deploy Prometheus. In fact, you can see a list of available charts on ArtifactHub alone with:

helm search hub prometheus

The list is very long and can be quite daunting. Some are no longer supported.

The instructions here are working at the time of writing, Jan 2024.

Deploying Prometheus

First we will add the community Helm chart repository to our system:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

We can now install Prometheus but, if we do so with the default settings, it will use ephemeral storage within the container. You can get the installation to create a PV and PVC automatically using PV operators, but we want it to use the PV and PVC we created earlier.

To do this we create a values file. A values file is a yaml file that overrides or defines additional configuration for a Helm chart. In this case we will use a values file to override the defaults in the Helm chart.

The set of values properties is quite extensive, and looking at the values reference for your version of the Helm chart helps. You can find it here.

Whilst we are telling Prometheus to use our PV/PVC, we can also tell it not to start its Alertmanager as we will be using Grafana for this. We will also remove the Push Gateway as we will not be using it.

Create the following file (remember to replace the < > fields with your own values).

prometheus-values.yml

alertmanager:
  enabled: false
prometheus-pushgateway:
  enabled: false
server:
  service:
    externalIPs:
      - <k8s-master IP address>
    servicePort: 9090
    type: NodePort
    nodePort: 31190
  persistentVolume:
    enabled: true
    existingClaim: prometheus-pvc

Prometheus would normally create a ClusterIP service, which requires you to set up port forwarding every time you want to access it. In this values file I have asked it to create a NodePort service that is accessible externally.
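For comparison, if you kept the default ClusterIP service, access would be via a port forward along these lines. The service name assumes the release name we use below, and the chart's default service port is 80, so adjust both to match your deployment:

kubectl port-forward -n monitoring svc/prometheus-monitoring-server 9090:80

You would then browse to http://localhost:9090 for as long as the port forward is running.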

Now let’s install Prometheus, from the community supported Helm chart, with our values override file. We will install it into the monitoring namespace we created earlier.

helm install prometheus-monitoring prometheus-community/prometheus -f prometheus-values.yml --namespace monitoring

Now check that it is up and running with:

kubectl get pods -n monitoring

You should see:

NAME                                                          READY   STATUS    RESTARTS   AGE
prometheus-monitoring-kube-state-metrics-84945c4bd5-n29mr    1/1     Running   0          12m
prometheus-monitoring-prometheus-node-exporter-ksmnl         1/1     Running   0          12m
prometheus-monitoring-prometheus-node-exporter-swrhj         1/1     Running   0          12m
prometheus-monitoring-prometheus-node-exporter-zp5mz         1/1     Running   0          12m
prometheus-monitoring-server-94f974648-x7jxs                 2/2     Running   0          12m

Some descriptions:

  • kube-state-metrics exposes metrics about the state of the cluster's objects (gathered via the Kubernetes API) for Prometheus to scrape
  • node-exporter allows Prometheus to scrape the cluster nodes themselves (we have 3 Kubernetes nodes so we need 3 daemon pods)
  • server is Prometheus itself

Note that you can uninstall Prometheus at any time with:

helm delete prometheus-monitoring -n monitoring

You should not lose any data as the PVC is still retained:

kubectl get pvc -n monitoring

Testing Prometheus

You should now be able to go to a browser on your development machine and access the UI at: http://<k8s-master IP address>:9090/graph.

All being well, you will be presented with the Prometheus graph page. From here you can type a query into the search bar, for example kubelet_active_pods. When you click Execute you will see the number of pods currently active on each node.
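A few other queries worth trying while you are there. These use standard metric names exposed by kube-state-metrics and node-exporter, so they should be present, although names can vary between versions:

count(kube_pod_info) by (namespace)

1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)

The first counts pods per namespace; the second gives the average CPU utilisation of each node over the last 5 minutes.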

Adding monitoring to our gateway and NFS servers

With Prometheus set up, we still have to ensure our servers that are not in the cluster are also monitored. If you have been following my articles on automating the creation of a Kubernetes cluster, you will have a cluster that includes a gateway server that acts as an ingress point to the cluster from the Internet. You will also have an NFS server that provides our PVs.

These are vital components in our architecture and, even though they are not in our cluster, we need to monitor them.

Welcome to the Prometheus Node Exporter. This is a service that runs on our external nodes, collects metrics from the node's operating system (OS) and presents them in a format that Prometheus can scrape.

We will need to install Node Exporter on both of our external servers. I’ll only explain one of them for brevity.

First go to the official set of downloads to find the correct version. At the time of writing, I am selecting version 1.7.0 for Linux on amd64.

Log in to your server and download the required version:

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz

This downloads the archive and extracts three files into a folder within your current folder. Copy the executable as follows:

sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin

You can now delete the downloaded folder and file.

It is recommended that you run node_exporter as a separate user that cannot log in. Create the user and give it ownership of the binary.

sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

We now need to create the service to run the exporter. As root create the following file (remember to change < > fields to match your set up):

/etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter $ARGS --web.listen-address <server private IP address>:9100
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target

Reload the service daemon so it picks up the new service and then start and enable the service.

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
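Before checking from the master node, you can confirm the service started cleanly on the server itself:

sudo systemctl status node_exporter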

You should be able to check the service is up and working as expected by logging in to your master node and using:

curl <server private IP address>:9100/metrics -v

You should see a set of metrics being returned.
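A fragment of the response will look something like this (these are standard node_exporter metric names; the values will obviously differ):

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 81234.21
node_cpu_seconds_total{cpu="0",mode="system"} 401.57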

For the gateway server, you should not be able to access this endpoint from the Internet, but you may also find you cannot access it from the k8s-master node either. This is because the gateway has a firewall preventing the connection. On the gateway server, enable the port with:

sudo ufw allow 9100/tcp

At this point your two servers should now have node_exporter running on them. Now we need to get Prometheus to scrape these two new data sources.

Updating Prometheus

We now need to add our two new node_exporter instances to our Prometheus deployment.

It is tempting to add them as two new endpoints to the Prometheus values file but, because the endpoints used by Prometheus are determined from calls to the Kubernetes API, we will instead add them as extra scrape configs.

Update your prometheus-values.yml to the following, replacing < > fields with your values.

alertmanager:
  enabled: false
prometheus-pushgateway:
  enabled: false
server:
  service:
    externalIPs:
      - <k8s-master IP address>
    servicePort: 9090
    type: NodePort
    nodePort: 31190
  persistentVolume:
    enabled: true
    existingClaim: prometheus-pvc
extraScrapeConfigs: |
  - job_name: 'rs-nfs-server'
    metrics_path: /metrics
    static_configs:
      - targets:
          - <nfs server IP address>:9100
  - job_name: 'rs-gw'
    metrics_path: /metrics
    static_configs:
      - targets:
          - <gw server IP address>:9100

Note that the extra scrape config is added as a string containing additional yaml, so make sure the indentation is preserved when you copy the config above.

Once you update this configuration, you will need to uninstall and reinstall the Helm chart.

helm uninstall prometheus-monitoring -n monitoring
helm install prometheus-monitoring prometheus-community/prometheus -f prometheus-values.yml --namespace monitoring
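As an alternative to uninstalling and reinstalling, helm upgrade will apply the changed values to the existing release and achieves the same result:

helm upgrade prometheus-monitoring prometheus-community/prometheus -f prometheus-values.yml --namespace monitoring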

Check the pod statuses and wait for everything to come back up:

kubectl get pods -n monitoring

Now when you look at the Prometheus UI, you should see your additional servers in your node metrics.
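A quick way to confirm the new scrape jobs are healthy is the Status > Targets page at http://<k8s-master IP address>:9090/targets, or a query against the built-in up metric using the job names we defined in the values file:

up{job=~"rs-nfs-server|rs-gw"}

A value of 1 means Prometheus scraped the target successfully on its last attempt.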

Visualisation and Alerting

Prometheus gives you access to a set of metrics in tabular and graph form. However, this is not adequate for the majority of uses.

Typically a separate application is used to provide better visualisation. I will be following this article with another on installing Grafana to provide visualisation for the metrics.

In addition, Prometheus has sophisticated alerting but I have disabled it in this deployment as I will be implementing alerting through Grafana.

Summary

In this article we looked at the need to implement monitoring and alerting to ensure our services remain available and meet the expectations of our users.

We then created a Persistent Volume to hold our data and followed this up by installing Helm so we could install Prometheus using a community Helm chart.

By overriding the Helm chart defaults, we were able to connect Prometheus to our PV as well as provide a NodePort service that allows us to access the user interface. We also added Node Exporter to our non-Kubernetes servers so that Prometheus could scrape their metrics too.

In my next article, I will show you how to install Grafana and connect it to Prometheus.

If you found this article of interest, please give me a clap as that helps me identify what people find useful and what future articles I should write. If you have any suggestions, please add them in the comments section.
