Insight into Prometheus

May 6, 2024

Have you ever wondered why so many tech products get their names from literature, real life, or mythology? For example, Ubuntu takes its name from a Nguni Bantu word often translated as “humanity towards others”, and Debian combines the first names of Debra Lynn and Ian Murdock, the project’s founder. Google’s name is a play on “googol”, the number 10 to the power of 100. Python, one of the most popular programming languages, was named after Monty Python’s Flying Circus, the British comedy series.

Today I’m going to write about Prometheus, an observability tool. In Greek mythology, Prometheus was known for his foresight and for bringing important knowledge and technology (like fire) to humanity. In the tech space, Prometheus offers insight into the state and performance of infrastructure.

Prometheus is a widely adopted open-source monitoring and alerting system that plays a crucial role in the observability and reliability of distributed systems. It helps teams measure the performance and overall health of those systems. Observability is the ability to understand the internal state of a system from the data it produces.

The three pillars of observability are:

logs — a timestamped record of an event (something happened at a particular time)
metrics — a quantifiable measure (a number) used to track and evaluate the performance, reliability, and overall health of a service or system
traces — a detailed record of the path a request takes as it flows through a distributed system

Log example:

[Sun Dec 04 04:51:55 2005] [error] mod_jk child workerEnv in error state 6
[Sun Dec 04 04:52:04 2005] [notice] jk2_init() Found child 6738 in scoreboard slot 6
[Sun Dec 04 04:52:04 2005] [notice] jk2_init() Found child 6741 in scoreboard slot 9
[Sun Dec 04 04:52:05 2005] [notice] jk2_init() Found child 6740 in scoreboard slot 7

GitHub Log Repository
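To make the idea of a timestamped event concrete, here is a minimal Python sketch that splits each line of the log above into its timestamp, severity level, and message. It assumes the exact bracketed layout shown in the sample; the LogEvent and parse_line names are hypothetical and not part of any logging library.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Matches lines such as:
# [Sun Dec 04 04:51:55 2005] [error] mod_jk child workerEnv in error state 6
LOG_PATTERN = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[^\]]+)\] (?P<msg>.*)$")


@dataclass
class LogEvent:
    timestamp: datetime
    level: str
    message: str


def parse_line(line: str) -> LogEvent:
    """Split one log line into its timestamp, severity level, and free-text message."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    timestamp = datetime.strptime(match["ts"], "%a %b %d %H:%M:%S %Y")
    return LogEvent(timestamp, match["level"], match["msg"])


sample = """\
[Sun Dec 04 04:51:55 2005] [error] mod_jk child workerEnv in error state 6
[Sun Dec 04 04:52:04 2005] [notice] jk2_init() Found child 6738 in scoreboard slot 6
"""

for event in map(parse_line, sample.splitlines()):
    print(event.timestamp.isoformat(), event.level, event.message)
```

Each event carries a point in time plus free text, which is exactly what makes it a log entry rather than a metric.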

Below is an example of metrics from a Linux operating system: the df command (run here with -h for human-readable sizes) shows the size, used space, and available space of each mounted file system.

Filesystem      Size  Used Avail Use% Mounted on
udev            435M     0  435M   0% /dev
tmpfs            98M  1.3M   97M   2% /run
/dev/vda2        19G  6.0G   12G  35% /
tmpfs           488M     0  488M   0% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
/dev/vda1       512M  5.9M  506M   2% /boot/efi
tmpfs            98M   48K   98M   1% /run/user/1000
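The same numbers can be read programmatically and published as metrics. The minimal sketch below uses Python’s shutil.disk_usage to get one df-style row for a mount point and renders it as gauges in the Prometheus text exposition format. The metric names disk_size_bytes, disk_used_bytes, and disk_avail_bytes are hypothetical; the Prometheus node_exporter publishes comparable file system metrics under its own names.

```python
import shutil


def disk_metrics(mountpoint: str) -> dict[str, float]:
    """Return size, used, and available bytes for a mount point, like one row of df."""
    usage = shutil.disk_usage(mountpoint)  # named tuple: total, used, free (all in bytes)
    return {
        "disk_size_bytes": float(usage.total),
        "disk_used_bytes": float(usage.used),
        "disk_avail_bytes": float(usage.free),
    }


def to_exposition(metrics: dict[str, float], mountpoint: str) -> str:
    """Render each number as a gauge, labelled with its mount point, in text format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f'{name}{{mountpoint="{mountpoint}"}} {value}')
    return "\n".join(lines) + "\n"


print(to_exposition(disk_metrics("/"), "/"))
```

Unlike the log lines, every value here is just a number attached to a name and a label, which is the shape Prometheus expects to scrape.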

A trace follows a single request as it passes through a system. Each trace has a unique trace ID that identifies that request. A trace is made up of multiple units of work called spans; each span has a start time, a duration, and the ID of its parent span (the root span has no parent).
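The parent IDs are what turn a flat list of spans into a tree. This minimal Python sketch models spans of a single trace with those fields and prints them indented under their parents; the Span and render_trace names and the service names in the sample request are hypothetical, and real tracing systems store considerably more per span.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str              # shared by every span that belongs to the same request
    span_id: str               # unique within the trace
    parent_id: Optional[str]   # None for the root span
    name: str
    start_ms: int              # start time, in ms since the request began
    duration_ms: int


def render_trace(spans: list[Span]) -> list[str]:
    """Return one indented line per span, children listed under their parent by start time."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name}: start={span.start_ms}ms duration={span.duration_ms}ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical request: the service names below are made up for illustration.
checkout = [
    Span("t1", "a", None, "GET /checkout", 0, 120),
    Span("t1", "b", "a", "auth-service", 5, 20),
    Span("t1", "c", "a", "payment-service", 30, 80),
    Span("t1", "d", "c", "db query", 40, 50),
]
print("\n".join(render_trace(checkout)))
```

Reading the output top to bottom shows where the 120 ms of the request were spent, which is the question traces are meant to answer.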

Prometheus is a metrics-based observability tool.

“Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.”

Official Documentation
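The data model in that quote can be illustrated with a toy in-memory store: a series is identified by its metric name plus its full set of labels, and each series holds a list of (timestamp, value) samples. This is only a sketch of the model, not of how the Prometheus TSDB is implemented; ToySeriesStore and the http_requests_total samples are hypothetical.

```python
import time
from collections import defaultdict
from typing import Optional

# A series is identified by its metric name plus its complete set of labels.
SeriesKey = tuple[str, frozenset[tuple[str, str]]]
Sample = tuple[float, float]  # (unix timestamp, value)


class ToySeriesStore:
    """In-memory illustration of the time series data model (not the real TSDB)."""

    def __init__(self) -> None:
        self.series: dict[SeriesKey, list[Sample]] = defaultdict(list)

    def append(self, name: str, labels: dict[str, str], value: float,
               ts: Optional[float] = None) -> None:
        """Record one sample; a new label combination starts a new series."""
        key = (name, frozenset(labels.items()))
        self.series[key].append((time.time() if ts is None else ts, value))

    def select(self, name: str, **matchers: str) -> dict[SeriesKey, list[Sample]]:
        """Return series with this name whose labels contain every name=value pair given."""
        return {
            key: samples
            for key, samples in self.series.items()
            if key[0] == name and all((k, v) in key[1] for k, v in matchers.items())
        }


store = ToySeriesStore()
store.append("http_requests_total", {"method": "GET", "status": "200"}, 10, ts=1000.0)
store.append("http_requests_total", {"method": "GET", "status": "200"}, 14, ts=1015.0)
store.append("http_requests_total", {"method": "POST", "status": "500"}, 2, ts=1000.0)

for (name, labels), samples in store.select("http_requests_total", method="GET").items():
    print(name, dict(sorted(labels)), samples)
```

Because labels are part of the identity, the GET/200 and POST/500 samples land in two separate series even though they share a metric name.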

Some typical Prometheus metrics are:

CPU/Memory Usage
Disk Space
Service Uptime
Application-specific information:
. Error rate
. Latency
. Response time
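Application metrics such as error rate and latency are usually derived from cumulative counters rather than stored directly. The sketch below shows the basic arithmetic with two samples taken 60 seconds apart; all counter values are hypothetical. It deliberately ignores counter resets, which PromQL’s rate() function handles along with extrapolation at the edges of the time window.

```python
def per_second_rate(earlier: tuple[float, float], later: tuple[float, float]) -> float:
    """Average per-second increase of a counter between two (timestamp, value) samples.

    Simplified on purpose: it assumes the counter did not reset between the samples.
    """
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# Hypothetical cumulative counters scraped 60 seconds apart: (unix time, value).
requests_total = ((1000.0, 1200.0), (1060.0, 1800.0))
errors_total = ((1000.0, 12.0), (1060.0, 30.0))
# Total seconds spent serving requests, like the _sum series of a summary or histogram.
request_seconds_sum = ((1000.0, 300.0), (1060.0, 420.0))

req_per_s = per_second_rate(*requests_total)            # 600 requests / 60 s
errors_per_s = per_second_rate(*errors_total)           # 18 errors / 60 s
error_ratio = errors_per_s / req_per_s                  # share of requests that failed
avg_latency_s = per_second_rate(*request_seconds_sum) / req_per_s  # mean seconds per request

print(f"{req_per_s:.1f} req/s, error ratio {error_ratio:.1%}, avg latency {avg_latency_s:.3f} s")
# prints: 10.0 req/s, error ratio 3.0%, avg latency 0.200 s
```

Dividing the increase in total serving time by the increase in request count is the same idea behind computing average latency from a summary’s _sum and _count series.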

Prometheus Architecture

Prometheus is a pull-based system; other popular pull-based systems are Nagios and Zabbix. There are pros and cons to using a pull-based system, but I won’t go into detail here. Prometheus scrapes metrics over HTTP from each configured endpoint (called a target) at a predefined interval.
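To show what “pull” means in practice, here is a minimal Python sketch with both sides: a tiny exporter that only answers GET /metrics and never pushes anything, and a scraper that fetches the endpoint at a fixed interval, similar in spirit to Prometheus’ scrape_interval setting. Prometheus itself is written in Go and is configured rather than coded like this; the demo_scrapes_total metric and all class and function names are hypothetical, and the parser only understands simple “name value” lines.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Exporter side: answers GET /metrics with plain-text samples and never pushes."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        self.server.scrapes += 1  # counts how many times this exporter was scraped
        body = f"demo_scrapes_total {self.server.scrapes}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo output quiet


class ExporterServer(HTTPServer):
    def __init__(self, address: tuple[str, int]) -> None:
        super().__init__(address, MetricsHandler)
        self.scrapes = 0


# Opener without proxy handling, so requests to the local exporter never go via a proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def scrape(url: str) -> dict[str, float]:
    """Scraper side: pull the endpoint once and parse simple 'name value' lines."""
    with _OPENER.open(url, timeout=5) as resp:
        text = resp.read().decode()
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, value = line.rsplit(" ", 1)
            samples[name] = float(value)
    return samples


def scrape_loop(url: str, interval_s: float, rounds: int) -> list[dict[str, float]]:
    """Pull the target every interval_s seconds for a fixed number of rounds."""
    results = []
    for i in range(rounds):
        results.append(scrape(url))
        if i < rounds - 1:
            time.sleep(interval_s)
    return results


if __name__ == "__main__":
    server = ExporterServer(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/metrics"
    for result in scrape_loop(url, interval_s=1.0, rounds=3):
        print(result)
    server.shutdown()
```

The exporter stays passive: it only decides what to expose, while the scraper decides when and how often to collect.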

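Prometheus’ main components are:

Prometheus server, which scrapes and stores time series data and evaluates alerting rules
TSDB — the time series database built into the Prometheus server, which stores scraped samples on local disk
exporters — client applications/binaries that expose metrics at an HTTP endpoint (typically /metrics) for Prometheus to scrape
Pushgateway — an intermediary service that short-lived jobs push their metrics to, so that Prometheus can scrape them after the job has finished
Alertmanager — receives the alerts fired by Prometheus and handles deduplication, grouping, silencing, and routing of notifications (e.g. email or chat)
PromQL — Prometheus’ own functional query language for selecting and aggregating time series data
Grafana — (not a native Prometheus component) one of the most popular dashboarding tools, and it integrates nicely with Prometheus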
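In Prometheus, alerting rules are evaluated periodically by the server, and a rule can require its condition to hold for some time (its “for” clause) before the alert moves from pending to firing and is sent to Alertmanager. The Python sketch below imitates only that pending/firing logic; ToyAlertRule, the 5% threshold, and the observed values are hypothetical, and real rules are written in PromQL inside rule files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyAlertRule:
    """Fires once value > threshold has held for at least for_s seconds."""
    name: str
    threshold: float
    for_s: float
    pending_since: Optional[float] = None
    state: str = "inactive"

    def evaluate(self, now: float, value: float) -> str:
        """Update and return the rule state for one evaluation at time now."""
        if value <= self.threshold:
            # Condition no longer true: clear the pending timer.
            self.pending_since = None
            self.state = "inactive"
            return self.state
        if self.pending_since is None:
            self.pending_since = now  # first evaluation where the condition is true
        held_for = now - self.pending_since
        self.state = "firing" if held_for >= self.for_s else "pending"
        return self.state


# Hypothetical rule: error ratio above 5% for two minutes, evaluated every 60 s.
rule = ToyAlertRule("HighErrorRatio", threshold=0.05, for_s=120)
observed = [0.01, 0.08, 0.09, 0.10, 0.02]
states = [rule.evaluate(now=60.0 * i, value=v) for i, v in enumerate(observed)]
print(states)  # ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

Requiring the condition to persist filters out short spikes, so a single bad scrape does not page anyone.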

In a nutshell, Prometheus is:

an observability tool that helps teams evaluate infrastructure performance and status
a pull-based system
used to collect and store metrics data
used to manage alerts based on predefined rules

Written by Mihai Pruna