Insight into Prometheus

Mihai Pruna
May 6, 2024


Have you ever wondered why so many tech products got their names from literature, real life, or mythology? For example, Ubuntu takes its name from a Nguni Bantu word often translated as “humanity to others”, and Debian combines the first names of Debra and Ian Murdock. Google, the popular search engine, is a play on “googol”, the number 1 followed by 100 zeros. Python, one of the most popular programming languages, got its name from Monty Python’s Flying Circus, the BBC comedy series.

Today I’m going to write about Prometheus, an observability tool. In Greek mythology, Prometheus was known for his foresight and for bringing important knowledge and technology (like fire) to humanity. In the tech space, Prometheus offers insight into the state and performance of infrastructure.

Prometheus is a widely adopted open-source monitoring and alerting system that plays a crucial role in the observability and reliability of distributed systems. Teams use it to measure the performance and overall health of those systems. Observability is the ability to understand the internal state of a system from the data it emits.

The three pillars of observability are:

  • logs: timestamped records of events (something happened at a particular time)
  • metrics: quantifiable measures (numbers) used to track and evaluate the performance, reliability, and overall health of a service or system
  • traces: detailed records of the path a request takes as it flows through a distributed system

Log example:

[Sun Dec 04 04:51:55 2005] [error] mod_jk child workerEnv in error state 6
[Sun Dec 04 04:52:04 2005] [notice] jk2_init() Found child 6738 in scoreboard slot 6
[Sun Dec 04 04:52:04 2005] [notice] jk2_init() Found child 6741 in scoreboard slot 9
[Sun Dec 04 04:52:05 2005] [notice] jk2_init() Found child 6740 in scoreboard slot 7

Source: GitHub log repository
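To make the “timestamped event” idea concrete, here is a minimal sketch that splits one of the lines above into its timestamp, severity level, and message. The regular expression and the function name parse_log_line are my own assumptions for this particular log layout, not part of any logging library.

```python
import re
from datetime import datetime

# Assumed layout: "[Sun Dec 04 04:51:55 2005] [error] message text"
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")


def parse_log_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    # Day and month names are parsed with the current locale (English assumed).
    timestamp = datetime.strptime(match["ts"], "%a %b %d %H:%M:%S %Y")
    return timestamp, match["level"], match["msg"]


sample = "[Sun Dec 04 04:51:55 2005] [error] mod_jk child workerEnv in error state 6"
print(parse_log_line(sample))
```

Once a log line is structured like this, it can be filtered by level or counted per time window, which is one common way logs get turned into metrics.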

Below is an example of metrics from a Linux operating system: the output of df -h, which shows the space used and available on each mounted file system.

Filesystem      Size  Used Avail Use% Mounted on
udev            435M     0  435M   0% /dev
tmpfs            98M  1.3M   97M   2% /run
/dev/vda2        19G  6.0G   12G  35% /
tmpfs           488M     0  488M   0% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
/dev/vda1       512M  5.9M  506M   2% /boot/efi
tmpfs            98M   48K   98M   1% /run/user/1000
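The same kind of number can be read programmatically. The sketch below uses Python’s standard shutil.disk_usage to compute a “percent used” value for one path. The function name disk_used_percent is my own, and the result can differ slightly from the Use% column of df, which excludes blocks reserved for the root user.

```python
import shutil


def disk_used_percent(path="/"):
    """Return the used share of the file system containing `path`, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return 100.0 * usage.used / usage.total


print(f"{disk_used_percent('/'):.1f}% used on /")
```

A single reading like this is just a number; it becomes a metric in the monitoring sense once it is collected repeatedly and stored with a timestamp.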

A trace follows a single request as it passes through a system. Each trace has a unique trace ID that identifies that request. A trace is made up of multiple child events called spans, and each span has a start time, a duration, and a parent ID.
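As a rough illustration of that structure, the sketch below models spans as a small data class and prints them as a tree by following the parent IDs. The field names, service names, and timings are invented for the example and do not follow any particular tracing library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int             # offset from the start of the request
    duration_ms: int


# Hypothetical spans that all belong to one request (trace "t1").
spans = [
    Span("t1", "a", None, "GET /checkout", 0, 120),
    Span("t1", "b", "a", "auth-service", 5, 20),
    Span("t1", "c", "a", "payment-service", 30, 80),
    Span("t1", "d", "c", "db-query", 40, 50),
]


def render(spans, parent_id=None, depth=0):
    """Return the spans as indented lines, children listed under their parent."""
    lines = []
    children = [s for s in spans if s.parent_id == parent_id]
    for span in sorted(children, key=lambda s: s.start_ms):
        lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms} ms)")
        lines.extend(render(spans, span.span_id, depth + 1))
    return lines


print("\n".join(render(spans)))
```

The indentation in the output mirrors the parent/child relationship, which is what tracing tools usually draw as a waterfall diagram.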

Prometheus is a metrics-based observability tool.

“Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.”

Source: Prometheus official documentation
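To show what “a value, a timestamp, and labels” can look like on the wire, here is a small sketch that renders one sample as a line in the Prometheus text exposition format (metric name, labels in braces, value, optional timestamp in milliseconds). The helper format_sample and the sample values are my own illustration, and the sketch skips the escaping of special characters in label values that a real client library would do.

```python
def format_sample(name, labels, value, timestamp_ms=None):
    """Render one sample as: name{label="value",...} value [timestamp_ms]."""
    # Labels are sorted only to make the output deterministic.
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    line = f"{name}{{{label_text}}}" if labels else name
    line += f" {value}"
    if timestamp_ms is not None:
        line += f" {timestamp_ms}"
    return line


print(format_sample(
    "http_requests_total",
    {"method": "post", "code": "200"},
    1027,
    1395066363000,
))
```

Each distinct combination of metric name and label values forms its own time series, so the same metric with code="500" would be stored separately.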

Some typical Prometheus metrics are:

  • CPU/Memory Usage
  • Disk Space
  • Service Uptime
  • Application-specific information:
    . Error rate
    . Latency
    . Response time

Prometheus Architecture

Prometheus is a pull-based system: instead of waiting for targets to send data, the Prometheus server fetches (scrapes) metrics over HTTP from an endpoint that each target exposes, at a predefined interval. Other popular monitoring systems that can poll their targets are Nagios and Zabbix. There are pros and cons to a pull-based system, but I will not go into the details here.
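The pull model can be sketched with nothing but the Python standard library: a tiny HTTP server plays the role of a target that exposes a /metrics endpoint, and a loop plays the role of the scraper that fetches it at a fixed interval. This is a toy illustration, not how the Prometheus server is implemented; the metric name demo_temperature_celsius, the handler class, and the interval are all assumptions made for the example.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Hypothetical target that serves one metric in the text format."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = b"demo_temperature_celsius 21.5\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def scrape(host, port):
    """Fetch the /metrics page once and return it as text."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


def run_demo(scrapes=2, interval_seconds=0.1):
    """Start the target, scrape it `scrapes` times, and return the results."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = []
    try:
        for _ in range(scrapes):
            # The scraper, not the target, attaches the collection time.
            results.append((time.time(), scrape("127.0.0.1", server.server_port)))
            time.sleep(interval_seconds)
    finally:
        server.shutdown()
        server.server_close()
    return results


for scraped_at, text in run_demo():
    print(scraped_at, text.strip())
```

Note that the target never needs to know where the monitoring system lives; it only answers requests. That is the main design difference from a push-based setup.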

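The main components of the Prometheus ecosystem are:

  • Prometheus server: scrapes and stores time series data
  • TSDB: the time series database in which the server stores the scraped samples
  • exporters: client applications/binaries that expose metrics from a system at an HTTP endpoint which Prometheus can scrape
  • Pushgateway: an intermediary that short-lived jobs push their metrics to, so that Prometheus can scrape them from there
  • Alertmanager: handles the alerts fired by the Prometheus server (grouping, deduplication, and routing to receivers such as email); the alerting rules themselves are defined on the Prometheus server
  • PromQL: Prometheus’s own query language for selecting and aggregating time series data
  • Grafana (not a native Prometheus component): one of the most popular dashboard tools, which integrates nicely with Prometheus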
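To give a feel for the kind of question PromQL answers, the sketch below computes the average per-second increase of a counter over a window of samples, which is roughly the idea behind PromQL’s rate() function. It is a deliberately simplified stand-in: the real rate() also handles counter resets and extrapolates to the window boundaries, and the function name simple_rate and the sample data are my own.

```python
def simple_rate(samples):
    """Average per-second increase between the first and the last sample.

    `samples` is a list of (timestamp_seconds, counter_value), oldest first.
    Simplification: counter resets and extrapolation are ignored.
    """
    if len(samples) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last <= t_first:
        return None
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical request counter scraped every 15 seconds.
samples = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
print(simple_rate(samples))  # 2.0 requests per second
```

An alerting rule is essentially such an expression plus a threshold, for example “fire when the error rate stays above some value for a few minutes”.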

In a nutshell, Prometheus is:

  • an observability tool that helps teams evaluate infrastructure performance and status
  • a pull-based system
  • a collector of metrics data, stored as time series
  • an alerting system that fires alerts based on predefined rules


Mihai Pruna

I am a DevOps engineer with a telecom background. I hold the DCA, K8s KCA, Terraform Associate, and Python PCAP certifications. I’m always focused on development.