Centralise Prometheus metrics using Cortex with an example: Part 1

Kedarnath Grandhe
4 min read · Jun 22, 2024


Why this blog?

In the last blog, we saw how to send metrics from one Prometheus cluster to another Prometheus cluster (assumed to be the centralised cluster). However, a lot of issues come up with that setup, and this blog looks at a solution for most of them.

This blog only covers the theoretical introduction to Cortex. The next blog, Part 2, will cover the hands-on example.

What is Cortex? Why do we need it?

Cortex is an open-source CNCF incubating project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus. Let’s look at each of these briefly:

  • Multi-Tenancy: In a corporate setting, a single company has many teams working on different things, and most of the time one team’s metrics should not be visible or accessible to other teams. This is not possible with Prometheus alone. Cortex lets us have multiple tenants whose data stays isolated even when it is stored in the same database, and we can set different limits for each tenant as well.
src: CNCF
  • Horizontal Scalability: The architecture and services of Cortex are explained below, but at a high level Cortex is made up of multiple services that can be deployed as microservices, so each service can be scaled up or down as required. For example, in the image below, the Ingester service, which is responsible for writing metrics to long-term storage, is scaled to 3 replicas:
src: CNCF
  • High Availability: Cortex services can replicate data between their replicas which prevents data loss even with instance failures or pod evictions.
  • Long-term storage: Cortex has something called blocks storage, which is based on the Prometheus Time Series Database (TSDB). It stores data in blocks covering a 2-hour range, and each block is composed of a few files storing the chunks and the block index. It can be configured to store blocks on the local file system or in cloud object stores such as AWS S3, Google Cloud Storage, or Azure Storage. More on Blocks Storage here.
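
To make the 2-hour block range a bit more concrete, here is a minimal Python sketch that works out which block window a sample's timestamp would fall into. It assumes the windows are aligned to multiples of two hours since the Unix epoch (UTC); the function name `block_range_for` is made up for this illustration and is not part of Cortex or Prometheus.

```python
from datetime import datetime, timedelta, timezone

BLOCK_RANGE = timedelta(hours=2)  # the 2-hour block range described above


def block_range_for(ts: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) window of the 2-hour block containing ts.

    Assumption: windows are aligned to multiples of two hours
    since the Unix epoch, in UTC.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    width = int(BLOCK_RANGE.total_seconds())
    seconds = int((ts - epoch).total_seconds())
    start = epoch + timedelta(seconds=(seconds // width) * width)
    return start, start + BLOCK_RANGE


sample_time = datetime(2024, 6, 22, 13, 45, tzinfo=timezone.utc)
start, end = block_range_for(sample_time)
print(start.isoformat(), "->", end.isoformat())
# 2024-06-22T12:00:00+00:00 -> 2024-06-22T14:00:00+00:00
```

Under that assumption, every sample scraped between 12:00 and 14:00 UTC would end up in the same block.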

Cortex Architecture:

The architecture of Cortex can be represented as follows. Please keep in mind that the diagram below doesn’t include all the Cortex services; it just represents a typical deployment topology:

Cortex Services:

Cortex has a service-based architecture, in which the overall system is split up into a variety of components that each perform a specific task. These components run separately and in parallel. Cortex can alternatively run in a single process mode, where all components are executed within a single process. The single process mode is particularly handy for local testing and development.
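
As a rough mental model of how separate components can cooperate, here is a toy Python sketch with a distributor that fans writes out to several ingesters and a query function that reads them back per tenant. This is not Cortex's implementation: Cortex places series on a consistent-hash ring, whereas this sketch uses a naive hash-and-modulo placement, and the class names, the replication factor of 3, and the sample values are all assumptions made for illustration. The one detail taken from Cortex itself is the `X-Scope-OrgID` header, which it uses to identify the tenant.

```python
import hashlib
from collections import defaultdict

TENANT_HEADER = "X-Scope-OrgID"  # header Cortex uses to identify the tenant
REPLICATION_FACTOR = 3           # assumed value for this sketch


class Ingester:
    """Toy ingester: keeps samples in memory, keyed by tenant, then series."""

    def __init__(self, name):
        self.name = name
        self.data = defaultdict(lambda: defaultdict(list))

    def append(self, tenant, series, sample):
        self.data[tenant][series].append(sample)

    def read(self, tenant, series):
        # Only the requesting tenant's data is ever looked at.
        return list(self.data.get(tenant, {}).get(series, []))


class Distributor:
    """Toy distributor: writes each sample to several ingesters."""

    def __init__(self, ingesters):
        self.ingesters = ingesters

    def _replicas(self, tenant, series):
        # Naive placement: hash tenant + series, then take the next N ingesters.
        digest = hashlib.sha256(f"{tenant}/{series}".encode()).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.ingesters)
        count = min(REPLICATION_FACTOR, len(self.ingesters))
        return [self.ingesters[(first + i) % len(self.ingesters)]
                for i in range(count)]

    def push(self, headers, series, sample):
        tenant = headers[TENANT_HEADER]  # a missing header raises KeyError
        for ingester in self._replicas(tenant, series):
            ingester.append(tenant, series, sample)


def query(ingesters, headers, series):
    """Toy querier: return samples from the first ingester that has the series."""
    tenant = headers[TENANT_HEADER]
    for ingester in ingesters:
        samples = ingester.read(tenant, series)
        if samples:
            return samples
    return []


ingesters = [Ingester(f"ingester-{i}") for i in range(4)]
distributor = Distributor(ingesters)
distributor.push({TENANT_HEADER: "team-a"}, "http_requests_total", (1719060300, 42.0))

survivors = ingesters[1:]  # simulate losing one ingester
print(query(survivors, {TENANT_HEADER: "team-a"}, "http_requests_total"))
# [(1719060300, 42.0)]
print(query(survivors, {TENANT_HEADER: "team-b"}, "http_requests_total"))
# []
```

Because every sample is written to three of the four ingesters, losing any one of them still leaves a copy to read, and a query made as `team-b` never sees `team-a`'s series.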

The Cortex services are:

What’s next?

As always, we need to get hands-on with an example, but that would make this blog too long, so I’ve decided to write it as Part 2 (Genius me! :D).

In the next blog, we will look at running Cortex as a single process, i.e., all the above services will be launched as one service. This is especially suitable in our case as we are just testing things. Apart from implementing the demo example, we will look at multi-tenancy in action.

That’s it! Hope you now have an understanding of what Cortex is, why we need to use it with Prometheus, and what problems it solves.
