From mono(lithic) SQL to Python-based micro services: Part 3

Prometheus and (his sidekicks) Thanos / Push Gateway

Pini Faran
Machines talk, we tech.
Sep 18, 2022


Monitoring basics

In the previous posts I described the foundations of our new data pipeline architecture, and shed some light on subtle pitfalls we encountered and how we resolved them.

In this post I would like to explain how we added monitoring capabilities to our new architecture.

I assume there is no need to explain why monitoring is required, so I’ll skip straight to the what and how.

For the “what” part → we followed the RED method (Rate, Errors, Duration), and added additional metrics per micro service on top of it. More on that below.

As for the “how” part → no surprises here: we went with the obvious choice, one Faust proved to have an OOB integration with: Prometheus. But we did “enrich” Prometheus with an additional 3rd party :-)
More on that in this post…

What to monitor?

While each micro service has its own logic (aggregation, calculating formulas, ingestion, etc…), they all share common ground in how the record processing framework works.

So the following can be reported for every micro service, regardless of what it performs:

Common metrics following the RED concept — part 1
Common metrics following the RED concept — part 2

(Note that we use the term “Tag ID” interchangeably with “Sensor ID”. Labels refer to Prometheus labels; more on that in the next section.)

This is “classic” RED: for each micro service we can tell how many telemetry records it processes, how long it takes to process them, and how many errors were encountered during processing.

In addition to that, for each service we report the number of input records it got and the number of output records it produced.
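To make this concrete, here is a rough sketch (using the Python prometheus_client package) of what these common metrics could look like as Prometheus metric types. The names below are illustrative, not the exact ones from our code:

```python
from prometheus_client import Counter, Histogram

# Illustrative definitions of the common (RED-style) metrics every
# micro service reports, regardless of its business logic.
records_in_total = Counter(
    "records_in_total", "Input telemetry records consumed by the service"
)
records_out_total = Counter(
    "records_out_total", "Output records produced by the service"
)
processing_errors_total = Counter(
    "processing_errors_total", "Errors encountered while processing records"
)
processing_duration_seconds = Histogram(
    "processing_duration_seconds", "Time spent processing a single record"
)
```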

However, while this information is crucial to understand what is happening in our data pipeline, it is not enough.

We would like to measure (and report) additional metrics that are micro service specific.

For example: we would like to know how many values were aggregated in the aggregation service (per project, per sensor).

We would also want to know how many formula calculations resulted in no output (because of missing data) in the formula calculator service.

And so on…

So on top of the common RED metrics, we defined and implemented additional metrics, tailored per micro service.

How to monitor?

Using Prometheus makes monitoring pretty straightforward from a micro service point of view:

The micro service exposes an end point from which its metrics can be collected. Prometheus scrapes (collects) those metrics, and from that point on you can use any visualization tool that supports Prometheus as a data source to build dashboards of your choice. For example, you can use Grafana both for visualizing the metrics and for issuing alerts.

(Note that Prometheus also comes with its own Alertmanager component that can be used for this purpose.)
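As a minimal sketch of the mechanics (not our actual service code), this is roughly what exposing such a scrape end point looks like with the official prometheus_client package; the port and metric name are arbitrary:

```python
import random
import time

from prometheus_client import Counter, start_http_server

# A counter Prometheus will scrape from this process.
records_processed = Counter(
    "records_processed_total",
    "Total number of telemetry records processed",
)

if __name__ == "__main__":
    # Expose http://<pod>:8000/metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        records_processed.inc()      # simulate processing one record
        time.sleep(random.random())  # simulate work
```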

At a high level, the integration with Prometheus looks as follows:

Prometheus integration architecture

The big advantage of this approach is that metrics are pulled by Prometheus in a way the micro service is agnostic to (other than exposing the end point).

There are several monitoring tools that support pulling metrics from “Prometheus-compatible” end points, so you could replace the monitoring tool you use without having your micro services know anything about it!

Quite neat ;-)

Supporting dimensionality (and avoiding bad dimension pitfalls!)

In many cases you would like to report the same metric but with different dimensions.

For example: our aggregation micro service would like to report the total number of aggregations it performs, but also have the statistics grouped per customer ID and sensor ID.

We could, of course, create a counter for each such combination, i.e.:

Total_Aggregations_Customer1_Sensor1
Total_Aggregations_Customer1_Sensor2
Total_Aggregations_Customer2_Sensor1
Total_Aggregations_Customer2_Sensor3
Total_Aggregations_Customer2_Sensor4
Etc…

The problems with that approach are:

  • We would reach a huge number of counters very quickly
  • We can’t easily perform “slice and dice” operations (e.g. count the total across all of customer1’s sensors, etc…)

A better, more elegant, solution is to use Prometheus labels.

Labels describe different/additional characteristics of what we measure. So in the example above we will have:

metric name: "total_aggregations"
labels:
  customer_name = "customer1" | "customer2" | …
  sensor_id = "sensor1" | "sensor2" | …

This way we keep our metric names concise and to the point, and we are able to perform “slice and dice” operations by customer name and/or sensor ID.
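A sketch of what that looks like with prometheus_client (metric and label names follow the example above and are illustrative):

```python
from prometheus_client import Counter

# One metric, two labels -- each (customer_name, sensor_id) combination
# becomes its own time series.
total_aggregations = Counter(
    "total_aggregations",
    "Total number of aggregations performed",
    labelnames=["customer_name", "sensor_id"],
)

# Reporting for a specific customer/sensor combination:
total_aggregations.labels(customer_name="customer1", sensor_id="sensor1").inc()
```

In PromQL, a query like `sum by (customer_name) (total_aggregations)` then gives the per-customer totals, and adding `sensor_id` to the grouping gives the per-sensor breakdown.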

Good label, bad label

A short time after we started to monitor our production using Prometheus, we saw that Prometheus storage was filling up very quickly… and we had only one customer onboarded on our new pipeline!

It took us several debugging iterations, including running a Python memory profiler, before we found the root cause: we were reporting an error counter with a label that contained the specific exception information.

BAD IDEA.

Why is that?

Every new combination of label key-value pairs is, in fact, a new time series, and each one increases the amount of data being stored. So using labels with many different possible values (e.g. email address, user ID, exception message, etc…) is completely prohibited!

Use only labels with a small, bounded set of possible values (e.g. color, gender, etc…).
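To illustrate the difference between a bounded and an unbounded error label (with hypothetical metric and function names, not our actual code):

```python
from prometheus_client import Counter

processing_errors = Counter(
    "processing_errors_total",
    "Errors encountered while processing records",
    labelnames=["error_type"],
)


def process(record):
    ...  # hypothetical business logic


def handle(record):
    try:
        process(record)
    except Exception as exc:
        # GOOD: the exception class name is a small, bounded set of values.
        processing_errors.labels(error_type=type(exc).__name__).inc()
        # BAD: the full exception message often embeds IDs, paths, offsets...
        # every unique message becomes a new time series:
        # processing_errors.labels(error_type=str(exc)).inc()
```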

Integration with Faust

As mentioned in the previous posts, we use Faust as our micro service framework on top of Kafka, and lucky for us, this framework comes with OOB integration with Prometheus.

The main classes Faust provides for exposing metrics to Prometheus are called FaustMetrics and PrometheusMonitor.

FaustMetrics → serves as a container for the metric definitions
PrometheusMonitor → contains the logic for incrementing the metrics and exposing them via an end point (it starts an internal HTTP server for that purpose)

Because we wanted to expose additional business-logic metrics beyond what Faust provides out of the box, we extended both of them.

For each metric we add, we need to define its type, name and labels. See this example for our padding micro service:
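The original code appeared as an image; roughly, the definitions looked something like the following sketch. The class and metric names here are illustrative assumptions (a standalone container built on prometheus_client), not the exact Faust API:

```python
from dataclasses import dataclass

from prometheus_client import CollectorRegistry, Counter


@dataclass
class PaddingMetrics:
    """Business-level metrics for the (hypothetical) padding micro service."""

    padded_records: Counter
    padding_errors: Counter

    @classmethod
    def create(cls, registry: CollectorRegistry) -> "PaddingMetrics":
        return cls(
            padded_records=Counter(
                "padded_records_total",
                "Number of records padded with interpolated values",
                labelnames=["customer_name", "sensor_id"],
                registry=registry,
            ),
            padding_errors=Counter(
                "padding_errors_total",
                "Number of errors encountered while padding",
                labelnames=["error_type"],
                registry=registry,
            ),
        )
```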

Once we have that set up, we can extend Faust’s Prometheus monitor class to support methods that increment those metrics:
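Again a sketch rather than the exact code: the subclass simply adds convenience methods on top of Faust's PrometheusMonitor, assuming it is handed the PaddingMetrics container from the previous snippet:

```python
from faust.sensors.prometheus import PrometheusMonitor


class PaddingPrometheusMonitor(PrometheusMonitor):
    """Faust's Prometheus monitor, extended with padding-specific reporting."""

    # Assumed to be assigned right after construction (see previous sketch).
    padding_metrics: "PaddingMetrics" = None

    def record_padded(self, customer_name: str, sensor_id: str, count: int = 1) -> None:
        self.padding_metrics.padded_records.labels(
            customer_name=customer_name, sensor_id=sensor_id
        ).inc(count)

    def record_padding_error(self, exc: Exception) -> None:
        self.padding_metrics.padding_errors.labels(
            error_type=type(exc).__name__
        ).inc()
```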

And now we can invoke those from our micro service main code in a simple manner:
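A sketch of how the business code could call those methods from a Faust agent (the topic, app and field names are illustrative, and app.monitor is assumed to have been set to the custom monitor at startup):

```python
import faust

app = faust.App("padding-service", broker="kafka://localhost:9092")
telemetry_topic = app.topic("telemetry-records")

# app.monitor is assumed to have been set to a PaddingPrometheusMonitor
# instance when the app was configured.


@app.agent(telemetry_topic)
async def pad_records(stream):
    async for record in stream:
        try:
            # ... actual padding logic would go here ...
            app.monitor.record_padded(
                customer_name=record["customer_name"],
                sensor_id=record["sensor_id"],
            )
        except Exception as exc:
            app.monitor.record_padding_error(exc)
```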

Thanos (OR: a Prometheus with long-term storage)

As we deployed to production and added more and more metrics we wanted to collect and monitor, our Prometheus storage started filling up quite quickly.

No matter what our DevOps engineer tried, increasing Prometheus’ built-in storage always ended with… us asking him for additional storage!

The remedy was Thanos. With Thanos, we leverage Azure Blob Storage as our backing storage, and as such have practically limitless storage for our metrics.

Thanos uses the metrics and web-metrics containers for storing metrics data:

Thanos storage account

Once our DevOps engineer had that in place, our storage issues went away :-)

Push Gateway (OR: letting short-lived jobs report metrics to Prometheus)

So monitoring Faust based micro services is relatively easy. But what about batch jobs?

Hmmm… Well here the plot thickens a bit.

By “batch jobs” we mean any piece of code that executes once or periodically (scheduled): a Docker image is spawned in some pod, executes some (Python) code, and then the pod terminates.

In such a case Prometheus may not be able to perform monitoring, since it cannot scrape anything after the pod terminates (naturally)…

For such a case, Prometheus provides another component called Push Gateway.

Push Gateway is an intermediate service which is always up, so short-lived jobs can push their metrics to it, and it in turn exposes them to Prometheus through the standard end point mechanism for scraping:

Monitoring a batch job with Prometheus and Push Gateway
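A sketch of what the batch-job side looks like with prometheus_client’s push_to_gateway (the gateway address, job name and metric are illustrative):

```python
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

# A dedicated registry so only the job's own metrics get pushed.
registry = CollectorRegistry()
rows_exported = Counter(
    "rows_exported_total",
    "Number of rows exported by the nightly batch job",
    registry=registry,
)


def run_batch_job() -> None:
    # ... actual batch logic ...
    rows_exported.inc(42)


if __name__ == "__main__":
    run_batch_job()
    # Push once, right before the process exits; Prometheus then scrapes
    # the Push Gateway instead of the (already terminated) pod.
    push_to_gateway("pushgateway:9091", job="nightly-export", registry=registry)
```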

What’s next?

In this “trilogy” of posts I’ve laid out our architecture foundations for the new data pipeline, some subtle issues we’ve tackled along the way, and how we perform monitoring.

In my next posts I will demonstrate some ad-hoc techniques and tools I’ve found very useful along the way for debugging issues and development, some tailored capabilities we have in our pipeline, and much more.

So if you are interested in things like: KSQLDB, refreshing Prometheus configuration, data replay scenarios and other cool stuff → stay tuned!

As before, the people who contributed to the actual work described in this post are our great data engineers, along with our marvelous DevOps guy:

Stav Ben Salmon
Alexander Krasnostavsky
Inna Trava
Shira Yad Shalom
Tomer Saporta
Yoav Marom
Roni Afudi
Barak Solomon
Lior Hazani
Ghaleb Salah

Pini Faran
Data engineer team lead. Has been around for 20+ years in software development. Seen quite a lot in C/C++/Java/Python/ML and more. Loves to share stuff!