Monitoring Spark clusters with Prometheus and Graphite Exporter

Matheus Felipe dos Anjos
Blog Técnico QuintoAndar
6 min read · Jul 5, 2023

How QuintoAndar is designing the architecture for monitoring Spark clusters.

Motivation

At QuintoAndar we run hundreds of Spark jobs per day on the Databricks platform, triggered and orchestrated by Airflow.

Having such a large number of Spark jobs running in hundreds of clusters makes it difficult to monitor their resources every day, especially on the Databricks platform, which offers only a decentralized and limited visualization tool (Ganglia) for each cluster.

Because of that, we have been working on an architecture that aims to solve the decentralization problem by centralizing all the metrics from the clusters in a common service.

With that, it becomes easier to monitor clusters and take preventive action before jobs fail due to a lack of resources, mainly because we can create alerts based on metric thresholds. These alerts help reduce the number of times our Analytics Engineers are called in during the night to solve problems on Spark jobs: we can prevent cluster breakage before it happens by resizing cluster resources or optimizing queries and processes in advance.

The Complete Architecture

As Apache Spark does not ship with a default way to expose metrics in the text-based time-series format used by Prometheus, the Graphite Exporter has the responsibility of receiving metrics from each Spark cluster in the Graphite plaintext protocol, transforming them, and making them available as text-based time series at an endpoint that Prometheus scrapes.

The complete architecture of our monitoring Spark clusters

The metrics sent to the Graphite Exporter are transformed into Prometheus format based on the mapping file (YAML), a configuration file whose patterns identify each incoming metric and describe how to convert it into a Prometheus metric with labels.

The metric sinks available in the Spark codebase are components responsible for receiving metric events from the jobs and sending them to another service (the Graphite Exporter in this use case).

We had considered implementing a new custom sink plugin in Spark to support Prometheus directly, but that would imply more development time, more complexity, and metric incompatibilities across Spark versions. Since Spark already supports a Graphite sink and the Prometheus team maintains an Exporter from Graphite to Prometheus, reusing them decreased the complexity and development time of this solution.
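For reference, Spark's built-in Graphite sink is enabled through metric properties. Below is a minimal sketch of such a configuration; the exporter's hostname is a placeholder, and on Databricks these entries are typically applied through an init script or spark.metrics.conf.* entries in the cluster's Spark config:

# metrics.properties — enable the built-in Graphite sink for all Spark components.
# The host below is a placeholder for wherever the Graphite Exporter runs.
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite-exporter.internal
*.sink.graphite.port=9109
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
# Expose driver/executor JVM metrics (jvm.pools.*, used in the mapping examples below).
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

The sink's optional prefix setting and the spark.metrics.namespace property control the leading segments of each metric path, which is how identifiers such as the project and cluster names can end up at the start of the Graphite metrics shown later.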

In the sections below, we have broken down the detailed roles of each service in this architecture.

Services

Apache Spark

Spark is an engine for large-scale data processing. Here at QuintoAndar, we use it to build our data pipelines, from data extraction until the data is ready to be consumed by our Business Intelligence tools.

As cited in the Motivation section, these Spark Jobs are triggered by Airflow, a tool from Apache to orchestrate workflows.

Spark clusters are created and the jobs triggered by Apache Airflow

To know more about how we are using Apache Airflow at QuintoAndar to orchestrate our jobs, check this article written by our data team.

Graphite Exporter

Exporters are applications that expose metrics from third-party, non-Prometheus monitoring systems in a format Prometheus can scrape. For our use case, we needed the Graphite Exporter to receive Graphite plaintext protocol metrics from the Apache Spark clusters and transform them into Prometheus-format metrics.

Graphite Exporter push metrics example
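For context, the Graphite plaintext protocol is a simple line-oriented format: a dot-separated metric path, a value, and a Unix timestamp. The lines the Spark clusters push to the exporter look roughly like these (paths follow the examples in this article; values and timestamps are illustrative):

projectName.clusterName.driver.jvm.pools.Metaspace.usage 0.82 1688578200
projectName.clusterName.0.executor.filesystem.hdfs.read_bytes 1048576 1688578200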

As mentioned before, to transform the Graphite metric format into the Prometheus format we need a mapping file with the instructions for the transformation. Here is an example of the mapping file:

mappings:
  - match: "*.*.*.executor.filesystem.*.*"
    name: databricks_executor_filesystem
    labels:
      app: databricks
      project: $1
      cluster_name: $2
      component: $3
      type: $4
      qty: $5

  - match: "*.*.*.jvm.pools.*.*"
    name: "databricks_jvm_memory_pools_${5}"
    labels:
      app: databricks
      project: $1
      cluster_name: $2
      component: $3
      type: $4

With the second match instruction in the YAML above, the following Graphite metric…

projectName.clusterName.driver.jvm.pools.Metaspace.usage

…will be transformed into this Prometheus metric:

databricks_jvm_memory_pools_usage{
  app="databricks",
  project="projectName",
  cluster_name="clusterName",
  component="driver",
  type="Metaspace",
}

Each asterisk (*) in the match pattern is a capture group: the first asterisk corresponds to the first placeholder ($1), the second asterisk to the second placeholder ($2), and so on.
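The mapping file is passed to the exporter at startup. A sketch of how the exporter can be launched (the file name is ours; the flag comes from the graphite_exporter documentation):

./graphite_exporter --graphite.mapping-config=mapping.yml

By default, the exporter ingests the Graphite plaintext protocol on port 9109 and exposes the converted metrics for Prometheus on port 9108, under the /metrics path.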

Prometheus

Prometheus is a monitoring system used to store, query (via PromQL), and analyze metrics from other applications.

Its ecosystem also offers Alert Manager to create alerts from metric thresholds, Pushgateway for ephemeral jobs (those running for a short time), Exporters to retrieve data from non-Prometheus systems, Grafana to create dashboards based on metrics, and the Prometheus UI to visualize and query metrics ad hoc.

Prometheus pulls metrics from Graphite Exporter

It can scrape metrics directly from a target or through an intermediary Pushgateway, which is especially useful for short-lived jobs. Prometheus can also serve its metrics to other applications, such as Grafana or other compatible data visualization tools.

In our architecture, Prometheus has the responsibility of pulling the metrics from Graphite Exporter and making the metrics available to query from its UI, other visualization tools (e.g. Grafana), or alert tools (e.g. Alert Manager).
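On the Prometheus side this is a regular scrape job pointed at the exporter's metrics endpoint. A minimal sketch, assuming the exporter is reachable at the placeholder address graphite-exporter.internal:

# prometheus.yml — scrape the Graphite Exporter like any other target
scrape_configs:
  - job_name: spark-clusters
    scrape_interval: 30s
    static_configs:
      - targets: ["graphite-exporter.internal:9108"]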

To know more about Prometheus and its ecosystem check the official documentation.

Alert Manager

Alert Manager is a Prometheus component designed to handle and manage alerts sent by Prometheus. Its main function is to receive alerts and process them based on predefined rules.

Prometheus pushes alerts to Alert Manager

With that, we can create alerts derived from metrics and be notified when certain metrics exceed their thresholds. At QuintoAndar, we have a custom service that encapsulates Alert Manager and makes it easier for tech teams to create their monitoring alerts and be notified on Slack or even Google Chat.
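As an illustration, an alerting rule over one of the metrics above might look like the following; the 90% threshold, the duration, and the label values are made up for the example:

groups:
  - name: spark-cluster-alerts
    rules:
      - alert: DriverMetaspaceAlmostFull
        expr: databricks_jvm_memory_pools_usage{component="driver", type="Metaspace"} > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Driver Metaspace usage above 90% on cluster {{ $labels.cluster_name }}"

Prometheus evaluates the expression, fires the alert after it has held for ten minutes, and forwards it to Alert Manager, which routes the notification to the configured channel.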

Grafana

Grafana is an open-source visualization platform focused on monitoring and observability. It offers a set of tools for real-time infrastructure monitoring.

Grafana dashboard example

Grafana offers extensive support for various data sources, including Prometheus. By establishing a connection with the Prometheus server, Grafana enables seamless querying of metrics and facilitates the creation of application dashboards using the data obtained from Prometheus.

Grafana pulls metrics from Prometheus

By leveraging Grafana’s interface, users can effortlessly design and customize visually appealing dashboards that showcase real-time and historical data from Prometheus.
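For instance, a panel tracking driver Metaspace usage per cluster could be backed by a PromQL query as simple as the one below (metric name and labels taken from the mapping example earlier):

# Driver Metaspace usage per cluster, as a percentage
100 * max by (cluster_name) (
  databricks_jvm_memory_pools_usage{component="driver", type="Metaspace"}
)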

Conclusion

By adopting this architecture for monitoring Spark clusters, it becomes feasible to centralize metrics and maintain a historical record of them.

This centralized approach empowers users and teams to proactively identify issues, optimize jobs, and troubleshoot problems related to disk, memory, and JVM efficiently.

Moreover, we are using specific key metrics to optimize resource allocation and reduce costs in our pipeline. This optimization allows us to replace underperforming clusters with more efficient ones, resulting in reduced job execution time. As a result, our pipeline operates more effectively, providing improved resource utilization and overall cost savings.
