Grafana vs. Prometheus Agent

Daniel Weinshenker
May 1, 2023

TL;DR

  • The Prom agent largely reuses the Grafana agent's implementation, taking its metrics-specific code.
  • The Grafana agent is component-based and meant to address several observability use-cases, providing a more complete solution for collecting different types of observability data. It also has a “flow” mode for connecting components and a UI for debugging them.
  • The Prom agent uses the same binary as standard Prometheus, so it still includes components such as a UI (with some pages disabled) that shows scrape target status, config, and other important metadata.

Context

Collecting and using metrics at scale is a growing challenge for many developers and organizations as their services grow. Metrics often form the first line of defense when evaluating system health in real time, so it is important to be able to collect all relevant metrics and query/alert on them.

Usually, one starts collecting metrics with the most popular open-source tool: Prometheus. It has become a de facto standard for metrics collection thanks to its easy setup, its all-in-one design, and features that matter in microservice environments, such as service discovery.

Prometheus BA (before agent)

The diagram below shows the standard Prometheus architecture. Prometheus performs many functions, which is what makes it an all-in-one solution.

  • SD/apps discovery => Discover targets to scrape via various methods.
  • Scrape => Scrape metrics targets and store data in the TSDB (time-series database).
  • Record => Generate new time series using data from the TSDB.
  • Alert => Start/stop alerts based on metrics queries.
  • PromQL / Exemplars => Allow clients to query the TSDB via the PromQL language.

Source: Prometheus
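
To make the "Scrape" step above more concrete, here is a minimal Python sketch of what a scraper does with a target's response: it reads the text exposition format and turns each line into a sample. This is not how Prometheus itself is implemented. The names `Sample`, `parse_exposition`, and `SCRAPED` are made up for illustration, and the parser deliberately ignores timestamps, escaped label values, and other details of the real format.

```python
import re
from typing import NamedTuple


class Sample(NamedTuple):
    """One scraped data point (hypothetical type for this sketch)."""
    name: str
    labels: dict
    value: float


# metric_name{label="value",...} value
# Assumption: no timestamps and no escaped quotes inside label values.
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> list[Sample]:
    """Turn a scrape response body into a list of samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append(Sample(name, labels, float(value)))
    return samples


# Example response body, as a target's /metrics endpoint might return it.
SCRAPED = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_open_fds 12
"""

if __name__ == "__main__":
    for sample in parse_exposition(SCRAPED):
        print(sample)
```

In standard Prometheus, samples like these are appended to the local TSDB, which is where the recording, alerting, and query functions read them from.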

However, as the amount of data collected and queried increases (e.g., more targets, more data per target), Prometheus can only scale vertically because of all the state it retains. This makes it a large single point of failure and could result in:

  • Increased compute/storage costs (needing larger instances)
  • Longer recovery times due to replaying write-ahead log (WAL)
  • Data loss!!!

Enter “agent” mode to mitigate this problem.

History

centuries ago … jk just a few years ago

Grafana announced its agent in March 2020. In November 2021, the agent code was contributed upstream to Prometheus and made available behind an experimental feature flag. Agents allow observability data ingestion to scale out horizontally because they mainly ingest data and write it to a remote system, rather than serving queries (which requires a TSDB and handling read traffic).

Prometheus AA (after agent)

Imagine having one service that scrapes metrics and a separate one that stores, queries, and alerts on those same metrics. The diagram below shows what the Prom agent mode architecture looks like. The Prom agent only discovers targets, scrapes them, and remote writes the metrics to another system that is used for storage/querying/alerting/etc. This decouples collection from the rest of Prometheus's functionality.

Source: Prometheus
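
The split described above can be sketched in a few lines of Python. This is a toy model, not the Prom agent's actual code: the `Agent` class, the `scrape` and `remote_write` callbacks, and `fake_scrape` are all hypothetical, and the in-memory `pending` list only stands in for whatever buffering a real agent does before data is accepted remotely. The point is what is missing: there is no query, rule, or alert logic anywhere in the agent.

```python
from typing import Callable, Iterable

Sample = tuple[str, float]  # (series name, value); labels omitted for brevity


class Agent:
    """Toy agent: scrape targets, forward samples, never serve queries."""

    def __init__(self,
                 scrape: Callable[[str], Iterable[Sample]],
                 remote_write: Callable[[list[Sample]], None]) -> None:
        self.scrape = scrape              # hypothetical: fetches one target
        self.remote_write = remote_write  # hypothetical: ships a batch away
        self.pending: list[Sample] = []   # samples not yet accepted remotely

    def run_once(self, targets: Iterable[str]) -> None:
        """One cycle: scrape every target, then forward what we hold."""
        for target in targets:
            self.pending.extend(self.scrape(target))
        if not self.pending:
            return
        try:
            self.remote_write(list(self.pending))
        except ConnectionError:
            return  # keep the samples buffered and retry next cycle
        self.pending.clear()  # nothing is kept once the remote end has it


def fake_scrape(target: str) -> list[Sample]:
    """Stand-in for an HTTP scrape of the target's metrics endpoint."""
    return [(f"up@{target}", 1.0)]


received: list[Sample] = []  # stand-in for the remote storage system
agent = Agent(fake_scrape, received.extend)
agent.run_once(["app-1:8080", "app-2:8080"])
print(received)
```

Because the agent holds samples only until they are handed off, many such agents can run side by side while storage, querying, and alerting live in the remote system.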

The Grafana agent's metrics component performs essentially the same functions as the Prom agent.

Why use one over the other?

The Prom agent was inspired by the Grafana agent and takes most of its metrics-related code directly from it. As a result, for the metrics use-case there isn’t a big difference between the two. The differences that do exist are outlined below:

Consider the Grafana agent if:

  • You also want to use it for collecting/forwarding traces + logs, not just metrics.
  • You want to be able to send data to OpenTelemetry (OTel) systems, not just Prometheus-based ones.
  • You want to have more granular control of the agent’s components with rich UI debugging capabilities.

Consider the Prom agent if:

  • You are focused only on metrics.
  • You wish to switch from standard Prometheus to the Prom agent quickly: you just need to run Prometheus with --enable-feature=agent, after ensuring you have a version that supports agent mode.
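
As a rough illustration of that last point, the Python sketch below assembles the command line for starting Prometheus in agent mode. The `agent_command` helper and its default values are made up for this example. The `--enable-feature=agent` flag is the one mentioned above; `--config.file` and `--storage.agent.path` are regular Prometheus flags, but check `prometheus --help` for your version before relying on them.

```python
import shlex


def agent_command(config_file: str = "prometheus.yml",
                  wal_dir: str = "data-agent/") -> list[str]:
    """Build the argv for launching Prometheus in agent mode.

    Hypothetical helper; both default values are assumptions.
    """
    return [
        "prometheus",
        "--enable-feature=agent",            # switch the binary to agent mode
        f"--config.file={config_file}",      # scrape + remote_write config
        f"--storage.agent.path={wal_dir}",   # where the agent keeps its data
    ]


if __name__ == "__main__":
    # Print the command; to actually start it, pass the list to
    # subprocess.run(agent_command(), check=True) on a host with Prometheus.
    print(shlex.join(agent_command()))
```

The config file passed here still needs a remote write destination, since the agent does not store data for querying itself.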

Further exploration

In addition to the Grafana and Prom agents, the OpenTelemetry (OTel) Collector is another tool that can be deployed as an agent. It collects metrics in many formats, including the Prometheus format, and translates them into a vendor-agnostic, standardized format. This is another exciting development and provides additional means of collecting data and forwarding it to a variety of backends.

Hope you enjoyed the post!
