Streaming data for brewery ops with Grafana

Driving data to Grafana using Apache Kafka, Apache Flink, and SQL Stream Builder

kennygorman
Cloudera
6 min read · Sep 8, 2021

Photo by Carlos Blanco on Unsplash

If you are a data professional in data engineering, data science, or DevOps, you have likely heard of Grafana. For good reason: it is an excellent tool for displaying complex time-series data in a variety of useful ways. Its weak spot lies in manipulating the firehose of incoming data that any modern, moderately complex system throws off, so that those squiggly lines make sense rather than burying you in useless data.

In this post, I will discuss how to connect SQL Stream Builder to Grafana to tame the firehose of data coming from Kafka and make building high-performance Grafana dashboards a snap.

Fermentation as telemetry

In order to show Grafana in action, we need some data as an example. Take the process of brewing beer. For the uninitiated, there is a hot side and a cold side to the process. The hot side is steeping grains to form wort (pronounced like “wert”), then boiling it to get a sugary solution that is, essentially, yeast food. Next is the cold side, where you store the wort and keep it temperature controlled for a period of time while the yeast consumes the sugars, throws off alcohols, and creates beer as we know it. This phase is known as fermentation. Fermentation is where everything can easily go wrong — it’s also the hardest part for consistency in brewing, and consistency is everything. So, naturally, we look at data to help us understand this cycle to create a consistent and wonderful beer.

“Beer does not make itself properly by itself. It takes an element of mystery and of things that no one can understand.” –Fritz Maytag

In this example, I am going to show telemetry for the fermentation process (the cold side) from a typical commercial brewery. We will be monitoring temperature, both in the vessel and outside (ambient), as well as the specific gravity of the wort itself. We will perform a couple of aggregations and then present that data to Grafana via SQL Stream Builder. The telemetry gives us the ability to:

  • Ensure we maintain proper temperatures so we don’t ruin the batch
  • Know when the fermentation process is complete, and we can keg and carbonate it (the next step in manufacturing beer)

Here is a high-level architecture diagram for the streaming data flow:

Let’s break down these components a bit.

Sensors

The devices used are hydrometers: specialized submersible sensors that float in the wort and capture temperature and specific gravity readings. Yes, they float right in the fermenting beer. A good example is the Tilt hydrometer, which reports both temperature and SG over Bluetooth to a Raspberry Pi.

An example data payload, serialized as JSON, might look like this:
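A representative payload, where the sensor id and the readings are made up for illustration:

```json
{
  "id": "fermenter-01",
  "temp": 65.2,
  "ts": 1631107200,
  "sg": 1.054
}
```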

The important keys are the id (a canonical sensor name), the temperature reading (temp), the timestamp ts (time since the Epoch), and the specific gravity reading (sg).

Data is pulled from the sensor and the Kafka producer API is used to push data downstream.

Apache Kafka

There isn’t anything magical about this step. Apache Kafka is simply a nice way to capture a boundless stream of data in an append-only manner and make it easy to scale, consume, and so on. The sensors read once a second (1 Hz), so the data load isn’t massive unless you are a very large commercial brewery. That said, this data is very valuable. It serves as a data bus for the entire organization, allowing various consumers (not just the Brewmaster use case shown here) to get value from the data. For example, perhaps other teams are preparing for when batches of beer are complete in order to optimize the supply chain. Apache Kafka is a great way to organize and scale this data.

Continuous SQL/Materialized Views

Now, here is where the magic happens. Because you are consuming a stream of data, you need a way to materialize the latest state of the stream by key. Grafana itself isn’t capable of handling a raw message stream or late-arriving data; it expects a simple lineage of data already organized by time. Materializing the stream also gives us the opportunity to aggregate by timestamp, so Grafana gets views organized over time without all the raw detail, which is a massive performance boost. SQL Stream Builder uses Continuous SQL and Materialized Views to easily clean, aggregate, and organize the data for presentation to Grafana.

Schema

First, we must create a schema for this data. We create a new Apache Kafka table and use the SQL Stream Builder Detect Schema functionality to create a schema from the JSON data.

Create Schema in SQL Stream Builder
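The screenshot shows the detected schema; expressed as a Flink-SQL-style table definition, it might look roughly like the sketch below. The topic name, broker address, column types, and connector options are assumptions for illustration, not the exact output of Detect Schema.

```sql
-- Hedged sketch of the Kafka-backed table SSB manages for the sensor topic.
-- Topic name, broker address, and column types are illustrative assumptions.
CREATE TABLE fermenter_readings (
  id   STRING,   -- canonical sensor name
  temp DOUBLE,   -- temperature reading
  ts   BIGINT,   -- time since the Epoch
  sg   DOUBLE    -- specific gravity
) WITH (
  'connector' = 'kafka',
  'topic' = 'fermenter_readings',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
```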

Timestamps

Next, we take advantage of SQL Stream Builder’s ability to use a timestamp from the data itself (in this case, the ts attribute) as the timestamp for SQL operations. You can go deeper into timestamps, event time, processing time, and table configuration in the SSB documentation. Here is how we configure the system to recognize our timestamp attribute:

Specifying a timestamp from the data payload
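In Flink SQL terms, this amounts to deriving an event-time column from ts and declaring a watermark on it. The sketch below repeats the table definition from above with those two lines added; the event_ts name and the five-second watermark delay are assumptions.

```sql
-- Hedged sketch: the same table as above, now with an event-time column and watermark.
-- The event_ts name and the 5-second watermark delay are illustrative assumptions.
CREATE TABLE fermenter_readings (
  id       STRING,
  temp     DOUBLE,
  ts       BIGINT,
  sg       DOUBLE,
  event_ts AS TO_TIMESTAMP_LTZ(ts, 0),                       -- interpret ts as seconds since the Epoch
  WATERMARK FOR event_ts AS event_ts - INTERVAL '5' SECOND   -- tolerate slightly late-arriving data
) WITH (
  'connector' = 'kafka',
  'topic' = 'fermenter_readings',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
```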

Create a SQL job

Now we need to create a continuous SQL job. This job will continuously process the data into the aggregation, and into the materialized view. This is sometimes called a “push query”. Its job is to tame that firehose of incoming data into useful buckets.

We will create an aggregation, using 10-minute tumble windows, and perform AVG() on SG, and MAX() and MIN() on temps.

Configuring your Materialized View definition is key to successful Grafana dashboards. For the data to be useful to Grafana (time series), we want to design the Materialized View primary key to include both time and a logical key. In our case, because this is time-series data, we concatenate the two together to form the primary key.

The aggregation query in SQL Stream Builder
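In rough Flink SQL terms, the aggregation looks something like the sketch below. The table and column names follow the earlier assumptions, and the concatenated key is illustrative; the exact query in SSB may differ.

```sql
-- Hedged sketch: 10-minute tumbling windows per sensor, with a concatenated key
-- (sensor id + window start) to serve as the Materialized View primary key.
SELECT
  CONCAT(id, '_', CAST(TUMBLE_START(event_ts, INTERVAL '10' MINUTE) AS STRING)) AS mv_key,
  id,
  TUMBLE_START(event_ts, INTERVAL '10' MINUTE) AS window_start,
  AVG(sg)   AS avg_sg,
  MAX(temp) AS max_temp,
  MIN(temp) AS min_temp
FROM fermenter_readings
GROUP BY id, TUMBLE(event_ts, INTERVAL '10' MINUTE);
```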

We specify the parameters of the Materialized View in SQL Stream Builder and hit execute. The data will be rolled up and saved in the Materialized View, and we access it from Grafana via a “pull query”.

Dashboards!

Now we are ready to build a dashboard for the data. We connect to the Materialized View and perform “pull queries” to organize the data into different logical views that make sense to the business. In this case, the Head Brewer will be obsessing over these charts day and night until fermentation is complete, so we want to show a time series of temperature and specific gravity, and we also want some panels that represent key alarms should something go wrong.

To create a new time-series graph we add a panel, and manually add the SQL for the query:
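The query itself isn’t reproduced here, but assuming the Materialized View is exposed to Grafana as a SQL data source, the temperature panel’s pull query might look roughly like this. The MV name and columns follow the earlier assumptions; $__timeFilter is Grafana’s time-range macro for SQL data sources.

```sql
-- Hedged sketch of a Grafana pull query for the temperature panel.
-- fermentation_mv and its columns are illustrative names, not the actual MV.
SELECT
  window_start AS "time",
  id,
  max_temp,
  min_temp
FROM fermentation_mv
WHERE $__timeFilter(window_start)
ORDER BY window_start;
```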

We get a time-series graph like this:

We can duplicate it for specific gravity:

Then we can create individual alarm panels to bring the whole thing together:
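An alarm panel can be driven by a similar pull query that surfaces only out-of-range windows. The temperature bounds below are purely illustrative; pick whatever range your yeast strain calls for.

```sql
-- Hedged sketch of an alarm-style pull query: windows where the temperature left
-- an illustrative 60-72 F range. Names and thresholds are assumptions.
SELECT
  id,
  window_start,
  max_temp,
  min_temp
FROM fermentation_mv
WHERE $__timeFilter(window_start)
  AND (max_temp > 72 OR min_temp < 60)
ORDER BY window_start DESC;
```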

There you have it, end to end from the raw sensors to the real-time Grafana dashboard. The head brewer can see the critical telemetry from each sensor in a usable way, and can instantly react if something goes wrong. Happy fermenting!

If you like this kind of data processing architecture and want to get your hands on SQL Stream Builder, you can check it out on the Cloudera Streaming Analytics page.
