Pipeline Progress At-a-Glance

Dan Kleiman
Published in Flipside Crypto
5 min read · Sep 6, 2019

When we on-board a new partner at Flipside Crypto and start parsing their blockchain data, my curious colleagues will often check in on our ingestion progress.

Since I spend all day writing queries in the SQL console, my instinct is to pull out my go-to monitoring query, run it, and send a code snippet of the query results to them over Slack.

Oh, and I obviously include the SQL itself, because I want them to have the right context for interpreting the results…

But as my gracious co-workers have ever-so-gently let me know, this is a sub-optimal way for them to receive updates.

Who knew eye rolls were audible over Slack?

Show, Don’t Tell

What if, instead, they had access to a visual status of the pipeline? Something that they could glance at and have an intuition about our progress or drill down into to explore more?

Something like this:

Ingestion Pipeline Snapshot

The chart above shows progress on our UDM Event Ingestion. “UDM” stands for Universal Data Model. At Flipside, we use this model to compare data across blockchains. I wrote more about how we do that here.

How to read the chart:

  • the x-axis shows how far behind real time the “Last Block” we processed is, i.e. how many minutes behind are we on a specific blockchain?
  • the y-axis shows how long ago our “Last Insert” happened, i.e. how many minutes have passed since we inserted data for a specific blockchain?
  • the size of each dot is the number of UDM events we have inserted since the beginning of the hour
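As a rough illustration of how those three values relate to one row of monitoring output, here is a minimal Python sketch. The `chart_point` helper, the dictionary layout, and the sample values are all hypothetical; only the column names `latest_block`, `last_inserted`, and `event_count` come from the query results discussed later in this post.

```python
from datetime import datetime, timezone


def chart_point(row, now):
    """Turn one monitoring row into (x, y, size) for the scatter plot."""
    # x: minutes between now and the timestamp of the last block we processed
    x_minutes = (now - row["latest_block"]).total_seconds() / 60
    # y: minutes between now and our most recent insert for this chain
    y_minutes = (now - row["last_inserted"]).total_seconds() / 60
    # size: number of events inserted since the beginning of the hour
    return x_minutes, y_minutes, row["event_count"]


# Hypothetical sample row: the block is two hours old, the insert one minute old.
now = datetime(2019, 9, 6, 12, 30, tzinfo=timezone.utc)
row = {
    "chain": "example-chain",
    "latest_block": datetime(2019, 9, 6, 10, 30, tzinfo=timezone.utc),
    "last_inserted": datetime(2019, 9, 6, 12, 29, tzinfo=timezone.utc),
    "event_count": 50_000,
}
point = chart_point(row, now)
print(point)
```

A point like this one would sit far to the right and close to the bottom of the chart: we are inserting right now, but the blocks we are inserting are old.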

We’ll get into how to interpret these results below.

To build the chart, we use an out-of-the-box scatter plot in Mode, with drag-and-drop columns and attributes, easy-to-edit labels, and interactive filters.

The workflow in Mode is:

  • write SQL
  • inspect results
  • build charts
  • embed interactive charts, results, and text in an HTML page with an editor pretty similar to the one I’m using to write this post on Medium

The way these stages are seamlessly linked together is a massive improvement over copy/paste results sharing.

But most importantly, my coworkers can access richer data and context and explore it whenever they like, without waiting on me to run another query for them.

More Layers of Information

My original monitoring query would produce results that looked like this:

Query Results for Ingestion Progress
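For readers who want something concrete, a query with this shape could look roughly like the sketch below, run here against an in-memory SQLite database from Python. This is not our production query: the `udm_events` table, its columns, and the sample rows are assumptions made up for illustration. Only the output column names `latest_block`, `last_inserted`, and `event_count` mirror the results above.

```python
import sqlite3

# Hypothetical schema: one row per UDM event, with the block's timestamp
# and the time we inserted the event.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE udm_events (chain TEXT, block_timestamp TEXT, inserted_at TEXT)"
)
conn.executemany(
    "INSERT INTO udm_events VALUES (?, ?, ?)",
    [
        ("chain_a", "2019-09-06 12:28:00", "2019-09-06 12:29:00"),
        ("chain_a", "2019-09-06 12:29:00", "2019-09-06 12:29:30"),
        ("chain_b", "2019-09-06 09:00:00", "2019-09-06 12:29:45"),
        # Inserted in the previous hour, so the WHERE clause excludes it.
        ("chain_b", "2019-09-06 08:00:00", "2019-09-06 11:59:00"),
    ],
)

# Per chain: newest block seen, newest insert, and events since the hour began.
MONITORING_SQL = """
SELECT chain,
       MAX(block_timestamp) AS latest_block,
       MAX(inserted_at)     AS last_inserted,
       COUNT(*)             AS event_count
FROM udm_events
WHERE inserted_at >= :hour_start
GROUP BY chain
ORDER BY chain
"""

rows = conn.execute(MONITORING_SQL, {"hour_start": "2019-09-06 12:00:00"}).fetchall()
for r in rows:
    print(r)
```

Because the query only looks at the current hour, a chain with no inserts in that window drops out of the results entirely.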

From these results, I learned how to extract a few different layers of information:

  • How close are we to real time? Look at the lag between the latest_block that we’ve parsed and when we last_inserted data for the chain.
  • Are we in catch-up mode? Look for a larger-than-expected event_count, or a large delta between the two timestamps while our insert times are current.
  • Has something happened to our nodes or parsers? If a chain you expect to see in the results is missing, go upstream.
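The three checks above can be sketched as a small Python function. Treat it as an illustration under stated assumptions rather than our actual tooling: the `summarize` function, the 10-minute `max_lag` threshold, the chain names, and the extra "inserts lagging" label (for rows whose inserts are themselves stale) are all hypothetical.

```python
from datetime import datetime, timedelta


def summarize(rows, now, expected_chains, max_lag=timedelta(minutes=10)):
    """Label each chain using the three heuristics from the list above."""
    statuses = {}
    for row in rows:
        block_lag = now - row["latest_block"]
        insert_lag = now - row["last_inserted"]
        if insert_lag <= max_lag and block_lag <= max_lag:
            # Both timestamps are recent: we are close to real time.
            statuses[row["chain"]] = "real time"
        elif insert_lag <= max_lag:
            # Inserts are current but the blocks are old: catch-up mode.
            statuses[row["chain"]] = "catch-up"
        else:
            # Not covered by the article's list; added here as a fallback.
            statuses[row["chain"]] = "inserts lagging"
    for chain in expected_chains:
        # A chain absent from the results points to a problem upstream.
        statuses.setdefault(chain, "missing: check nodes and parsers")
    return statuses


now = datetime(2019, 9, 6, 12, 30)
rows = [
    {"chain": "chain_a", "latest_block": datetime(2019, 9, 6, 12, 29),
     "last_inserted": datetime(2019, 9, 6, 12, 29)},
    {"chain": "chain_b", "latest_block": datetime(2019, 9, 6, 9, 0),
     "last_inserted": datetime(2019, 9, 6, 12, 29)},
]
statuses = summarize(rows, now, expected_chains=["chain_a", "chain_b", "chain_c"])
print(statuses)
```

The threshold is the main judgment call here. A chain with slow block times might need a looser `max_lag` than a fast one.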

When I translated this query to a visual, though, having an intuition about different statuses became a lot easier.

Compare similar results in chart form:

Understanding More, Faster

In the chart above you can see that several chains are clustered around the Real Time Line, meaning the last block time and our last ingestion time for the chain are almost the same.

Also, the chain that is currently in Catch-Up Mode is a big, obvious point, shifted to the right. The large size comes from the high number of events we’ve ingested in this window, because there is more chain for us to process. The shift to the right comes from the lag in the “latest block” that was processed. We are further back in the chain’s history, so there is more lag between when that block was published and when we are processing it.

And again, you can use the interactive sidebar on the chart to toggle different points on and off if you want to see any one of them more clearly.

Tracking Over Time

While any one report will give you a quick snapshot of progress, like all good computer tools, you can maniacally refresh the results too.

Real Time Updates with Rapid Refresh

While not much changes second-to-second in our pipeline or on the chains themselves, we recently upgraded a parser for some of our chains, and decided to replay a small portion of each chain to ensure that we didn’t miss any data during the migration from the old parser to the new parser.

Over the course of the hour that we did the migration and replay, you can clearly see the backfill moving closer to real time.

Follow the Big Green Dot:

Right-to-left movement of the dots, indicating catch-up in progress.

The frames in the gif above are minutes apart. You can see the Big Green Dot moving through hours of data, but also notice the smaller replay chains moving from right to left and from top to bottom as we scaled up the ingestion process to meet the replay throughput.

Leveling Up Our Team

You can see in the examples at the end of this article how important the right visual is when it comes to explaining data to our clients.

Lately, I’ve been learning that empowering our internal teams to make sense of pipeline and process data is equally important.

We can get answers for our clients faster and speak more authoritatively about our data when all stages of data processing are easily accessible and more readily understandable.

Having the right tools to quickly move from queries to reports makes this process easy. Report building should become a standard part of our engineering workflows going forward.
