Enriching Looker with reliability metadata

Iñigo Hernáez
Vio.com
Apr 26, 2022

About data reliability

As companies mature in their data practices, the number of systems dedicated to fulfilling their data needs grows larger and larger. Not only will there be more data systems, but they will also become more complex. And the more complex the systems, the higher the chances that something goes wrong.

For a data-driven company, trusting its data is key. You cannot base your decisions on data that might be misleading. Imagine the alarm triggered when your company’s CFO sees in the quarterly dashboard that you are not making even half of the forecasted revenue. And then imagine the faces of the data team when they find out, one week later, that a data pipeline had been producing incorrect data for the past month. Well, the CFO would be relieved, but they’ll probably need a vacation after all that stress.

Data reliability, sometimes also referred to as data observability, is the branch of data practice that focuses on monitoring data systems to ensure that they are trustworthy. As Barr Moses explains in What is Data Reliability?, you can think of data reliability the way you think of SRE (Site Reliability Engineering) for other software systems.

Accessing reliability metadata

The explosion of the Modern Data Stack over the last few years has resulted in a plethora of tools in the data space, many of them focused on data reliability. You can choose between full-fledged SaaS tools like Monte Carlo, Datafold, Bigeye, Soda, etc., extensions such as re_data or elementary that integrate with dbt, or building your own.

Most of these tools store the results of the reliability tests in a database and expose them in a custom front-end application. These applications are very rich: they show a lot of contextual metadata and allow users to understand the status of the different data products in depth.

However, the need to access yet another tool can become a problem. Data consumers are usually familiar with the BI tool. Having them open the BI tool in one tab, the data reliability tool in another, and perhaps other tools (data catalogue, etc.) in yet more tabs may have a negative impact on adoption and usage. What if we could expose the reliability metadata in the BI tool, together with the data it contextualises?

Data reliability at FindHotel

At FindHotel, Looker is our BI tool. It is accessed daily by dozens of data consumers to make decisions about many different aspects of the company. We have a few data products, such as our main booking and revenue dataset, that are used by many people to understand how different processes are going. The booking and revenue dataset alone relies on multiple data sources, both internal and external, event-based and batch-based.

As explained at the beginning, such a complex system with so many dependencies is very prone to errors. And usually these errors are spotted by the data consumers: the data team regularly receives support tickets asking whether provider X has stopped reporting, whether predictive model Y is up to date, and the like.

Recently we decided to run a proof of concept on data reliability to test how it could help us tackle the challenges explained above. Since we use dbt for a large portion of our data pipelines, and after researching some of the data reliability tools introduced before, we decided to test re_data.

Exposing reliability metadata in Looker

One of the premises of our proof of concept was making reliability metadata accessible. We wanted it to be close to the data, to make it easy for data consumers to use. We decided to expose reliability metadata in Looker to contextualise the data products.

We installed the re_data package in one of our dbt projects and defined a few metrics, especially aimed at monitoring the freshness of the main data sources. We also created some custom metrics to monitor freshness at a lower level of granularity. Then we created a job in dbt Cloud to calculate these metrics, which results in a set of models (tables and views) created in our Snowflake database.

The first thing was to create a set of views and explores to start analysing the re_data models. For example, after re_data has been running for some time, it can detect anomalies, and we can check them in Looker.
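As an illustration, here is a minimal LookML sketch of a view and explore over one of the re_data output models. The table and column names (re_data_alerts, model, message, time_window_end) are assumptions based on re_data’s default output; check the actual models created in your warehouse.

```lookml
# Minimal sketch: expose a re_data output model in Looker.
# Table and column names are assumptions; adapt them to the models
# that re_data actually created in your database.
view: re_data_alerts {
  sql_table_name: monitoring.re_data_alerts ;;

  dimension: model {
    description: "The dbt model the alert refers to"
    sql: ${TABLE}.model ;;
  }

  dimension: message {
    sql: ${TABLE}.message ;;
  }

  dimension_group: time_window_end {
    type: time
    timeframes: [raw, time, date]
    sql: ${TABLE}.time_window_end ;;
  }

  measure: alert_count {
    type: count
  }
}

explore: re_data_alerts {}
```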

Nice! So now we can use this data to build a dashboard for data consumers to understand alerts, anomalies and reliability metrics of their favourite data products.

Well, that’s nice. We have our reliability data in Looker, which is the tool our data consumers use to access the main data products. So that meets our requirement to make reliability metadata accessible. But, can we bring reliability metadata even closer to the data?

Bringing reliability metadata even closer to the data

In the previous section we explained how we brought reliability metadata into Looker and created a dashboard with it. This definitely avoids the need for users to access multiple tools. However, they still need to open at least two browser tabs to see data and metadata together.

We decided to use two parameters of Looker view fields: html and link. These parameters are defined in the LookML code of the views, can be added to any field (dimension or measure), and can be set dynamically by taking advantage of Looker’s Liquid variables. The html parameter can be used to modify the look and feel of the field as shown in the browser, and the link parameter can be used to add navigation features.
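To illustrate, here is a minimal, hypothetical example of the two parameters on a dimension (the field name and URL are made up):

```lookml
# Hypothetical example of the html and link parameters.
# {{ value }} is a Liquid variable holding the field's rendered value.
dimension: provider {
  sql: ${TABLE}.provider ;;
  # Change the look and feel: render the value in bold.
  html: <b>{{ value }}</b> ;;
  # Add navigation: a link in the field's drill menu.
  link: {
    label: "Search this provider on the web"
    url: "https://www.google.com/search?q={{ value }}"
  }
}
```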

Considering all this, we did the following (a consolidated sketch follows the list):

  • We created a custom view on top of the re_data_last_metrics table, called re_data_aggregated. The reason for this aggregated view is that the re_data_last_metrics table originally had one row per column and metric, but we wanted just one row per table, to be able to join it without creating duplicates.
  • We added a join in the main Looker explore to this aggregated reliability view. This way we can access the reliability metrics from the explore and from its source view. The join is very simple: we join on the table name, which happens to match the explore name.
  • In the main view we created auxiliary hidden dimensions to access the reliability metrics we wanted. For example, we used a case statement to access custom freshness metrics per provider.
  • We also created auxiliary hidden dimensions to define alert levels based on the freshness metrics. For example, we used red for data not updated in the last 2 hours, orange if not updated in the last hour, yellow for the last half hour, and green otherwise.
  • We used these auxiliary fields to construct the html parameter of the actual dimension that we are monitoring. We added a span element with a round Unicode character, coloured based on the alert level, plus a title that is shown as a tooltip when hovering over the circle.
  • Finally, we added a link parameter to navigate to the reliability dashboard we created in the previous section, and customised it with a nice icon.
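Putting it all together, a simplified LookML sketch of these steps could look like the following. All names (the monitoring schema, the bookings view, the per-provider metrics, the thresholds and the dashboard URL) are illustrative assumptions, not our exact production code.

```lookml
# 1. Aggregated reliability view: one row per monitored table.
#    Assumes re_data_last_metrics has table_name, metric and value
#    columns, and that the custom freshness metrics are in seconds.
view: re_data_aggregated {
  derived_table: {
    sql:
      SELECT
        table_name,
        MAX(CASE WHEN metric = 'freshness_provider_a' THEN value END) AS freshness_a,
        MAX(CASE WHEN metric = 'freshness_provider_b' THEN value END) AS freshness_b
      FROM monitoring.re_data_last_metrics
      GROUP BY table_name ;;
  }

  dimension: table_name {
    primary_key: yes
    sql: ${TABLE}.table_name ;;
  }

  dimension: freshness_a_minutes {
    hidden: yes
    type: number
    sql: ${TABLE}.freshness_a / 60 ;;
  }

  dimension: freshness_b_minutes {
    hidden: yes
    type: number
    sql: ${TABLE}.freshness_b / 60 ;;
  }
}

# 2. Join the aggregated view in the main explore on the table name.
explore: bookings {
  join: re_data_aggregated {
    relationship: many_to_one
    sql_on: ${re_data_aggregated.table_name} = 'bookings' ;;
  }
}

view: bookings {
  # 3. Hidden dimension to pick the right freshness metric per provider.
  dimension: provider_freshness_minutes {
    hidden: yes
    type: number
    sql: CASE
           WHEN ${provider} = 'provider_a' THEN ${re_data_aggregated.freshness_a_minutes}
           WHEN ${provider} = 'provider_b' THEN ${re_data_aggregated.freshness_b_minutes}
         END ;;
  }

  # 4. Hidden dimension translating freshness into an alert colour.
  dimension: freshness_alert_colour {
    hidden: yes
    type: string
    sql: CASE
           WHEN ${provider_freshness_minutes} > 120 THEN 'red'
           WHEN ${provider_freshness_minutes} > 60  THEN 'orange'
           WHEN ${provider_freshness_minutes} > 30  THEN 'yellow'
           ELSE 'green'
         END ;;
  }

  # 5 and 6. Coloured circle with a tooltip, plus a link to the
  # reliability dashboard in the field's drill menu.
  dimension: provider {
    sql: ${TABLE}.provider ;;
    html: <span style="color: {{ bookings.freshness_alert_colour._value }};"
            title="Last update: {{ bookings.provider_freshness_minutes._value }} minutes ago">&#9679;</span> {{ value }} ;;
    link: {
      label: "Open reliability dashboard"
      url: "/dashboards/reliability"
      icon_url: "https://example.com/reliability-icon.png"
    }
  }
}
```

The auxiliary dimensions are hidden so they don’t clutter the field picker, but they can still be referenced from the html parameter through Liquid.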

The end result is the following:

  • We see a coloured circle next to the dimension that gives us an idea of the freshness of the data for that particular provider. At the same time, if we hover over the circle, we see a message with details about the freshness metric.
  • When we click on the 3-dots icon next to the dimension, we see a link that allows us to navigate to the reliability dashboard directly from the explore.

Conclusion

Data reliability and observability are still relatively new fields in the data space. They try to solve a problem that has been ignored for too long: monitoring data systems to provide meaningful information about their status and to increase trust in them.

However, this can be a double-edged sword. In the end, the data landscape is so broad and unbundled that a company needs to be cautious about adding more tools to its data stack. Otherwise data consumers may end up juggling many different browser tabs to understand the full picture, and that can have a negative impact on adoption and usage.

Enabling data consumers to access data and metadata (not only about reliability, but anything you can imagine) together is key to creating a data platform that is both user-friendly and useful. Here we have explored how to do that in Looker, thanks to its high degree of customisation, but you can apply the same idea to any other BI tool or data system.

By the way, if you like the topics we discussed and are interested in working at a company where data is at our core and which is taking its data capabilities to a whole new level, don’t hesitate to have a look at our openings.
