Data metrology at BlaBlaCar
Our tool to monitor data consistency
Why do we need a tool?
When you are a data engineer, one of the most important topics is the health of your data: you must provide clean and reliable data to your consumers.
When the data is missing, decision makers are blind. Even worse: when the data is not consistent, wrong decisions will be made!
There are many things that can lead to “dirty” data:
- The data-source is not ready yet
- Something has changed in production
- An external API has been modified
- A technical error occurred
Most of the time, you know that something happened, because errors are visible in the logs.
But sometimes the error is silent and you are not aware of it, unless you actually check the data.
That’s why we decided to implement a metrology tool that runs automatically and lets us detect issues at a glance, every day.
How does it work?
We defined a list of metrics that we wanted to monitor. Some of them are functional: the number of signups, the activity on the platform. And some of them are technical, such as the number of rows per table.
We started to record these metrics every day. The next step was to detect weird values, such as suspicious drops or peaks. Our first idea was to compare the values with the global average.
The last step was to define the “acceptable” range. Statistical outliers are considered suspicious and need to be checked.
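The post does not show the actual computation, but this first approach can be sketched as follows. The threshold of two standard deviations and the metric values are hypothetical, chosen only to illustrate the idea:

```python
from statistics import mean, stdev

def flag_outliers(values, n_stdev=2.0):
    """Flag values outside mean +/- n_stdev * standard deviation."""
    avg = mean(values)
    sd = stdev(values)
    low, high = avg - n_stdev * sd, avg + n_stdev * sd
    return [(v, not (low <= v <= high)) for v in values]

# A sudden drop stands out against the global average.
daily_signups = [980, 1010, 1020, 990, 1005, 310, 1000]
flagged = [v for v, suspicious in flag_outliers(daily_signups) if suspicious]
# flagged == [310]
```

The choice of threshold is a trade-off: too tight and normal variation triggers false alarms, too loose and real issues slip through.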
This setup worked well at the beginning, but we quickly faced an issue. Due to the variation of activity during the week, there’s a huge gap between the average and the daily value.
It’s almost impossible to tell the difference between normal variation and a technical issue. We needed something smarter than comparing with the global average, so we decided to compute the average over the same day of the week, to take the cyclic activity into account.
This makes weird values easy to detect.
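A minimal sketch of this day-of-week comparison, with hypothetical numbers where weekends are naturally lower than weekdays:

```python
from collections import defaultdict
from datetime import date, timedelta

def day_of_week_averages(history):
    """Average the metric over past occurrences of the same weekday."""
    by_weekday = defaultdict(list)
    for day, value in history:
        by_weekday[day.weekday()].append(value)
    return {wd: sum(vs) / len(vs) for wd, vs in by_weekday.items()}

history = []
for i in range(28):  # four full weeks of daily signups
    day = date(2020, 1, 6) + timedelta(days=i)
    history.append((day, 1000 if day.weekday() < 5 else 400))

avgs = day_of_week_averages(history)

# A Saturday value of 390 is close to the Saturday average (400),
# even though it is far below the global average (~829):
delta = abs(390 - avgs[5]) / avgs[5]  # 2.5% — nothing suspicious
```

With the global average, that same Saturday value would have looked like a 50% drop and raised a false alarm.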
We also implemented an hour-based system for the tracker data, which needs to be monitored more precisely.
We wanted to keep it simple and efficient, so we started with a single generic table to store the history of all metrics. The average and delta are pre-computed.
The calculation of the average and the delta is always the same, whatever the metric. For this reason, we decided to write a query with a dynamic parameter. This lets us compute all the metrics within a single Python loop.
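The actual table layout and query are not shown in the post; the sketch below uses a hypothetical `metric_history` table and an in-memory SQLite database to illustrate the idea of one templated query driven by a Python loop:

```python
import sqlite3

# Hypothetical generic history table: one row per metric per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric_history (metric TEXT, day TEXT, value REAL)")
rows = [("signups", "2020-01-0%d" % d, 100.0 + d) for d in range(1, 8)]
rows += [("activity", "2020-01-0%d" % d, 50.0 * d) for d in range(1, 8)]
conn.executemany("INSERT INTO metric_history VALUES (?, ?, ?)", rows)

# One templated query: the metric name is the only dynamic parameter,
# so adding a metric means adding one entry to this list.
METRICS = ["signups", "activity"]
QUERY = "SELECT AVG(value) FROM metric_history WHERE metric = ?"

averages = {m: conn.execute(QUERY, (m,)).fetchone()[0] for m in METRICS}
```

Because the query is generic, onboarding a new metric is just a matter of recording its daily values and adding its name to the list.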
We used Tableau to build a dashboard on top of this. We review the dashboard every morning. The first tab has been designed to check the status of all metrics.
The other tabs display the details for each metric.
Monitoring the volumetry
Last but not least, we needed to follow the database volumetry on a weekly basis. We wanted two things:
- Two levels of detail: schema and table
- Two metrics: the total size, and its evolution compared to the previous week
The dashboard is split into two parts:
- The top panel shows the evolution, making big variations easy to spot.
- The bottom panel shows the total size.
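The week-over-week evolution feeding the top panel can be sketched like this; the table names and sizes are made up for the example:

```python
def weekly_evolution(sizes):
    """Week-over-week size evolution (%) for each table."""
    return {
        table: 100.0 * (this_week - last_week) / last_week
        for table, (last_week, this_week) in sizes.items()
    }

# Hypothetical sizes in GB: (previous week, current week).
sizes = {"bookings": (100.0, 103.0), "trackers": (200.0, 260.0)}
evolution = weekly_evolution(sizes)
# A +30% jump on "trackers" would stand out on the top panel.
```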
The metrology tool helped us to improve data quality at BlaBlaCar. It’s been designed according to our needs:
- All the metrics are refreshed automatically
- The process to add new metrics is very simple, thanks to the generic structure
- Issues are highlighted by computing the delta between daily values and the average for the same day of the week
After implementing this tool, we noticed a global improvement in data reliability. We anticipate issues earlier, we communicate them to analysts, and we fix them quicker.
The next step would be to fire alerts when there’s an issue, so that we can be even more reactive.