Making your data analysis efficient with the PinPointer dashboard

Amit Vainer
ironSource Tech Blog
5 min readSep 18, 2019

Almost every data analyst opens his morning with some coffee and graphs. He checks that everything is aligned and nothing is broken. In case it isn’t aligned he start with some manual checks. The first few steps are usually the same -take a look at a few breakdowns and find the one with the biggest impact. It can be very frustrating pointing an issue. Sometimes these manual steps can take several hours and all you wish for is an automated solution that will just point you in the right direction.

Two weeks ago, I was watching another Machine Learning video towards our anomaly detection solution. In this video, one of the rules was: ‘you need to understand that a graph isn’t a collection of dots on two axes but a function based on variables.’ -Then it hit me.

Let’s write a query that checks all the breakdowns together, calculate the weight of each one and show us the breakdown with the biggest impact.

You probably think “Hey, this is why we have machine learning and anomaly detection”. Well, you are right. But for this to work we need time, learning and a lot of iterations. With this query, I can make my day much more efficient in a minimum effort and point the investigation in the right direction pretty quick.

The concept

So now that I have the idea, I need to define my function and my variables. The concept is simple. Each KPI is a function and each breakdown will be the variable. The query will check each breakdown and its share. Then it will order the breakdowns from the strongest impact to the weakest.

The only step remains is to give it a name - We call it The PinPointer.

For example, let’s take users impressions. When we look at the users impressions trend we usually see a typical cycle. During the week, we have a high trend and during the weekend we have a lower trend. If we have an anomaly in these trends, what would be the reason? Well, the first answer is, “I don’t know.” It could be a holiday, a bug in one of our products, a server error and more. For this example, I’ll choose breakdown by country, product and operating system. How do I know what impacts the most? On a normal day, I would check them one by one with our dashboards or write a query, hoping that the first breakdown leads to the answer and not the last. This kind of check can take between 10 minutes to a few hours and this is only the first step. With the PinPointer, all I need is to execute the query and check the results.

The Query

So far, we have the function (users impressions change) and the variable (country, product, and operating system). Now it’s time to do some math. We will use our country breakdown to explain it:

  1. Total change — the total impressions today divided by total impressions on the same day one week ago. For example, we have a decrease of -13% in users impressions Worldwide.
  2. Country-change — take each country and calculate the change from last week. This breakdown will show how much decrease in users impressions we got from each country.
  3. Country-share — each country has its own weight in the total KPI. If we have 1M impressions in a day and in US only we have 500K impressions than US share is 50%.
  4. Country-Weight — multiply country-change with country-share to get the impact on the total change.

So if US share of total impressions is 50% and the decrease in US impressions is -20% then the effect of US decrease on the total change is -10%.

Now we need to order the list from the biggest weight to the smallest and that is how we will get to the root of the problem really fast. We will do the same drill to each breakdown and put them all together in the same output ordered by the biggest change.

Tips for your PinPointer:

  • In order to show different changes from different breakdowns in the same table, you will need to use an indicator to flag the breakdown and then use UNION to add everything together. A quick example for that:
with country as(
select 'country' as indicator ,date, cc as name, SUM(impressions) as impressions
from table
group by 1,2,3
order by 2,4 desc
)
, product as(
select 'product' as indicator,date, product as name, sum(impressions) as impressions
from table
group by 1,2,3
order by 2,4 desc
)
select *
from country
union all
select *
from product
  • Windowing — OVER clause and PARTITION BY will help you calculate shares for each row in the breakdown
  • Separate between increase and decrease — if there was a decrease in impressions we would like to see the group that has the most significant impact on the decreased trend
  • Order your list by 3 dimensions — indicator, increase/decrease, and weight. This way, you will get the most significant impact for each indicator at the top of the chart.
  • Placeholders — a great tool to make your PinPointer flexible and comfortable. You can use it to choose a date frame, other breakdowns, etc

A glance at one of our PinPointers:

The PinPointer Dashboard

In this case, we can see a drop of 63% in Indonesia’s impressions. A quick search online will tell us that on June 3rd there is a holiday there.

I just want to clarify that the pinpointer builds on a change in one of your defined KPIs and a drill down to it. In case the change isn’t significant in the KPI or in one of your breakdowns than you might have a different problem and you need to dive deeper into the data. One possible solution is to cover each change with alerts — but this is a different tool on a different post.

One more issue that can occur in the pinpointer is if your data trend isn’t stable but very volatile. In that case, you need to find a way to normalize your results.

Summary

Deep data analysis can take a long time. The big analysis tasks will usually be interrupted by these daily irregularities. It can consume your time and disrupt your sprint. All you want to do is get a quick answer to that anomaly, point the developers and product managers on the right issue and get back to your main tasks. Everyone knows that machine learning is the next big step and it will help you with these anomalies but for it to work perfectly we need more time and more learning. Until then, we need to make our day more efficient. Our data team does it with The PinPointer dashboard.

We made a data frame for you to play with. You can find it in our Github.

Enjoy!

Written and developed by

&

--

--