Sinch Blog
Published in

Sinch Blog

Improve Visualizations by Filling Gaps

In our previous article we discussed how to calculate the Retention or Churn of our products.

In this article, we’ll learn how to to improve our actual visualization and get better insights from the previously calculated Churn. This is going to be done by using SQL to fill the months’ gaps where there isn’t any information. So, if you have missing dates or missing series of numbers, this is when generating catalogs can help you. In our case, we’re developing an array of dates with BigQuery and we’ll use it to fill those gaps.

This one uses the result. You can see we are lacking some months, which is plotted on the dashboard

OUR ENVIRONMENT

We’re going to work with Big Query and DataStudio, due both of them are integrated.

The table was already created when we obtained the Retention (check the previous article), but we’ll apply a minor change, adding one more column. This is going to be “my_retention_table”

my_retention_tableSELECT
contracted_month,
months_to_cancel,
LEAD(months_to_cancel,1,1000) OVER (PARTITION BY contracted_month ORDER BY months_to_cancel ASC)
next_months_to_cancel,
customers_at_month,
customers_cancelling,
retention_rate
FROM churn_table

Let’s explain our columns

  • contracted_month: month of contract to be evaluated
  • months_to_cancel: number of months that have happened to have customers cancelling
  • next_months_to_cancel: this is using a window function to get the lead month and so we can stablish a period (if there are no months to evaluate, a high number is set)
  • customers_at_month: number to catch the total of customers that we had at that month
  • customers_cancelling: as a ladder, we will see the number of customers cancelling our product across the time (the time happens seeing the months_to_cancel)
  • retention_rate: calculated using customers_cancelling / customers_at_month

This is an example of the table:

THE CODE

Now, to prepare our results, let’s do it in 3 steps

STEP 1: All our dates

With this, we obtain every month since 2015 and the difference in months between that specific month and the actual one.

All our dates
SELECT
generated_month,
DATE_DIFF(CURRENT_DATE(), generated_month, month) id_months
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE(‘2015–01–01’), CURRENT_DATE(), INTERVAL 1 MONTH)
) as generated_month

Let’s take an example row:

The generated_month is ‘2021–03–01’, the id_months is 16 due the difference between March 21 and July 22 are 16 months.

STEP 2: Our Dates Table

We want to use a self-join to get every month against every possible difference of months. However, for this specific case, we don’t need every joined field. We only need the differences whose current month is equal to or higher than that month.

Our Dates Table      
with
months_ids as (
SELECT generated_month, DATE_DIFF(CURRENT_DATE(), generated_month, month)
id_months FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-01-01'), CURRENT_DATE(), INTERVAL 1 MONTH)
) as generated_month
)
SELECT months_ids.generated_month, B.id_months
diff_months FROM months_ids
JOIN months_ids AS B
ON months_ids.id_months >= B.id_months

Let’s check the example:

The generated_month is ‘2021–03–01’, we’ll get 17 rows, and that’s exactly what we need: every month against every possible month of difference.

STEP 3: The result

Now, we merge both tables, the one with all the dates and the retention table, considering:

1. If there’s no match, we set the retention as 1 (100%);

2. If there’s no match, there were no customers cancelling, then, we set 0;

3. The period created in the retention table is used in here to help match the months.

Result
with
months_ids as (
SELECT generated_month, DATE_DIFF(CURRENT_DATE(), generated_month, month) id_months
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-01-01'), CURRENT_DATE(), INTERVAL 1 MONTH)
) as generated_month
),
dates_table AS (
SELECT months_ids.generated_month, B.id_months diff_months
FROM months_ids
JOIN months_ids AS B
ON months_ids.id_months >= B.id_months
)
SELECT
A.generated_month contract_start_date,
A.diff_months months,
IFNULL(B.customers_cancelling,0) customers_cancelling,
IFNULL(B.retention_rate,1) retention_rate
FROM dates_table A
LEFT JOIN my_retention_table B
ON A.generated_month = B.contracted_month
AND A.diff_months >= B.months_to_cancel
AND A.diff_months < B.next_months_to_cancel

This helped us reach our goal: a table without gaps between months. Below, I’m showing only the sample of that date 2022–03–01.

And now, it can be plotted on any visualization tool. This time we’re using DataStudio, and you can see now how it is easier to visualize. This is done by using this query as the datasource and then choosing Pivot Table with Heatmap. Choosing the contract_date as the Y Axis, the difference in months as the X Axis, and the retention as the Measure.

Now, it is up to you to customize the results.

CONCLUSIONS

Explore every tool to make your life easier, and in the process make other people’s lives easier. In this case, we needed to do some transformations of the original table to prepare it before draw it. Yes, that consumes time — but in the end, it’s better to understand your data sources and possibilites, so the results are worth it.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Sinch

Sinch

210 Followers

Follow our publication to see stories about technology and culture written by Sinchers! medium.com/wearesinch