How to Set Up Customer Retention Analysis in Looker

Ying Ma
Ying Ma
Jan 6 · 9 min read

Retention is a metric to understand how many users return to your platform after the initial conversion. Retention is commonly used as a long-term health indicator in product development and improvement. Additionally, by examining how retention varies by geography, gender or other behavioral characteristics, the business gains better understanding of the contributing factors for retention and can shape data-driven business strategies.

Retention is usually reported as a percentage of users who return to the platform, which generally declines over time and can be charted as a retention curve. The figure below is an illustration of the concept: each trace represents a cohort starting at a different point in time, the x axis is the duration for how long the cohort has been alive and the y axis is the percentage of users that retained.

Image for post
Image for post
Examples of retention curve. Figure from https://help.looker.com/hc/en-us/articles/360023686854

Even though the cohort retention can be calculated with SQL, reports with slicing and dicing the cohort by other user attributes requires much more effort to maintain. For example, one of our business stakeholder wants to further investigate into a certain cohort, and asks questions like: “What is the impact of acquisition source on the retention for the 2019 Q4 cohort”? We want to take advantage of Looker’s flexible and customizable structure and design an analytical framework so that our business stakeholders can perform self-serving retention analysis by slicing and dicing the cohort with ease.

Looker’s Help Center provides an example of achieving this functionality by generating a derived table using LookML pseudo-code; unfortunately, this approach can be computationally heavy for large dataset. Here we are going to provide a much more straightforward and computationally tractable way of building a highly customizable and scalable retention Explore in Looker.

With this solution for retention analysis in Looker, we are able to:

  • visualize retention curves of cohorts
  • slice and dice the cohort with ease by any dimensions available (or that we add at a stakeholder’s request)
  • evaluate retention with various metrics
  • not need to manage/synchronize any underlying pre-calculated tables

Manipulation of these Looker retention explores is intended for our quantitative power users who are comfortable with writing Looker table calculations. We introduced a quick-training program (30 min sessions) to onboard interested parties onto this solution along with business-user-friendly documentation for more nuanced usage.

3 Key Components of Retention Analysis

Before diving into the technical implementation in Looker, we need to get aligned with some key concepts, making sure we’re all looking at retention with the same lens. The retention curve usually requires 3 key components to calculate: cohort definition, recurring events and evaluation metric.

Cohort Definition

The first question we want to ask is which event should be used to define the very beginning of member’s journey with your application, such as by member’s first website browsing, first time they put down their emails, first time they made a purchase, first time they signed up for subscription, etc.

This definition could vary according to the business model, but each member should have only one immutable timestamp as users’ starting point. The members who started around the same time and have similar attributes (such as belong to the same age group or come from same acquisition source) are grouped together as a cohort and it will be plotted as a single line in the retention graph. In here, we assume you have a table called Users that contains users’ sign-up date as well as some user attributes. The table may look like something like this:

Image for post
Image for post
Table Users

Recurring Events

Retention curves are essentially the chart of the frequency of the recurring events. The recurring events refer to the repeated interactions between the platform and members, and depending on the business model the recurring events can be defined as site visits, user logins, purchases, subscription renewal, etc. The time lapses from the starting of the cohort to the time of the recurring events are calculated as days/weeks/months and plotted as x axis in the retention graph. Assuming we are building a retention analysis for an online merchandiser, the recurring events will be the orders. The table Orders is something like this:

Image for post
Image for post
Table Orders

Evaluation Metric

Lastly, we want to define how do we want to quantify the retention. The most straightforward way is to track the number of the recurring events within the cohort over time. Other metrics such as revenue retention and user number retention can be calculated in this framework as well. Within Looker’s framework, the evaluation metrics can be configured as Measures which serve as aggregated calculations similar to SQL. Note that these calculations can handle straightforward logic such as and at a certain time point. Metrics with a spanning time window, such as defining active users as those who have purchased within last 3 months, are more complicated to be calculated and are not in the scope of this framework.

Looker Implementation

In order to visualize the retention curves in Looker, let’s reverse-engineer the key elements from the this Look:

Image for post
Image for post
Please imagine that this is a Looker screenshot (we have to use fake data in these posts). Please apply Looker’s color scheme when viewing this — green columns are table calculations.

This table reports the order number retention rate of the two acquisition sources of interest, and , with users who signed up during 2019–09 ~ 2019–10. The table is pivoted by user’s signup month. In Looker, the Visualization function uses row header as x-axis and plots each columns as a separate group. Therefore any dimensions used to define a cohort need to be pivoted as a column. Orders.months_alive is the time difference between Users.signup_date and the Orders.order_date. Number of Orders is the distinct order count in the specific month from all members in the cohort. For example, cohort has 150 orders at Month 1 and that represents all the orders made in 2019–10 by members who started during 2019–9. The Order Retention is calculated as the Number of Orders ratio relative to Month 0. The raw Number of Orders varies greatly month over month and it will be hidden from visualization.

Join the Data By Joining View

The base view of the retention analysis is the Users view and the Orders is left joined by user_id. Joining the tables can be handled in Looker elegantly by joining the two views:

explore: retention_analysis {
label: “Retention Analysis”
from: users join: orders {
sql_on: ${users.user_id} = ${orders.user_id} ;;
relationship: one_to_many
}
}

Users View

User’s signup date will be used as cohort definition. In here, we will make use of Looker’s timeframe feature, which help us parsing the signup date into different level of granularity. For example, if cohort_month is used as cohort definition and pivoted in the report, all the members who signed up within the same month will be grouped together as a cohort.

view: users {
sql_table_name: users

dimension_group: cohort {
sql: ${TABLE}.signup_date ;;
type: time
timeframes: [date, week, month, quarter, year]
}
############## other dimensions ###############

Additionally, a measure for number of users can be setup in the Users View. We use as the measure type, provided that the user_id has been configured as the primary key of this view. This Measure will be used to calculate user number retention. It’s worth noting that this measure should not be defined in the Orders View, because not all the users will have an order and counting it in the Orders View will result in under-counting.

measure: number_of_users {
type: count_distinct
sql: ${TABLE}.user_id ;;
}

Orders View

In the Orders View, we want to calculate the time difference between the signup date and the order date. This calculation can be embedded into Looker’s dimension using customized SQL code. In our SQL dialect, Snowflake, calculates the time difference between two timestamps based on the date part requested. In here, we support several options including Months Alive and Days Alive, and the users can drill into the retention analysis with various levels of temporal resolution.

view: orders {
sql_table_name: orders
dimension_group: order_date {
sql: ${TABLE}.order_date ;;
type: time
timeframes: [date]
}

dimension: months_alive {
sql: datediff('month', ${users.cohort_date}, ${orders.order_date_date}) ;;
type: number
}
dimension: days_alive {
sql: datediff('day', ${users.cohort_date}, ${orders.order_date_date}) ;;
type: number
}
############## other dimensions ###############}

The last thing we need to configure is the metric of retention, which can be realized by Looker’s Measures. The aggregation of order count and revenue can be defined with LookML easily:

measure: number_of_orders {
type: count_distinct
sql: ${TABLE}.order_id ;;
}
measure: revenue {
type: sum
sql: ${TABLE}.price ;;
}

Craft the Look

Now that all the key components have been configured within Looker, we can start to build the Look for retention analysis. To craft a Look similar to the previous example, here are the steps to follow:

  1. Find dimension group in Users View, pivot option.
  2. Pivot in Users View, filter this dimension to the items of interest if necessary
  3. Choose dimension in Orders View
  4. Choose measure in Orders View
  5. Use table calculation to compute the retention percentage based on first month’s performance. The code for the calculating order number retention could look like something like this:
${orders.number_of_orders}/index(${orders.number_of_orders},1)

Calculation for user number retention involves with acquiring the total number of users within the cohort. In this case, the checkbox needs to be selected in order to get the total distinct count of user number within each cohort. Then customize table calculation to compute the user number retention:

${users.number_of_users}/${users.number_of_users:total}

6. Choose Line Chart in Visualization, hide the raw Number of Orders columns from the visualization. Disable in the plot editor.

Some refinement of the Look:

  • To display cohorts within certain period of time, filter by the to the duration of interest
  • In the first figure, there is a dip at the end of each curve, and the cause is that the current month hasn’t finished and the data for the current month is incomplete. To rectify this misleading trend, filter by to be before the starting date of the current month to exclude orders in the current month (which hasn’t completed and whose order count will therefore look low) from the analysis
  • To investigate the shipped order retention, filter orders by
  • To remove smaller cohorts that have less than 100 users, filter by to be greater than 100

Extension

We have introduced the fundamental structure of building the Retention Analysis from the unprocessed tables. Complementary information such as user’s behavioral pattern and user purchasing journey summary can be joined to this Retention Explore by user_id. For example, adding user’s first order information will enable grouping users by their preference at the conversion. The more information added to the Retention Explore, the more ways you can slice and dice the retention cohort and the more insights you could unlock with this analysis.

The Retention Explore can be further customized according to the business needs. Here are a few things that we have considered and implemented in our Retention Explore.

Measure Picker

In our practical experience, it is relatively troublesome to switch from one measure to the other, because the switching involves editing the custom table calculation. This can be solved by defining a liquid parameter which serves as a measure picker. Below is a snippet of the LookML code to realize this function. For more reference, please check out Looker’s documentation about Liquid Parameters and this example.

parameter: measure_picker { 
type: string
allowed_value: { value: “Order Count Retention” }
allowed_value: { value: “Revenue Retention” }
}
measure: cohort_metrics {
type: number
hidden: yes
sql: CASE
WHEN {% parameter measure_picker %} = 'Order Count Retention'
THEN ${number_of_orders}
WHEN {% parameter measure_picker %} = 'Revenue Retention'
THEN ${revenue}
ELSE 0
END ;;
}

Cohort Selector

Sometimes, we want to make some comparison between cohorts that cannot be segregated naturally by week or month or quarter. For example, we want to compare the cohort converted during Thanksgiving vs. cohort converted during Christmas to New Years. This requires a cohort selector that can take any arbitrary date range as cohort definition. We have found some useful discussion here.

Average Revenue per User

This cohort based structure can be used to calculate other business metrics such as average revenue per user by dividing the cumulative revenue with total number of uses within the cohort. Both and measures are needed and the table calculation will do the trick:

running_total(${order.gross_revenue_total})/${member_profile.num_accounts:total}

Ro Data Team Blog

Ro Data Team Blog: data analytics, data engineering, data…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store