How Photomath uses Looker to calculate user retention — on tens of millions of active users!

With over 300 million downloads and 800 GB of daily data, here’s the exact process we used in Photomath to create a practical, live dashboard in Looker for monthly retention analysis.

Published in

Photomath Engineering

7 min readMar 17, 2022

Photomath is, currently, the #1 app in Education on the App Store. It has been App of the day numerous times. With over 300 million downloads, we can rightly say that it is the best math learning app in the world. And that accumulates a lot of data. A lot. Tens of millions of active users are “to blame” for approximately 800 million stored rows, or 800 GB of data. On a daily basis!

All this data needs to be captured, systematized, and finally analyzed. Why analyzed? Because data in itself is merely facts and figures. Data analysis provides context for the data, this context can then be used by decision-makers to take action with the aim of enhancing productivity and business gain. The problem of choosing the right tool for accurate and fast analysis of so much data is not simple and there is no straightforward solution.

Since we, at Photomath, mostly use Google technologies, Google Data Platform with BigQuery as the data warehouse, the choice for the BI tool has logically fallen on part of Google’s family — Looker.

Looker is a powerful software and big data analytics platform that helps you explore, analyze and share real-time business analytics easily. Part of the real-time business analytics is, given our volumes and frequency of data, played a key role in acquiring Looker.

User retention is one of the basic and most commonly used metrics in these kinds of applications. High user retention numbers indicate that you have built a strong relationship with your users and that you care deeply about their experience with the app, not just about how to acquire them as much as possible. Also, it increases your users’ lifetime value and boosts your revenue. How many of our new users come back to the product is a question most product teams ask and retention analysis provides the answer.

To recall, retention analysis looks at first-time users within a specific time frame (typically one month or one week) and calculates the percentage of those users that return in subsequent periods. User retention can measure user logins globally, logins for a subset of users, or retention for specific behaviours like visiting a part of an application, using a specific feature, or completing a key workflow.

Let’s say that the records of all events triggered by users in the application are recorded in the tbl_activity table. To begin with, from that table we need to make a derived table with user records and the dates of their first use of the application. For this, we can write a simple SQL query where for each user we look for its minimum date in the activity table.

SELECT user_id, MIN(activity_date) AS first_contributionFROM `tbl_activity`GROUP BY user_id

However, Looker offers a simpler and clearer option — a native derived table. Instead of writing a query, in the Explore of activity table, click on the desired dimensions and measures, then click on the wheel at the top right, Get LookML — Derived Table. We copy the content and paste it into the new derived view. In my case, it looks like this.

view: first_contributions { derived_table: {datagroup_trigger: events_datagroup  explore_source: tbl_activity {   column: user_id {}   column: first_contribution_date {} }}dimension: user_id { description: "Key of a particular user." type: number }dimension: first_contribution_date { description: "The earliest activity date (per each user)." type: number }}

where first_contribution_date is a measure in tbl_activity:

measure: first_contribution_date { description: "The earliest activity date (per each user)." type: date datatype: date sql: MIN( ${activity_dt_date}) ;;}

Note: in the measure above a more elegant solution would be to put type: min, but then in the Explore tab, it is showing the null values since Looker does not allow the combination of type: min and date in LookML. The problem is addressed here.

Also, when Looker generates this derived table with the corresponding dimensions, we can notice that the first_contribution_date dimension is initially set to type: number. It is convenient to change this manually to type: date.

The next step is to create a new view — user_retention.view which will, from the activity table, in combination with the first_contribution.view, extract the user, his first contribution month, and for each subsequent month count the number of his events in that month. Such a table is the basis for calculating the required retention table.

view: user_retention { derived_table: {datagroup_trigger: events_datagroup  sql:SELECT users.user_id AS user_id, DATE_TRUNC(users.first_contribution_date, month) AS   first_contribution_month, month_list.event_month AS event_month, COALESCE(data.monthly_events, 0) AS monthly_events, ROW_NUMBER() OVER() AS keyFROM ${first_contributions.SQL_TABLE_NAME} usersLEFT JOIN ( SELECT  DISTINCT DATE_TRUNC(activity_dt, month) AS event_month FROM `tbl_activity`) AS month_listON 1=1LEFT JOIN ( SELECT  o.user_id AS user_id,  DATE_TRUNC(o.activity_dt, month) AS event_month,  COUNT(DISTINCT o.user_id) AS monthly_events FROM `tbl_activity` o GROUP BY  1,  2 ) AS dataON data.event_month = month_list.event_month AND data.user_id = users.user_idWHERE month_list.event_month >= DATE_TRUNC(users.first_contribution_date, month)   ;;

Obviously, we didn’t have to specifically do the first_contributions derived table. We could have simply inserted it into the SQL query with WITH clause as a temporary table. However, first_contributions is a table that will surely be used for some future requirements and it is good to have it separate for clarity, memory saving, and running speed (the activity table is huge).

Speaking of speed, in this case, it is convenient to make a permanent derived table with the same datagroup trigger (events_datagroup). The first_contributions PDT is created from the activity table in our database. And then we’ve created another PDT, the user_retention PDT, which is a combination of the first_contributions PDT and activity table. The user_retention PDT and the first_contributions PDT are cascading PDTs, where user_retention is dependent on first_contributions (since user_retention is defined using the first_contributions PDT). Here you can find more.

It is useful to point out another problem I came across. Usually, in the SQL query of derived tables, we select data directly from the tables in the database, such as tbl_activity. However, here we have to do it also from another derived table, or view, and that is the first_contributions.view that was just created. So, ${first_contributions.SQL_TABLE_NAME} references another view or derived table. Learn more here.

Further, let’s define dimensions and measures in user_retention.view .

dimension: user_id { hidden: yes type: number sql: ${TABLE}.user_id ;;}dimension_group: first_contribution_date { type: time timeframes: [month] sql: ${TABLE}.first_contribution_month ;;}dimension_group: event_date { hidden: yes type: time timeframes: [month] sql: ${TABLE}.event_month ;;}dimension: monthly_events { hidden: yes type: number sql: ${TABLE}.monthly_events ;;}dimension: key { hidden: yes type: number primary_key: yes sql: ${TABLE}.key ;;}dimension: months_since_first_contribution { type: number sql: date_diff(cast(${TABLE}.event_month as date),cast(${TABLE}.first_contribution_month as date), month) ;;}measure: total_users { type: count_distinct sql: ${user_id} ;;}measure: total_active_users { type: count_distinct sql: ${user_id} ;; filters: {  field: monthly_events  value: ">0" }}measure: percent_of_cohort_active { type: number value_format_name: percent_1 sql: 1.0 * ${total_active_users} / nullif(${total_users},0) ;;}

We also associate these two views with tbl_activity explore via user_id, as follows.

explore: tbl_activity { from: tbl_activity join: first_contributions {  from: first_contributions  relationship: many_to_one  sql_on: ${tbl_activity.user_id} = ${first_contributions.user_id} ;; } join: user_retention {  from: user_retention  relationship: one_to_one  sql_on: ${first_contributions.user_id} = ${user_retention.user_id} ;; }}

The LookML part is over! In Explore — Tbl Activity we select User Retention. As for dimensions we take Months since First Contribution and First Contribution Month as Pivot dimension. The measure is Percent of Cohort Active.

The monthly user retention table is ready. We can use visualizations, for example in the form of a table with cell visualizations or a line chart (don’t forget to turn off the Plot Null values feature). Additionally, we could save this line chart to a new dashboard on the wheel at the top right — Save to dashboard.

From various apps to retail stores, many companies want to understand how well they are retaining various user groups and how to effectively leverage that understanding. This means that developers in those companies have also felt the pain of having to write some very complex queries.

It’s difficult to analyze retention when we use traditional SQL because we want to simultaneously see the number of users in a cohort (time-invariant) and the number of users in that cohort with activity in a given month (time-variant). This is where Looker comes into play, offering the end-to-end process of tracking users and their retention, from the table in the database to the final dashboard, as described above. Once such a dashboard is set up, we can access live data in a visually pleasing way at any time and make some call to action towards our users. This is just one of the reasons why Looker is considered one of the most powerful data platforms today.

This great topic, written on Looker’s community website served as the inspiration for this article. Also, check out other topics on retention analysis on the same community website and this fantastic video from Looker’s annual Digital Data Conference.

Like what you’ve read? Learn more about #LifeAtPhotomath and check out our job postings: https://careers.photomath.com/

How Photomath uses Looker to calculate user retention — on tens of millions of active users!

With over 300 million downloads and 800 GB of daily data, here’s the exact process we used in Photomath to create a practical, live dashboard in Looker for monthly retention analysis.

Written by Paolo Rakocija