These days almost all businesses, no matter the industry, collect lots of data. This data can describe an array of aspects, like for example clients’ behavior, your marketing spend, or overall company finances. If you want to make the business you own or work for thrive then there’s probably merit in crunching this data. This is no news, of course, data analysis has been a wildly popular topic for over a decade now. Yet, still, too often data is being processed in ways that don’t lead to insights that should support businesses’ decision making.
Do you know how to present your data in a fashion that clearly shows the impact of changes you’ve made to your product or service? Are you able to see if there’s a difference in the magnitude COVID has influenced revenues coming from long-time clients versus those acquired more recently? Here at Qonto, a very fast-growing online bank for businesses, the Business Intelligence team often choses to perform so called “cohort analyses”, in order to answer questions like these. Read on if you’re interested to find out why, and how it is done.
What exactly is a Cohort Analysis?
In the world of Business Intelligence, the term “cohort” typically refers to a group of users who register in the same month or week. In a cohort analysis, these groups’ behavior is compared over time. Product teams often use cohorts to track the evolution of feature usage.
Cohort analyses allow us to answer in an objective manner to questions such as “Were the users who registered in recent months quicker to adopt our special features than those who registered earlier this year, or not?”. So instead of looking at the whole user base as a single unit, Cohort Analyses break them down into groups based on a certain starting point, which allows for fair comparison and identifying patterns.
The insights one can get from cohort analyses are plenty and not at all limited to the above example. This method of data analysis can help any kind of team and inform any topic where “time” is an important element. Think of the evolution of your per-user revenues, client retention, conversion on a new website layout, or even your company’s HR talent-hiring process.
In this article, I will explain why at Qonto we find cohort analyses inevitable, discuss a variety of use cases, and show how to visualize the key insights coming from cohort analysis. As a bonus, I’ll provide some example SQL-code, so that you as well can easily get started with a better understanding your business’ data.
How is a Cohort Analysis done?
Let’s take a look at the example question: do more recently registered users show increased feature-adoption rates? To illustrate I’ll now introduce a dataset, in which users are already split according to their registration month.
What we see is that at present the share of users who’ve adopted our special features is highest among those who registered in January (56%). Also, the users from August have a lower adoption rate (34%) than those from March or April (42%). The pitfall here of course is that those who registered earlier this year had much longer to adopt than those from the August-cohort.
The older users maybe are also more likely to have been targeted with more marketing communications encouraging them to try out the special features, or have seen pop-ups on our platform inviting them to navigate there. So how do we compare fairly the adoption rates of our recent user cohorts to those who registered over half a year ago? The answer is in introducing two new metrics to the above dataset: “observation month” and the time difference in months between the registration month (the cohorts) and the observation month.
The table above goes on to specify all possible combinations between months of registration and observation, and how many users adopted at that point. We now have all data needed to do cohort analysis, yet to get a clearer oversight the table should be pivoted, bringing “Δ Months” to the x-axis, while “Cohort Month” stays on the y-axis. This gives a so called “Triangle Heatmap”.
By pivoting into such triangle all “user cohorts’ adoption rates reached after the same amount of months since registration” are placed in the same columns. So for our January (2020–01) cohort, we see in the 2-months-later column a rate of 30%, which is the share of users from this cohort who has adopted the new features by March. One row down, for our February cohort, 33% is the share of those users who’ve adopted by April, etc.
Now that the data is ordered and colored in this fashion one can easily compare cohorts’ performance at equal points in time relative to their starting moment. This reveals that the July and August cohorts are certainly not underperforming compared to those of January, February, March or April. In fact the users who have registered in more recent months are faster adopters of the special features.
Something else that catches the eye is the particularly low performance of the March cohort. The Triangle Heatmap by itself can’t provide an explanation for this cohort’s low rates, but it does the highlighting which motivates an investigation. Perhaps the users who registered in March didn’t receive the same introductory email pointing their attention to the special features? Or the marketing budget for that month was experimentally shifted to a different channel, attracting another type of user?
Below the triangle, to show users’ average progress, I’ve calculated the weighted average feature-adoption rates of all those who’ve joined our platform this year. Now all the above can be visualized pretty neatly with two graphs: imagine we’ve set goals for adoption rates at 1, 3, and 5 months after registration, yet we also want to be clear on what are the overall rates so far.
In the left graph we observe clear progress in the 1-month (after registering) feature-adoption rates, which for June and July cohorts is almost double of that of users who registered during the first quarter of the year. Also, the May cohort is showing a promising figure for the 3-month rate, yet it’s too early to tell if other cohorts will follow this trend. As only three cohorts from this dataset have an existence of 5 months (or more) this analysis can’t yet show if longer period adoption rates will grow above “about half” of the users or not. It might turn out that there’s a “ceiling rate”, which is just reached a lot faster by newer cohorts than by the older ones. With time the new data will tell.
Now that the basics of a cohort analysis are covered I will treat two interesting use cases in the next paragraph: Average Revenue per User (ARPU) and User Retention.
More use cases of Cohort Analysis
Average Revenue per User
For many businesses one of the most important metrics to be aware of is the average revenue per user, or in short their ARPU. This figure makes it possible to determine what is an acceptable cost of acquisition for your company, or to take a step towards calculating users’ lifetime value. At Qonto we closely follow the evolution of our ARPU with the help of cohort analyses, which has lately sketched the following picture.
Now another great attribute of the Triangle Heatmap appears here: when reading diagonally one is looking at what occurred during a specific period (or “observation month”) to all the cohorts. For example, in the cell on the intersection of cohort 2020–04 and 0 months later the triangle has a calculation of what happened in April this year, namely a €4 ARPU for the users who registered in that month. On the same diagonal there’s the intersection of cohort 2019–11 and 5 months later, which shows a €18 ARPU, that is also something that happened in April this year, but to older users. The last (and therewith longest) diagonal thus shows performance of the most recent month for every cohort.
This “diagonal aspect” of the triangle has the quality of drawing attention to how holidays, big events, platform downtime, etc. influence all the users, independent of their registration month cohort. Like many businesses, Qonto saw its revenues decrease during the COVID lockdown of spring 2020, which explains the yellow diagonal stretching all through March, April, and May in our ARPU Triangle Heatmap seen above. Yet, the triangle also shows that recovery from this period has been very strong, as the outer two diagonal lines are even greener than all those seen before.
So far we’ve looked at two analyses, feature-adoption and ARPU, that would typically apply to all users, and have registration-month-based cohorts. However, it is just as valid to zoom in on a specific subset of your users, and to put them in cohorts based on the moment they take a certain more advanced action on your platform.
Imagine that your company has a selective program for very active users who can get discounts if they continuously participate in a small survey on your platform. They start getting perks as of the first one that they fill out, yet for you, the ROI (on giving those discounts), in terms of information, becomes only positive once a user filled out for twenty or more consecutive weeks. Therefore, it’s important to keep track of those users who start participating, by placing them in cohorts based on their first week of filling out the survey (instead of by registration-month), and then plotting which share of them is still doing so Δ weeks later.
Now you’re in week 10 of the year and you analyze how the cohorts have been doing so far: are those active users loyally filling out your survey every week? It doesn’t look great (see image above). At this pace of participation decline only half of those who already started enjoying discounts will end up reaching your target of twenty consecutive weeks of providing survey answers to you.
Alarmed by the data you call a meeting with tech and marketing colleagues to figure out how this can be improved. The team decides to add a shortcut button on the homepage that brings eligible users directly to the survey.
Well, looks like that had the desired effect! Moreover, one of the marketeers installed a reminder-email end of week 14, which led to further improvement.
Now that the insightfulness of cohort analyses has been established, let’s take a look at how you yourself can get to do one quickly, in case your data is still resting in its database or warehouse. In the next chapter, I’ll show some SQL-code that can be adopted to serve you.
Structured Query Language example
For the proceeding example, I’ve simplified a piece of SQL (Snowflake) code that we use to monitor the speed at which our onboarding team manages to validate the KYB (Know Your Business) documents of clients who open an account at Qonto. Our goal is to have everyone processed in at most 10 days.
The above code outputs a table that specifies all possible combinations between signing dates (the cohorts in this case), days later, and then the number and share of clients’ KYB documents were validated at that point.
Now, most data visualization tools would allow you to pivot the table, i.e. bring delta_days to the x-axis, leave cohort_date on the y-axis, and have conversion_date data displayed, creating the now-familiar Triangle. Also, you could do this mutation in Excel or Google Sheets. Yet, if you wish to output the Triangle directly in an SQL-editor it is possible by adding one more step.
Voilà, easy, and triangular, as pie. 🍰
Join the team
I hope you’ve enjoyed the article and will be soon able to put cohort analyses to good use in your own work. Here at Qonto, we love to make data speak, by applying both simple and more complex methods, in order to get a profound understanding of our customers and their relationship to our ever-evolving product. Qonto and its data team, which consists of scientists, engineers, and BI-experts, is currently growing fast. So, if you are looking to join an ambitious team modernizing a dusty industry thanks to data, design, and technology then get in touch!