What Is Cohort Analysis and How Should I Use It?

A beginner’s guide to cohort analysis in Google Analytics

Bill Su
Analytics for Humans
Jul 11, 2018


Last week, I asked our Analytics for Humans community on Facebook for a few suggestions on topics they’d like to learn more about.

Based on the responses, people were very interested in hearing about cohort analysis, so let’s go ahead and talk about it!

As one of the newest features in Google Analytics, cohort analysis is also the one that creates the most confusion among analysts.

The confusion primarily comes from two places.

First, unlike most of the analytics features in Google Analytics (such as session analysis, page analysis, etc.), cohort analysis is dynamic.

This means that instead of summing your session or page activity over a fixed time range, cohort analysis describes the behavior of various groups of users OVER TIME on your website. That makes it harder to interpret unless you have a solid grasp of how it works.

Second, it is hard to turn cohort analysis into actionable insights.

It might be good to know that users have been coming back to my website more over the past couple of weeks, but that information by itself is meaningless. How can I combine what I learned from cohort analysis with other analyses in Google Analytics to figure out what I can DO to make my users come back more?

This article will help you resolve both confusions by:

  1. Offering a comprehensive look at the cohort analysis technique, along with a few key points you need to grasp to understand how it is actually conducted
  2. Offering a few key ways you can use cohort analysis, combined with other analytics modules in Google Analytics, to figure out how to make your users come back to your website more.

After that, we will continue this topic in another installment of our “Google Analytics API for Absolute Beginners” series, so we can take advantage of the added functionality the Google Analytics API provides. That will be the topic of our next article.

What is cohort analysis?

Let’s begin by talking about what cohort analysis actually IS.

As hinted a little bit in the introduction, cohort analysis is

an analytical technique that focuses on analyzing the behavior of a group of users/customers over time, thereby uncovering insights about those customers’ experiences and what companies can do to improve them.

Well, throwing a definition like that in your face is probably like throwing a brick: it hurts, but you have no idea what it means. So let’s break it down.

The key to understanding cohort analysis is to first step OUT of the technique itself and put yourself in your customers’ shoes.

Easy example of a cohort — fish in a fish tank | Picture by Frederica Diamanta on Unsplash

Let’s say that your customer Bob entered your online store four months ago in response to a 50% discount, looked through your wares, and bought a tiny trial set of your avocado cosmetics.

As a business owner, it is natural to ask yourself: are people like Bob coming back to my store as a result of that trial set purchase? How fast will they come back, and what value will they bring to my company?

So you ask your store attendant (their name is “Cookie”) to keep track of people like Bob: whether they return, whether they make a purchase, and how often they come back.

Now, four months later, you and “Cookie” sit down at your office desk and look through all the Bob-like customers who visited your store during that specific promotion and bought a trial set. (By the way, a group of users who came in during a specific time frame like this is called a cohort.)

You realize that 70% of the people who bought the trial set never came back (bummer!), 20% came back to your store at least once but didn’t buy anything, and the remaining 10% bought something during this four-month period.
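To make the arithmetic concrete, here is a minimal Python sketch of what “Cookie” might compute. It assumes one hypothetical record per cohort member that says whether they came back and whether they bought again. The ten users are made-up toy data chosen to reproduce the 70/20/10 split in the story, and the labels and function names are illustrative, not part of any tool.

```python
from collections import Counter

# Hypothetical cohort records: (user_id, returned_to_store, purchased_again).
# Toy data chosen to mirror the 70% / 20% / 10% split in the story.
cohort = [(f"never{i}", False, False) for i in range(7)]
cohort += [("browser1", True, False), ("browser2", True, False)]
cohort += [("buyer1", True, True)]

LABELS = ("never returned", "returned, no purchase", "purchased again")


def classify(returned: bool, purchased: bool) -> str:
    """Put one cohort member into exactly one outcome bucket."""
    if purchased:
        return "purchased again"
    if returned:
        return "returned, no purchase"
    return "never returned"


def outcome_shares(records):
    """Share of the cohort that falls into each outcome bucket."""
    counts = Counter(classify(returned, purchased) for _, returned, purchased in records)
    total = len(records)
    return {label: counts[label] / total for label in LABELS}


print(outcome_shares(cohort))
# {'never returned': 0.7, 'returned, no purchase': 0.2, 'purchased again': 0.1}
```

The important detail is that each person is counted once, in exactly one bucket. That per-person bookkeeping is what separates cohort analysis from simply counting visits.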

So what happened, and how can we fix this?

First, you hypothesize that most of the users who never came back stayed away not because they weren’t interested in your product, but because they simply forgot about you in the vast sea of information (damn goldfish memory).

Given that, you should probably start running retargeting ads right around the time they finish their trial product, reminding them to buy more if they are satisfied.

Yeah, turns out this is super expensive. Maybe give it to your customers for free? | Picture by Bench Accounting on Unsplash

Second, you look into the experiences of those who visited your website but didn’t purchase, and you realize that most of them stopped at the “shipping and handling” page. That suggests they are hesitant about your shipping costs.
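One way to reach a conclusion like “most of them stopped at the shipping page” is to look at the last page each returning non-buyer viewed. This sketch assumes you can export each of those sessions as an ordered list of page paths. The paths and the session data below are hypothetical.

```python
from collections import Counter

# Hypothetical sessions from cohort members who returned but did not buy.
# Each session is the ordered list of page paths the visitor viewed.
non_buyer_sessions = [
    ["/home", "/avocado-trial-set", "/shipping-and-handling"],
    ["/avocado-trial-set", "/shipping-and-handling"],
    ["/home", "/blog"],
    ["/avocado-serum", "/cart", "/shipping-and-handling"],
]


def exit_page_counts(sessions):
    """Count the last page viewed in each session, i.e. the page the visitor left from."""
    return Counter(pages[-1] for pages in sessions if pages)


print(exit_page_counts(non_buyer_sessions).most_common(1))
# [('/shipping-and-handling', 3)]
```

This is where cohort analysis meets the other analyses. The cohort tells you *who* to look at, and a page-level view of those same people tells you *where* they stall.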

Maybe you should launch a free-shipping campaign for those returning customers so shipping is no longer a concern for them.

With those analyses, you have come up with two concrete actions to improve the conversion rate of that group. More importantly, they apply to all future groups if you repeat similar promotions. Now it’s time to carry them out and see whether your retention and conversion rates improve!

That, ladies and gentlemen, is cohort analysis in plain words.

Three key anchors to understand cohort analysis

Now let’s talk about doing cohort analysis in practice.

To conduct cohort analysis, you will need three pieces of information to serve as “anchors” for your analysis.

The first “anchor” is the definition of the cohort, which is always a time period in the past.

Just as you would expect from its name, a “cohort” is always time-bound: it defines a group of people who entered your website/store during a certain time period. Therefore, you first need to decide which time period you want to analyze.

Cohort analysis is about groups; you’re getting that from this imagery, right? | Photo by Quino Al on Unsplash

I like to imagine cohort analysis as a 400m race with a beginning and an end, during which we keep track of the behavior of all of our racers (our customers/users).

The cohort definition marks the moment the race starts, which is when we begin tracking the behavior of each racer.

If you are analyzing the behavior of your customers from a certain promotional event, your cohort could be all customers coming onto your website within the time range of that event.

You can add extra filters to the cohort definition, for example by analyzing only the customers who visited your website via a specific source (such as Facebook, Google, etc.). However, the time-bound definition always needs to be there for the cohort analysis to work.
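In code, the first anchor boils down to a filter on each user’s first visit date, with any source filter layered on top. This minimal sketch assumes a hypothetical visit log of (user, date, source) tuples. It defines “entered” as the user’s earliest visit and uses that visit’s source. Real tools may define acquisition differently.

```python
from datetime import date

# Hypothetical visit log: (user_id, visit_date, traffic_source).
visits = [
    ("alice", date(2018, 3, 2), "facebook"),
    ("alice", date(2018, 3, 20), "google"),
    ("bob", date(2018, 3, 5), "google"),
    ("carol", date(2018, 2, 27), "facebook"),  # first visit is before the window
    ("carol", date(2018, 3, 3), "facebook"),
    ("dave", date(2018, 3, 9), "facebook"),  # first visit is after the window
]


def first_visits(log):
    """Map each user to (date, source) of their earliest visit."""
    first = {}
    for user, day, source in log:
        if user not in first or day < first[user][0]:
            first[user] = (day, source)
    return first


def define_cohort(log, start, end, source=None):
    """Users whose FIRST visit falls in [start, end], optionally from one source."""
    return {
        user
        for user, (day, first_source) in first_visits(log).items()
        if start <= day <= end and (source is None or first_source == source)
    }


print(sorted(define_cohort(visits, date(2018, 3, 1), date(2018, 3, 7))))  # ['alice', 'bob']
print(sorted(define_cohort(visits, date(2018, 3, 1), date(2018, 3, 7), source="facebook")))  # ['alice']
```

Note that carol visited during the window but is excluded, because she first arrived in February. The time bound applies to when a user *entered*, not to whether they were active.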

With the cohort defined, now it is time to decide how long you want to run your analysis for (the lagging period), which is the second “anchor”.

If you want to see how users behave one month after their initial visit, your lagging period would be 1 month.

This number is selected largely based on your preferences and your company’s industry.

Finally, with both the lagging period and cohort determined, we can figure out our final anchor, which is the termination time of your analysis.

For example, if you are tracking a cohort that visited your website from March 1st to March 7th with a one-month lagging period, the earliest you can get this result is April 7th. That is when the lagging period ends for the last possible person in that cohort.
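The termination anchor is just date arithmetic: the last day of the cohort window plus the lagging period. Here is a minimal sketch using only the standard library. `add_months` is a hypothetical helper that clamps to the end of shorter months, so January 31st plus one month becomes February 28th. That is one reasonable convention among several.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def termination_date(cohort_end: date, lag_months: int) -> date:
    """Earliest date on which every cohort member has reached the end of their lagging period."""
    return add_months(cohort_end, lag_months)


print(termination_date(date(2018, 3, 7), 1))  # 2018-04-07
```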

This detail is easy to overlook in Google Analytics, since it treats an unfinished cohort as if it were finished.

What do I mean?

This illustration displays a sample account with weekly cohorts. The first column represents the average session duration in the first week after the cohort first visited the website. For example, if the cohort is May 27th to June 2nd, that column shows the average session duration of this same group from June 3rd to June 9th.
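To see what sits behind a table like this, here is a minimal sketch that rebuilds one from a hypothetical session log. It rests on three assumptions. Cohorts are 7-day buckets aligned to a Sunday (May 27th, 2018 was a Sunday). Week 0 is the cohort’s own week. Each cell is the average duration per session. Google Analytics’ exact metric definition may differ.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical session log: (user_id, session_date, duration_in_seconds).
sessions = [
    ("ann", date(2018, 5, 28), 120),
    ("ann", date(2018, 6, 4), 60),
    ("ben", date(2018, 6, 1), 200),
    ("ben", date(2018, 6, 8), 100),
    ("cy", date(2018, 6, 5), 90),
]

ANCHOR = date(2018, 5, 27)  # assumed start of the weekly buckets (a Sunday)


def week_bucket(d: date) -> date:
    """First day of the 7-day bucket containing d."""
    return ANCHOR + timedelta(days=7 * ((d - ANCHOR).days // 7))


def cohort_table(log):
    """Average session duration keyed by (cohort week, weeks since the cohort week)."""
    first_seen = {}
    for user, day, _ in log:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day
    totals = defaultdict(lambda: [0, 0])  # key -> [total seconds, session count]
    for user, day, seconds in log:
        cohort_week = week_bucket(first_seen[user])
        week_n = (week_bucket(day) - cohort_week).days // 7
        totals[(cohort_week, week_n)][0] += seconds
        totals[(cohort_week, week_n)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


table = cohort_table(sessions)
print(table[(date(2018, 5, 27), 1)])  # 80.0: ann and ben, June 3rd to June 9th
```

Each cell can only be filled by linking every session back to the week its user first appeared. That link is the defining step of a cohort table.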

Now zoom in on the most recent entries in the Week 1 and Week 2 columns, which show the lowest averages: :04 and :01.

Before you panic, think about it. At the time of this writing, we are at the beginning of the week of July 8th to July 14th. That means the most recent Week 1 and Week 2 values only include data for July 8th and 9th, just 2 of 7 days. This data is incomplete and should not be displayed in the first place!

Therefore, always pay attention to the final anchor of your cohort analysis. Make sure all data related to your cohort has been collected before you read the results. In this case, that date is July 15th.
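A simple guard is to check how much of a cell’s week has actually been observed before trusting it. This sketch assumes the weekly layout described in the illustration: Week N for a cohort starting on a given date covers days 7N through 7N+6 after that date. It also assumes data is available through a given date, inclusive. The function names are hypothetical.

```python
from datetime import date, timedelta


def week_window(cohort_start: date, week_n: int):
    """First and last day of 'Week N' for a weekly cohort that starts on cohort_start."""
    start = cohort_start + timedelta(days=7 * week_n)
    return start, start + timedelta(days=6)


def observed_fraction(cohort_start: date, week_n: int, data_through: date) -> float:
    """Share of the Week N window covered by data available up to data_through (inclusive)."""
    start, end = week_window(cohort_start, week_n)
    if data_through < start:
        return 0.0
    observed_days = (min(data_through, end) - start).days + 1
    return observed_days / 7


def safe_to_read_on(cohort_start: date, week_n: int) -> date:
    """First day on which the whole Week N window is in the past."""
    _, end = week_window(cohort_start, week_n)
    return end + timedelta(days=1)


# Cohort of July 1st to 7th: Week 1 is July 8th to 14th, and data runs through July 9th.
print(observed_fraction(date(2018, 7, 1), 1, date(2018, 7, 9)))  # 2 of 7 days observed
print(safe_to_read_on(date(2018, 7, 1), 1))  # 2018-07-15
```

Any cell with an observed fraction below 1.0 is still being filled in. It is better hidden or flagged than compared with finished cells.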

Cohort Analysis in Practice

Since this article is dedicated to covering all the details of cohort analysis, let’s also touch on how you can run a cohort analysis on your data without the assistance of tools such as Google Analytics.

More importantly, I will show you why the cohort analysis feature in Google Analytics gives you slightly inaccurate data.

A common misconception I see is that cohort analysis is merely an aggregation of data points for the beginning period (March 1st to March 7th) and the ending period (April 1st to April 7th).

If you think about it, it is actually impossible to compute cohort performance from aggregate data. There is no way to tell whether a session in March came from the same person as a session in April.
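A tiny illustration of why aggregates are not enough: the two hypothetical logs below have exactly the same session counts in both periods. Yet in one, every March visitor comes back in April, and in the other, nobody does. Only user-level identity tells them apart. The logs and helper names are made up for illustration.

```python
from datetime import date

MARCH = (date(2018, 3, 1), date(2018, 3, 7))
APRIL = (date(2018, 4, 1), date(2018, 4, 7))

# Two hypothetical visit logs of (user_id, visit_date).
log_a = [("u1", date(2018, 3, 2)), ("u2", date(2018, 3, 4)),
         ("u1", date(2018, 4, 2)), ("u2", date(2018, 4, 5))]
log_b = [("u1", date(2018, 3, 2)), ("u2", date(2018, 3, 4)),
         ("u3", date(2018, 4, 2)), ("u4", date(2018, 4, 5))]


def sessions_in(log, period):
    """Aggregate view: number of sessions in a period, with no notion of who visited."""
    start, end = period
    return sum(1 for _, day in log if start <= day <= end)


def users_in(log, period):
    """User-level view: the set of users seen in a period."""
    start, end = period
    return {user for user, day in log if start <= day <= end}


for name, log in (("log_a", log_a), ("log_b", log_b)):
    returning = users_in(log, MARCH) & users_in(log, APRIL)
    print(name, sessions_in(log, MARCH), sessions_in(log, APRIL), sorted(returning))
# log_a 2 2 ['u1', 'u2']
# log_b 2 2 []
```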

To compute a cohort analysis, you need to identify all users who visited during the initial period (March 1st to 7th). Then you look at the behavior of each of those users, one by one, to see whether they returned before the end of their own lagging period, which falls somewhere between April 1st and April 7th.

Then, you aggregate all of those individual users’ experience together to produce the final cohort analysis data.

There is a catch in this computation. If a user visited on March 2nd and returned on April 3rd, they should not technically be counted as “returned”, because the return falls outside their one-month lagging period.

This means that, for maximum accuracy, the termination date of your analysis should range from April 1st to April 7th, depending on when each user first visited your website. Google Analytics instead uses a single termination date of April 7th.

A single termination date makes the data slightly inconsistent, because users effectively get lagging periods ranging from 1 month to 1 month and 6 days. This slightly inflates your end result.
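The difference between the two termination rules is easy to check in code. This minimal sketch takes hypothetical first-visit and return dates for two users and computes one-month retention twice. The first pass uses a per-user deadline (first visit plus one month, the accurate rule described above). The second uses a single cutoff of April 7th for everyone, the simplification described above for Google Analytics. `add_months` is a hypothetical helper that clamps to the end of shorter months.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def retention(first_visit, later_visits, lag_months=1, single_cutoff=None):
    """Share of cohort users who returned on or before their deadline.

    first_visit: {user: date of first visit}
    later_visits: {user: [dates of subsequent visits]}
    single_cutoff: if given, one shared deadline for everyone; otherwise each
    user's deadline is their own first visit plus lag_months.
    """
    retained = 0
    for user, start in first_visit.items():
        deadline = single_cutoff if single_cutoff is not None else add_months(start, lag_months)
        if any(start < day <= deadline for day in later_visits.get(user, [])):
            retained += 1
    return retained / len(first_visit)


# Hypothetical members of the March 1st to 7th cohort.
first_visit = {"alice": date(2018, 3, 2), "bob": date(2018, 3, 6)}
later_visits = {"alice": [date(2018, 4, 3)], "bob": [date(2018, 4, 5)]}

print(retention(first_visit, later_visits))  # 0.5: alice's April 3rd return misses her April 2nd deadline
print(retention(first_visit, later_visits, single_cutoff=date(2018, 4, 7)))  # 1.0
```

The per-user rule needs one deadline per user rather than one per cohort. That extra work is what the single-date shortcut avoids, and it is why the shortcut can count alice as retained.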

However, the reason Google does this is perfectly sound. Computing slightly less accurate data is far more computationally efficient, given the volume of computation Google Analytics performs to provide cohort analysis for all of its users.

As a Google Analytics user, you just need to be aware that all your cohort metrics are slightly inflated compared with the most accurate computation.

How to use cohort analysis in your business

Now let’s wrap this part up with a brief introduction to the types of cohort analysis you can do with your data, and the business insights you can draw from each.

Overall, it is hard to extract business value from one-time cohort analysis immediately compared to other popular analytics methods.

This is mostly because you cannot react quickly to cohort analysis data.

Say you are running a funnel analysis and see users dropping out of one part of the funnel very rapidly. You can immediately launch a retargeting campaign for those users, while patching up the funnel with some site redesign to make sure you don’t lose more of them.

However, when you are doing cohort analysis, the feedback cycle is usually way longer.

For example, suppose you are running a cohort analysis with a 1-month lagging period and have made improvements to your users’ first-month experience. You cannot fully see the results of those improvements until a month later, when you can observe the full journey of your current cohort.

Each further change then takes yet another month to show its effect, which is really slow from a digital marketer’s perspective.

However, while slow, cohort analysis provides a much more complete look at your user journey. It is incredibly helpful for designing campaigns that not only show results immediately, but are also sticky and create long-term value for your company.

For me, cohort analysis can be used for two primary purposes: for one-time campaign retrospection, and for ongoing user engagement benchmarking.

Almost all great marketers and campaign managers run retrospective analyses on all of the campaigns they can for their clients or their company.

In the retrospective, the manager looks at all data related to the campaign 3–4 months after it ends. They then make a final assessment of whether the campaign brought the intended results for their clients in both the short and the long term.

Here, cohort analysis can give the company and its managers a great deal of information about the behavior of the customers who were acquired and/or converted during the campaign.

For example, say you are running a promotional campaign that sells a trial set of your product at a below-margin discount. What you really want is for the people who bought it to come back and make a “real” purchase.

Cohort analysis can help you figure out whether that’s the actual case, and provide you with a final sanity check of whether you should continue the same campaign next year or next quarter.

Now let’s talk about the second analytics use case, which is ongoing user engagement benchmarking.

If you read my work regularly, you know I am always wary of short-term changes in key metrics such as sessions and bounce rate, since they are usually very volatile.

Cohort analysis, however, is computed over a long period of time. It is less prone to those short-term influences and more likely to reflect your true long-term customer engagement.

That’s enough to give you a solid understanding of cohort analysis. In the next part of this series, we will talk about how to conduct cohort analysis in the Google Analytics user interface, and then move on to more analyses using the Google Analytics API.

Stay Tuned!


CEO, Humanlytics. Bringing data analytics to everyone.