Quick Cohort Analysis

Mike Kulakov
3 min readMay 21, 2015

I guess many of you guys have heard of such a term as a “cohort analysis”.

The high-level problem we are talking here is that every day, the entire audience of your startup is a mix of users signed up yesterday, today and last week. And it is not correct to analyze their behavior in a bulk trying to discover certain patterns. That’s where the cohort analysis would be useful.

In general, the idea is to split user base into groups based on certain criteria and track their behavior over time. In my example, common group characteristic would be their “join date”. And the metric I’m going to analyze would be their “last activity date”.

Why did 60% of people signed up on February become inactive next day, while in January it was only 30%? Does the new onboarding flow released at the end of January make troubles?

As a different example, you can analyze engagement or activation based on different referrals.

I’m not going to dive deep into the theory here. I just want to share steps I did (as not a technical founder) in order to get some insights quickly. And these steps require zero development efforts of the team. That is good! ☺

Rough data

I wrote a fairly simple SQL script (you can ask anyone for assistance; still it takes no longer than 10 mins) where:

ba_users is an internal table where we daily aggregate certain metrics about our user base, like lifetime days, first-time entry, last time entry, total projects, total time, etc.

ba_numbers is a simple temporary table with just values from 1 to 20. This is the length of the period I want to monitor our cohorts (it could be shorter/longer)

I’d be surprised if you don’t have values similar to listed above in your database ☺

The result would be:

On week #9, 381 new users joined. Next day, only 156 users were still active, the second day 131 etc.

Now let’s make it more human-friendly.

Making a Cohort table

I’m copying my SQL results above into a Google Spreadsheet and use the Pivot function.

  • Select 3 columns
  • Click Data > Pivot table report
  • Rows = Column A, Columns = Column B, Values = Column C

As the result, I have a new page like this:

Now,

Column A is my group/cohort and represents week #.

Columns B-U represents the number of active users day after day (length was defined by ba_numbers table)

With this analytics in hands, I can see what’s my bounce rate, find and analyze anomalies, map it to our changes log and track progress over time.

Hope that helps.

--

--