Analytics Vidhya
Published in

Analytics Vidhya

Churn Report in Pandas

This is a continuation of my Sales Report in Pandas post and it begins with the final table from it. Here it is for one random customer.

report[report['CustomerID'] == 12347.0]

It represents an activity of this particular customer for the whole available date range aggregated by month. You can see that this activity is not regular.

First, I want to add a flag whether or not customer was active in the previous month.

report['active_prev'] = (report.sort_values(by=['month'], ascending=True)
.groupby(['CustomerID'])['active'].shift(1))

This hard stuff is something like a window function in sql. Because I have many customers I have to somehow say to pandas that they should be treated separately, so that’s what groupby for. And also I want months to be in order, that’s what sort_values for. Finally, shift gives the value from the previous row (month in this case). Check it.

report[report['CustomerID'] == 12347.0]

Correct. Compare the active column with active_prev. Now I want to understand when customer go from active to inactive and vice versa.

report['change_status'] = (report['active'] != report['active_prev']) * 1report[report['CustomerID'] == 12347.0]

Looks good. Next I want to assign some kind of an id to each status change. And here is a very neat trick with cumulative sum. Look at the change_status column. If I perform a cumulative sum on this column I will get an increment each time the user changes status. That’s what I want. And again I use this scary construction because I want pandas to treat each user separately.

It’s perfect. Find a session with id of 2. Can you see that it is the session where the customer was inactive for two months? Now I can sum up the number of inactive months per user and per each session.

report['inactive'] = (report['active'] == 0) * 1report['month_inactive'] = (report.sort_values(by=['month'], ascending=True)
.groupby(['CustomerID','session_id'])['inactive'].cumsum())
report[report['CustomerID'] == 12347.0]

Look at row number 4. In the month_inactive column we get the correct number of months the customer was inactive. Check it with your eyes. And it’s correct for the whole table).

Now let’s calculate the metric. Say’s number of users who has been in churn for two or more months.

report['churn_2m'] = (report['month_inactive'] >= 2) * 1report[report['CustomerID'] == 12347.0]

Looks good. Now groupby to get info for all customers per month.

t = report.groupby('month')[['revenue','user','new','active','churn_2m']].agg('sum')t

You also may want to get it in %. Choice of denominator depends on your methodology of churn. Here I will just use current user count for each month. But it might be user count for the previous month or even for month-2, because we are measuring two month churn.

t['churn_2m%'] = t['churn_2m'] / t['user']t

How cool is that?)

Here is the Colab notebook with the whole project (sales + churn) https://colab.research.google.com/drive/1TwivVdUbavTO2aIWBBDlFEUPvF74PwU-#scrollTo=2sC3zrxjRj8c

Thank you for reading. Find me also at https://www.glebmikhaylov.com/ and at my YouTube channel https://www.youtube.com/channel/UCLdAnxmoGVySnh691CwDz9Q .

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Gleb Mikhaylov

Gleb Mikhaylov

I share my experience in data science, computational thinking, Python, Wolfram. https://www.glebmikhaylov.com/