Product feature retention analysis — MCC coefficient
To make a reliable assessment of how product features impact retention, we need to consider:
- the popularity of the product feature
- the retention of users who use the product feature
- the retention of users who do not use it, as a baseline to adjust for
One of the possible solutions to this is to use Information Gain.
The main drawback of Information Gain is that it does not distinguish the direction of impact: positive or negative. That’s because Information Gain is never negative, so a feature that hurts retention can score as high as one that helps it.
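To make the point above concrete, here is a minimal sketch of Information Gain for one binary feature against a binary "returned" outcome. The function names and the user counts are hypothetical, chosen only for illustration; they are not taken from the dataset discussed in this post.

```python
from math import log2


def entropy(pos, neg):
    """Shannon entropy (in bits) of a binary outcome, given the two counts."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes, since 0 * log2(0) is treated as 0
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(used_ret, used_churn, unused_ret, unused_churn):
    """Entropy of 'returned' minus its entropy after splitting on feature usage."""
    used = used_ret + used_churn
    unused = unused_ret + unused_churn
    total = used + unused
    before = entropy(used_ret + unused_ret, used_churn + unused_churn)
    after = (used / total) * entropy(used_ret, used_churn) \
        + (unused / total) * entropy(unused_ret, unused_churn)
    return before - after


# Hypothetical counts: (used & returned, used & churned, unused & returned, unused & churned)
helps = information_gain(300, 700, 100, 900)  # feature users retain at 30%, others at 10%
hurts = information_gain(100, 900, 300, 700)  # feature users retain at 10%, others at 30%
print(helps, hurts)  # the two values match, although the direction is opposite
```

Both hypothetical features get the same score, which is exactly why Information Gain alone cannot tell a helpful feature from a harmful one.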
Let me show you an example with product feature18.
feature18 has the 3rd highest Information Gain (0.0140), but if we look carefully we will see:
- users who used feature18 have user retention = 7.9%
- users who didn’t use feature18 have user retention = 20.7%
The data tells us that using this product feature has a strong negative impact on retention: its users return at less than half the rate of non-users.
From a machine learning perspective that’s completely OK, but from a product analytics perspective it’s not. We need to know the direction.
To overcome this issue, I recommend using the MCC coefficient.
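The MCC (Matthews correlation coefficient, also known as the phi coefficient) is a correlation coefficient for two binary variables: here, whether a user used the feature and whether that user returned. It ranges from -1 to +1, so its sign gives the direction of impact. There are several variants of how to calculate it, but I prefer to use this one: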
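The formula itself is not reproduced in this text. As a hedged sketch, here is the commonly used confusion-matrix form of MCC, which may or may not be the exact variant meant above. It treats "used the feature" as the prediction and "returned" as the actual outcome. The function name, argument names, and counts are hypothetical.

```python
from math import sqrt


def mcc(used_ret, used_churn, unused_ret, unused_churn):
    """MCC between 'used the feature' and 'returned', from 2x2 user counts."""
    # Map the four groups onto the usual confusion-matrix cells.
    tp, fp, fn, tn = used_ret, used_churn, unused_ret, unused_churn
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the table is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, not taken from the dataset in this post.
print(mcc(300, 700, 100, 900))  # 0.25: feature users retain better
print(mcc(100, 900, 300, 700))  # -0.25: feature users retain worse
```

Unlike Information Gain, the two hypothetical features now get scores of equal size but opposite sign.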
Let’s calculate the MCC coefficient for the product feature list and visualize it.
Probably the best thing that we can do is to compare the MCC coefficient with the Information Gain in one chart.
There is one important insight here:
Almost all popular product features have a negative impact on retention (see below).
As a rule, the most popular product features are `setup` features.
These product features appear at the top of the funnel (where user intent is low), and because of this the retention of their users is also low.
Bonus fact:
If we look carefully we can spot that the MCC coefficient is negative when the metric [% returned users prd] is lower than the weighted average.
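This is not a coincidence. Assuming "weighted average" here means the overall retention rate (the retention of feature users and non-users, weighted by group size), the numerator of the standard MCC formula equals (feature retention - overall retention) × feature users × all users. The sign of MCC therefore always matches the sign of that gap. The sketch below checks this with hypothetical counts and hypothetical function names.

```python
def retention_gap(used_ret, used_churn, unused_ret, unused_churn):
    """Retention of feature users minus the overall (weighted average) retention."""
    used = used_ret + used_churn
    total = used + unused_ret + unused_churn
    feature_retention = used_ret / used
    overall_retention = (used_ret + unused_ret) / total
    return feature_retention - overall_retention


def mcc_numerator(used_ret, used_churn, unused_ret, unused_churn):
    """Numerator of the confusion-matrix MCC formula; it carries the sign of MCC."""
    return used_ret * unused_churn - used_churn * unused_ret


# Hypothetical counts: (used & returned, used & churned, unused & returned, unused & churned)
for counts in [(300, 700, 100, 900), (100, 900, 300, 700)]:
    gap = retention_gap(*counts)
    num = mcc_numerator(*counts)
    # Both values are positive together or negative together.
    print(counts, gap > 0, num > 0)
```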