Who Is the Next One Leaving Your Website?

Analysing streaming service usage logs with descriptive statistics

Viovioviovioviovio
The Startup
6 min read · Aug 31, 2020


Introduction

This article is part of a project for the Udacity “Become a Data Scientist” Nanodegree. The Jupyter Notebook with the code for this project can be downloaded from GitHub.

I will write a series of articles about this project that follows the CRISP-DM process. This part covers the business understanding and data understanding steps.

Business Understanding

Let’s imagine for a moment that we are freshly hired data scientists working for a startup called “Sparkify”, which offers a music streaming service through its website and app.

Our first job is to prepare a presentation for the management meeting on business strategy. The meeting starts a few hours from now, and we will have about 10 minutes for our presentation.

Clearly we want to impress our managers with our machine learning skills, but there is simply no time to clean all the data, not to mention run machine learning on the huge 12 GB log of the last two months of user activities.

We decide to take about 1% of users from the log and prepare some statistical analysis and visualisations to answer the questions we expect our managers to be most interested in, such as:

  1. Usage patterns
  2. Business development
  3. Threats to the business
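A note on the sampling step: one way to take roughly 1% of users (rather than 1% of log rows) is to hash each user ID and keep the users whose hash falls into a fixed bucket. The sketch below is not taken from the project notebook; the `userId` field name and the `in_sample` helper are assumptions made for illustration.

```python
import hashlib

def in_sample(user_id: str, percent: int = 1) -> bool:
    """Return True if this user belongs to the `percent`% sample.

    Hashing the user ID (instead of picking random rows) keeps every
    event of a sampled user, so per-user statistics stay intact.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

# Hypothetical events: 500 users with 10 events each.
# The field names are assumptions, not the real log schema.
events = [{"userId": str(i % 500), "page": "NextSong"} for i in range(5000)]

sample = [e for e in events if in_sample(e["userId"])]
sampled_users = {e["userId"] for e in sample}
```

Because the decision depends only on the user ID, running the filter again on a newer log selects the same users.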

1. Usage patterns

As a streaming service, we would of course like to know how many songs are played every day:
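A daily count like this can be sketched in plain Python. The example assumes each log event is a dictionary with a `ts` field (Unix time in milliseconds) and a `page` field whose value is `"NextSong"` when a song is played; both names are assumptions about the log, and the `ms` helper and the sample dates are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

def ms(*args):
    """Hypothetical helper: UTC date parts -> Unix timestamp in milliseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)

def songs_per_day(events):
    """Count song plays per calendar day (UTC)."""
    counts = Counter()
    for e in events:
        if e["page"] == "NextSong":  # assumed name of the "song played" event
            day = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc).date()
            counts[day] += 1
    return counts

# Made-up sample events
events = [
    {"ts": ms(2018, 10, 1, 8), "page": "NextSong"},
    {"ts": ms(2018, 10, 1, 9), "page": "NextSong"},
    {"ts": ms(2018, 10, 2, 8), "page": "NextSong"},
    {"ts": ms(2018, 10, 2, 9), "page": "Home"},  # not a song play
]
daily = songs_per_day(events)
```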

We can see that only about half as many songs are played around weekends and, unsurprisingly, there is a large spike around Halloween. To get a better feeling for the usage frequency, let’s look at the average number of unique users per weekday:
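For the weekday view, one possible approach is to count the distinct users on each calendar day first and then average those daily counts per weekday. This is a sketch under the same assumed field names (`ts` in milliseconds, `userId`), with made-up sample data.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def ms(*args):
    """Hypothetical helper: UTC date parts -> Unix timestamp in milliseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)

def avg_unique_users_per_weekday(events):
    # Step 1: distinct users per calendar day
    users_by_day = defaultdict(set)
    for e in events:
        day = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc).date()
        users_by_day[day].add(e["userId"])
    # Step 2: average the daily counts per weekday
    daily_counts = defaultdict(list)
    for day, users in users_by_day.items():
        daily_counts[WEEKDAYS[day.weekday()]].append(len(users))
    return {weekday: mean(counts) for weekday, counts in daily_counts.items()}

# Made-up sample: two Mondays and one Saturday
events = [
    {"ts": ms(2018, 10, 1, 8), "userId": "a"},
    {"ts": ms(2018, 10, 1, 9), "userId": "b"},
    {"ts": ms(2018, 10, 1, 20), "userId": "a"},  # same user, same day
    {"ts": ms(2018, 10, 8, 8), "userId": "a"},
    {"ts": ms(2018, 10, 6, 12), "userId": "b"},
]
result = avg_unique_users_per_weekday(events)
```

Averaging daily counts, rather than counting distinct users per weekday over the whole period, avoids a bias when some weekdays occur more often than others in the log.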

Another interesting question is the distribution of user activity throughout the day. Let’s have a look at the average number of songs played by the hour:

And the user activity:
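Both hourly views can be derived in one pass over the log. The sketch below averages song plays and distinct active users per hour of the day across all observed days. It assumes the `ts`, `userId` and `page` fields used above and works in UTC; users' local time zones would have to be handled separately.

```python
from collections import defaultdict
from datetime import datetime, timezone

def ms(*args):
    """Hypothetical helper: UTC date parts -> Unix timestamp in milliseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)

def hourly_profile(events):
    """Map hour of day -> (avg songs per day, avg distinct users per day)."""
    songs = defaultdict(int)   # (day, hour) -> song plays
    users = defaultdict(set)   # (day, hour) -> distinct users
    days = set()
    for e in events:
        t = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc)
        key = (t.date(), t.hour)
        days.add(t.date())
        users[key].add(e["userId"])
        if e["page"] == "NextSong":  # assumed event name
            songs[key] += 1
    n_days = len(days) or 1  # avoid division by zero on an empty log
    profile = {}
    for hour in range(24):
        total_songs = sum(n for (_, h), n in songs.items() if h == hour)
        total_users = sum(len(u) for (_, h), u in users.items() if h == hour)
        profile[hour] = (total_songs / n_days, total_users / n_days)
    return profile

# Made-up sample covering two days
events = [
    {"ts": ms(2018, 10, 1, 9, 0), "userId": "a", "page": "NextSong"},
    {"ts": ms(2018, 10, 1, 9, 5), "userId": "a", "page": "NextSong"},
    {"ts": ms(2018, 10, 1, 9, 10), "userId": "b", "page": "NextSong"},
    {"ts": ms(2018, 10, 2, 9, 0), "userId": "a", "page": "NextSong"},
    {"ts": ms(2018, 10, 1, 13, 0), "userId": "a", "page": "Home"},
]
profile = hourly_profile(events)
```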

Summary usage statistics

Let’s formulate the key insights from our analysis:

  • We have seen that usage statistics follow a weekly pattern with fewer users using Sparkify on weekends.
  • Unsurprisingly there is a spike in streams around Halloween.
  • Throughout the day the number of users remains almost constant with a slight increase between 1 and 7 p.m.
  • The number of songs played per user follows the rhythm of daily activities: getting up, commuting to work, starting work, the lunch break, and so on.

More important is what we can do with these insights:

  • We can optimise licence costs knowing how many songs will be played.
  • We can optimise the number of servers running throughout the day and week to save electricity and networking costs based on user activity.
  • We can target our user communication to the time frames where they are most likely to use our service.

2. Business development

The main revenue source for Sparkify is periodic subscription fees from paying users. We would like to know how many users have actually used the “paid” option and how many the “free” option:
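A weekly count of distinct users per account level could be computed as sketched here. The `level` field with the values `"free"` and `"paid"` is an assumption about the log, and weeks are ISO calendar weeks.

```python
from collections import defaultdict
from datetime import datetime, timezone

def ms(*args):
    """Hypothetical helper: UTC date parts -> Unix timestamp in milliseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)

def users_per_week_by_level(events):
    """Count distinct users per (ISO week, account level)."""
    seen = defaultdict(set)
    for e in events:
        t = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc)
        week = t.isocalendar()[1]  # ISO week number
        seen[(week, e["level"])].add(e["userId"])
    return {key: len(users) for key, users in sorted(seen.items())}

# Made-up sample: user "a" upgrades between the two weeks
events = [
    {"ts": ms(2018, 10, 1), "userId": "a", "level": "free"},
    {"ts": ms(2018, 10, 2), "userId": "a", "level": "free"},
    {"ts": ms(2018, 10, 3), "userId": "b", "level": "paid"},
    {"ts": ms(2018, 10, 8), "userId": "a", "level": "paid"},
    {"ts": ms(2018, 10, 9), "userId": "b", "level": "paid"},
]
weekly = users_per_week_by_level(events)
```

A user who switches level within a week is counted once under each level for that week, which matches the question of who "used" each option.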

Another source of revenue is playing advertising clips for free users. How many clips are played every week?

Let’s also see how many ads on average are displayed to each user:
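The two advert metrics can be computed together: the total number of adverts per week, and that total divided by the number of distinct free users active in the same week. In this sketch the `"Roll Advert"` page name and the `level` field are assumptions, and the sample data is invented to show how the total can fall while the per-user figure rises.

```python
from collections import defaultdict
from datetime import datetime, timezone

def ms(*args):
    """Hypothetical helper: UTC date parts -> Unix timestamp in milliseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)

def adverts_per_free_user(events):
    """Map ISO week -> (total adverts, adverts per active free user)."""
    ads = defaultdict(int)
    free_users = defaultdict(set)
    for e in events:
        if e["level"] != "free":
            continue
        t = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc)
        week = t.isocalendar()[1]
        free_users[week].add(e["userId"])
        if e["page"] == "Roll Advert":  # assumed name of the advert event
            ads[week] += 1
    return {w: (ads[w], ads[w] / len(free_users[w])) for w in sorted(free_users)}

def advert(ts, user):
    """Hypothetical helper that builds one advert event for a free user."""
    return {"ts": ts, "userId": user, "level": "free", "page": "Roll Advert"}

# Week 40: four users, one advert each. Week 41: two users, three adverts.
events = [advert(ms(2018, 10, 1), u) for u in "abcd"]
events += [advert(ms(2018, 10, 8), u) for u in "aab"]
per_week = adverts_per_free_user(events)
```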

Summary business development

Let’s formulate the key insights and takeaways for our business.

Key insights

  • The number of paying customers is increasing in the observation period.
  • The number of adverts decreases.
  • The number of free customers is decreasing.

Takeaways for business

  • The number of paying customers is not changing much after the first week. We probably need to motivate people to switch to a paid account with a limited-time offer or a free trial.
  • The number of free customers is decreasing at quite a high rate. It seems that the free account is not very attractive. We have to look at the reasons more closely. Are the adverts too frequent? Do free users have limited access to the music titles?
  • Although the number of adverts is falling, the number of adverts per user is increasing. Perhaps we have taken the wrong road here, given that free users are probably choosing to leave the service rather than upgrade their accounts?

3. Threats to the business

Finally, let’s look at the account level upgrades, downgrades and cancellations:

To get a clearer picture, let’s see which account level users have when they cancel their accounts:
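Both counts can be taken from the same pass over the log. The sketch assumes that upgrades, downgrades and cancellations appear as events with the page names `"Submit Upgrade"`, `"Submit Downgrade"` and `"Cancellation Confirmation"`, and that the `level` field on the cancellation event holds the account level at that moment. All of these are assumptions about the log schema.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed page names for account changes
ACCOUNT_EVENTS = {"Submit Upgrade", "Submit Downgrade", "Cancellation Confirmation"}

def ms(*args):
    """Hypothetical helper: UTC date parts -> Unix timestamp in milliseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)

def account_changes(events):
    """Return (events per (ISO week, page), cancellations per account level)."""
    per_week = Counter()
    cancel_levels = Counter()
    for e in events:
        if e["page"] not in ACCOUNT_EVENTS:
            continue
        t = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc)
        per_week[(t.isocalendar()[1], e["page"])] += 1
        if e["page"] == "Cancellation Confirmation":
            cancel_levels[e["level"]] += 1
    return per_week, cancel_levels

# Made-up sample events
events = [
    {"ts": ms(2018, 10, 1), "page": "Submit Upgrade", "level": "free"},
    {"ts": ms(2018, 10, 2), "page": "Submit Upgrade", "level": "free"},
    {"ts": ms(2018, 10, 8), "page": "Submit Downgrade", "level": "paid"},
    {"ts": ms(2018, 10, 9), "page": "Cancellation Confirmation", "level": "paid"},
    {"ts": ms(2018, 10, 15), "page": "Cancellation Confirmation", "level": "paid"},
    {"ts": ms(2018, 10, 16), "page": "Cancellation Confirmation", "level": "free"},
    {"ts": ms(2018, 10, 16), "page": "NextSong", "level": "free"},  # ignored
]
per_week, cancel_levels = account_changes(events)
```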

Summary business threats

Let’s formulate the key insights and takeaways for our business.

Key insights

  • The number of upgrades spiked in the first week of observation.
  • The number of upgrades is declining during the period of observation.
  • The number of downgrades has a small spike in week 41 and is otherwise almost steady, with a decline near the end.
  • The number of cancellations is almost steady, with a small spike around week 42 and a decline near the end.
  • Paying users are cancelling their accounts more often than free users.

Takeaways for business

  • Whatever we did in week 40, we must keep doing it!
  • We need to understand why fewer and fewer customers choose to upgrade their accounts.
  • Although the downgrade and cancellation rates are falling, we need to pay more attention to them.
  • The fact that paying users are choosing to cancel their accounts rather than downgrade them is alarming. What have we done wrong to make them angry?

Conclusion: can we identify reasons for churn?

The presentation went well. Most of the people in the room did not have a technical background. They were impressed by the comprehensive visualisations and the clearly formulated statements about the current situation.

As a consequence, the management is now worried about churn. They ask us to find the reasons why customers, especially paying ones, are cancelling their accounts.

We will have to apply machine learning to our data. It will take several days to find the right techniques on the small subset of data, and then possibly weeks to run the algorithms on the full dataset.

Using our intuition, we can try to find a quick fix that may help our company at short notice. Let’s look at the statistics of rolling adverts:

It turns out that paying customers may still see or hear an advert. Could this be the reason why they choose to quit? Perhaps our web developers should look into that issue.

In my next article I will focus on machine learning techniques and how they can be applied to predict churn based on usage statistics.
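A quick check of this kind could measure the share of users at each account level who received at least one advert. This is a sketch under the same assumed schema (`level`, `userId`, and `"Roll Advert"` as the advert page name); the sample numbers are invented and say nothing about the real log.

```python
def share_of_users_with_adverts(events, level):
    """Share of users at `level` who received at least one advert."""
    users = set()
    with_ads = set()
    for e in events:
        if e["level"] != level:
            continue
        users.add(e["userId"])
        if e["page"] == "Roll Advert":  # assumed name of the advert event
            with_ads.add(e["userId"])
    return len(with_ads) / len(users) if users else 0.0

# Made-up sample: one of four paying users gets an advert
events = [
    {"userId": "a", "level": "paid", "page": "Roll Advert"},
    {"userId": "a", "level": "paid", "page": "NextSong"},
    {"userId": "b", "level": "paid", "page": "NextSong"},
    {"userId": "c", "level": "paid", "page": "NextSong"},
    {"userId": "d", "level": "paid", "page": "Home"},
    {"userId": "e", "level": "free", "page": "Roll Advert"},
    {"userId": "f", "level": "free", "page": "Roll Advert"},
]
paid_share = share_of_users_with_adverts(events, "paid")
free_share = share_of_users_with_adverts(events, "free")
```

The level is read from each event, so a user who upgraded during the observation period is counted under both levels, each with the adverts received at that level.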
