Q#61: Average lifetime value of a customer

Given the following dataset, calculate the average lifetime value of a customer.

Below are a couple of definitions to help solve the problem:

  • Lifetime value is average revenue divided by the churn rate
  • Churn rate is defined as the total number of churned customers / total number of customers

TRY IT YOURSELF

https://colab.research.google.com/drive/1lgOzdNeVVs-KrkteKZdbfXFbkYc8M_u8?usp=sharing

ANSWER

This question tests our ability to wrangle data in Python using Pandas.

At first glance this seems easy given the steps provided, first we load in the data using the standard pandas method .read_csv(), then we can quickly calculate the churn rate by using the churn column and taking the ratio between the values that are ‘yes’ divided by the total number of observations. And now that we have churn rate, we can get the lifetime value and BAM! we face an error, this column is a string. Where did we go wrong?

The first step is correct let’s load the data with .read_csv(). However, next lets check the datatypes of all the columns with the .dtypes attribute and check if there are any NaNs with .isna().sum().

df = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/teleco_user_data.csv')df.dtypes
df.isna().sum()

This will reveal that the column TotalCharges is of type string and that there are no NaNs. However, if we look closer at the data we see that the TotalCharges column has entries that are an empty space string. We should get rid of these with the exclusion indexing criteria. Then we can use the .astype(float) method to change the column values to floats and proceed with our earlier steps just fine. We use the .shape[0] attribute to get the row/observations count and .mean() at the end to return the average lifetime value of a customer.

df =  df[df['TotalCharges'] != ' ']churn_rate = df[df['Churn'] == 'Yes'].shape[0]/df.shape[0]
df['Lifetime_Value'] = df[df['TotalCharges'] != ' '].TotalCharges.astype('float')/churn_rate
df['Lifetime_Value'].mean()

--

--