My First Technical Assignment Pt 2

Kerry Benjamin
Published in The Data Logs
6 min read · Jul 3, 2017

Wrapping up the business questions.

#WOCinTech Chat

Hiya everyone! Glad you’re back. Today we’re finishing up the take-home assignment Bright asked of me. We finished the user acquisition side of things last time, so next we move on to retention.

User Retention

1. How many users do we keep week to week?

I liked these questions. I decided there were two ways to answer this one, so I did both. First, I wondered how many people are retained during the week they sign up. In other words, how many people leave during the same week they sign up?

library(dplyr)  # provides filter()

# Keep the rows whose unsubscribe date (Date_Left) falls in each week, then count them.
# dim() prints rows and columns; the first number is the number of people.
week1unsub <- filter(Unsubscribers, Date_Left >= "2017-04-01" & Date_Left <= "2017-04-07")
dim(week1unsub)
week2unsub <- filter(Unsubscribers, Date_Left >= "2017-04-08" & Date_Left <= "2017-04-14")
dim(week2unsub)
week3unsub <- filter(Unsubscribers, Date_Left >= "2017-04-15" & Date_Left <= "2017-04-21")
dim(week3unsub)
week4unsub <- filter(Unsubscribers, Date_Left >= "2017-04-22" & Date_Left <= "2017-04-27")
dim(week4unsub)
901 6 #week 1
1129 6 #week 2
1890 6 #week 3
791 6 #week 4
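
The filters above count everyone whose Date_Left falls in a given week, whatever week they signed up in. If the goal is to count only people who leave in the same week they sign up, both dates need to be checked. Here is a minimal base R sketch of that idea. The data frame `unsub_toy` holds made-up rows (not Bright's data), and `week_of()` is a hypothetical helper name; only the `Date` and `Date_Left` column names come from this post.

```r
# Toy stand-in for Unsubscribers: made-up rows, NOT Bright's data.
# Date = signup date, Date_Left = unsubscribe date (column names from the post).
unsub_toy <- data.frame(
  Date      = as.Date(c("2017-04-02", "2017-04-05", "2017-04-09", "2017-04-16", "2017-04-23")),
  Date_Left = as.Date(c("2017-04-03", "2017-04-12", "2017-04-10", "2017-04-30", "2017-04-25"))
)

# First day of each week, plus the last day covered (April 27, as in the filters above)
week_start <- as.Date(c("2017-04-01", "2017-04-08", "2017-04-15", "2017-04-22"))
last_day   <- as.Date("2017-04-27")

# week_of() is a hypothetical helper: week number 1-4 for each date, NA if outside the range
week_of <- function(d) {
  wk <- findInterval(as.numeric(d), as.numeric(week_start))
  wk[wk == 0 | d > last_day] <- NA
  wk
}

signup_week <- week_of(unsub_toy$Date)
left_week   <- week_of(unsub_toy$Date_Left)

# Everyone who left during each week, regardless of signup week (what the filters count)
left_per_week <- table(factor(left_week, levels = 1:4))

# Only those who signed up AND left within the same week
same_week_rows <- which(signup_week == left_week)
left_same_week <- table(factor(signup_week[same_week_rows], levels = 1:4))

print(left_per_week)
print(left_same_week)
```

In the toy data, the second person signs up in week 1 but leaves in week 2, so they show up in the week 2 total of the first table and in no week of the second.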

This is why we created the Unsubscribers data frame in part 1. Like with the Bright_Users data frame, all we have to do is filter out the specific dates for each week and get a count of the rows. In this case, we’re filtering the Date_Left column. Weeks 1–4 had 901, 1129, 1890, and 791 people leave, respectively. With a little bit of arithmetic we can work out the percentage:

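Week 1: 7364 signups - 901 who left = 6463  6463 / 7364 * 100 = 87.8 -> 88%
Week 2: 6874 signups - 851 who left = 6023  6023 / 6874 * 100 = 87.6 -> 88%
Week 3: 9805 signups - 411 who left = 9394  9394 / 9805 * 100 = 95.8 -> 96%
Week 4: 5956 signups - 178 who left = 5778  5778 / 5956 * 100 = 97.0 -> 97%

Note: for weeks 2 to 4, the counts subtracted here (851, 411, and 178) are smaller than the dim() counts above (1129, 1890, and 791). The dim() counts include everyone who left during that week, even if they signed up in an earlier week, so the retained totals appear to subtract only the people who signed up and left within the same week.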
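
The percentage step can also be done in one go instead of line by line. This is a small sketch using the signup and retained totals quoted in this post; the vector names `signups`, `retained`, and `retention_pct` are just illustrative.

```r
# Weekly signup totals and retained totals, copied from the figures in this post
signups  <- c(week1 = 7364, week2 = 6874, week3 = 9805, week4 = 5956)
retained <- c(week1 = 6463, week2 = 6023, week3 = 9394, week4 = 5778)

# Share of each week's signups still subscribed, as a percentage
retention_pct <- retained / signups * 100

print(round(retention_pct, 1))
print(round(retention_pct))  # whole numbers: 88 88 96 97
```

Because R works on whole vectors at once, adding a fifth week would only mean adding one more value to each vector.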

As the numbers show, most people who sign up during a given week are still around at the end of it. For the second analysis, I wondered how many of the users who signed up eventually unsubscribe.

# Same idea, but filter on Date (the signup date) so each group is everyone
# who signed up in that week and unsubscribed at any later point.
week1unsubo <- filter(Unsubscribers, Date >= "2017-04-01" & Date <= "2017-04-07")
dim(week1unsubo)
week2unsubo <- filter(Unsubscribers, Date >= "2017-04-08" & Date <= "2017-04-14")
dim(week2unsubo)
week3unsubo <- filter(Unsubscribers, Date >= "2017-04-15" & Date <= "2017-04-21")
dim(week3unsubo)
week4unsubo <- filter(Unsubscribers, Date >= "2017-04-22" & Date <= "2017-04-27")
dim(week4unsubo)
1812 6 #week 1
1928 6 #week 2
2436 6 #week 3
730 6 #week 4

Breakdown by percentage.

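wk1 - week1unsubo = 5552  5552 / 7364 * 100 = 75.4 -> 75%
wk2 - week2unsubo = 4946  4946 / 6874 * 100 = 72.0 -> 72%
wk3 - week3unsubo = 7369  7369 / 9805 * 100 = 75.2 -> 75%
wk4 - week4unsubo = 5226  5226 / 5956 * 100 = 87.7 -> 88%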

Now this tells a slightly different story. While large droves of people aren’t leaving the same week they sign up, more people leave eventually. Bright keeps at least 70% of their users in each segment (signup week). Still pretty decent. As for the entirety of April, Bright retains 77% of their signups. So out of the 29,999 people who signed up, by May 1st they kept 23,093.
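
For anyone who wants to check the arithmetic, here is a short sketch that rebuilds the per-week and overall retention from the counts quoted in this post. The variable names are illustrative, not from the original analysis.

```r
# Signups per week and eventual unsubscribers per signup week, from this post
signups      <- c(week1 = 7364, week2 = 6874, week3 = 9805, week4 = 5956)
unsubscribed <- c(week1 = 1812, week2 = 1928, week3 = 2436, week4 = 730)

# Users kept in each signup-week segment, and as a percentage of that week's signups
kept     <- signups - unsubscribed
kept_pct <- kept / signups * 100

# Totals for all of April
total_signups <- sum(signups)
total_kept    <- sum(kept)
overall_pct   <- total_kept / total_signups * 100

print(kept)
print(round(kept_pct, 1))
print(c(total_signups, total_kept, round(overall_pct)))
```

The totals come out to 29,999 signups and 23,093 kept, which rounds to the 77% quoted above.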

2. If a user unsubscribes, how long does it take them to do so?

The best way to answer this question is to find the median number of days. In this case, 50% of the users who unsubscribe do so within 5 days of signing up. However, we can learn much more by visualizing the data and using a bit of statistics.

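library(ggplot2)

# days_unsub is a time difference; convert it to a plain number of days
unsub_num <- as.numeric(Unsubscribers$days_unsub)
# Histogram of the days between signing up and unsubscribing
ggplot(Unsubscribers, aes(x = unsub_num)) + geom_histogram(fill = "darkblue")
median(Unsubscribers$days_unsub)
summary(unsub_num)
table(unsub_num)  # how many people left after 0, 1, 2, ... days
# Outlier cutoffs: 1.5 * IQR beyond the quartiles (1st Qu. = 0, 3rd Qu. = 18 from summary())
IQR1 <- 18
Outlier1 <- 0 - 1.5 * IQR1
Outlier2 <- 18 + 1.5 * IQR1
Outlier1
Outlier2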
Figure: Range of days that people unsubscribe

Uh oh. This certainly doesn’t look good. The histogram is heavily right-skewed: the bars pile up on the left and a long tail stretches to the right. In this case it means that most people who unsubscribe from Bright do so really, really early. While 50% of them leave within the first 5 days, the chart shows the biggest spike on the very day they sign up. Let’s get more into the nitty gritty.

Time difference of 5 days #The median

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 5.00 10.93 18.00 59.00

-27 #Outlier1 Numbers lower than this stand out
45 #Outlier2 Numbers higher than this stand out

Here is a summary of this data. The average number of days to unsubscribe is about 11. As seen before, the earliest people leave is the same day they sign up, represented by the Min. of 0. The latest someone has left is 59 days in. Using a bit of stats, we can also identify values that are unusual and stick out a bit. The lower cutoff of -27 doesn’t matter here since nobody can leave before day 0, so the outliers are the people who unsubscribe after more than 45 days. There is a small group toward the tail end of the histogram that shows this.
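
The outlier cutoffs follow the common 1.5 x IQR rule of thumb, and they can be computed from the data rather than typed in by hand. Below is a minimal sketch in base R. The `days` vector is made up (not Bright's data) and was chosen only so that its quartiles match the summary above (0, 5, and 18); the other names are illustrative.

```r
# Made-up days-to-unsubscribe values, chosen so the quartiles match the summary (0, 5, 18)
days <- c(0, 0, 0, 1, 5, 9, 18, 30, 59)

# 1st and 3rd quartiles, and the interquartile range between them
q   <- quantile(days, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Cutoffs: anything below the lower one or above the upper one stands out
lower_cutoff <- q[1] - 1.5 * iqr
upper_cutoff <- q[2] + 1.5 * iqr

# Values beyond the upper cutoff (the lower one is negative, so nothing can fall below it)
late_leavers <- days[days > upper_cutoff]

# Share of people who left on the same day they signed up
same_day_share <- mean(days == 0)

print(c(lower_cutoff, upper_cutoff))
print(late_leavers)
print(same_day_share)
```

With the real data, swapping `days` for `unsub_num` should reproduce the -27 and 45 cutoffs, and `same_day_share` would put an actual number on how many people leave on day 0.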

3. How should Bright act on this data?

So in the overall scheme of things, Bright keeps a good number of the people who sign up. However, the people who do unsubscribe tend to unsubscribe really early, at least a quarter of them on the same day they sign up (the 1st quartile is 0 days). Why is that? There should be some form of feedback that can be used to figure this out. They could have simply decided that Bright isn’t for them, but we can’t say for sure without data.

Then we have the outliers. These are people who unsubscribe after more than 45 days, well over a month. The latest someone unsubscribed was 59 days in, almost two months. That’s more than enough time to understand Bright’s product. So, what is it that made them leave? That’s important data to understand for long-term subscribers.

Bright’s image is also that of cheerfulness and positivity. So throughout this process they should remain on brand and find ways to offboard that keep that in mind. “While you’re sad to see them go, you’re happy they tried you out and you wish them the best.” Something along those lines.

Other questions

This was the freestyle portion of the assignment. Basically, they wanted to see if there were any questions I could come up with that would be interesting to analyze, and where we could possibly find this type of data. Here’s what I thought of:

1a. How are signups possibly driven by the news cycle?

1b. Is negative news related to the White House causing large numbers of users to sign up?

2a. When are your users most active?

2b. When do they respond back to you?

3a. What industries do Bright users work in?

3b. Would it be helpful to tailor certain messages to those in high-stress fields?

I’d need survey data, user data, and to observe possible hourly spikes when breaking news is announced.

Well, there you have it, folks. One complete analysis served right up. Though it didn’t get me the job, it did get me more experience, so I’m thankful for it. If you have any questions or anything isn’t clear, let me know so I can edit and explain appropriately. If you’re also a data scientist or analyst, was there something else I could have thought about or analyzed? Could I have used machine learning or time series analysis in some way? I’m here to learn. Speaking of learning, I’m going to have to hit the books for activities 4 & 5 of the data science learning club. Machine learning will be the next project on the list, so I need to get ready.

I also hope that you all noticed that none of this analysis needed fancy calculus rocket science type math to get done. Don’t let numbers scare you. You can do this as well if you’re interested. And you’re more than capable of learning them fancy mathematics. You ARE a math person. ^__^

If you enjoyed this analysis and learned something new hit the recommend button.
