Data Analytics & Modeling

This class is basically the new “intro to stats” class. However, I believe the new name is much more appropriate, since the class gave us the foundations of probability and statistics with a focus on data and on how to make informed, data-driven decisions under ambiguity.

This class exposed us to Tableau, an amazing tool for creating static reports, live dashboards and “stories” where you can, well, tell a story from a data perspective. It easily connects to all kinds of databases (and can import Excel and CSV files) and lets you export and share your findings in an easy manner. It takes between 30 minutes and 2 hours to really get started, and there is a decent amount of learning videos that will help you get started and keep going.

Tableau is a super powerful tool for product managers when it comes to exploring the data and, more importantly, visualizing it and telling a compelling “data story”. In data-driven organizations, a product manager usually wins arguments with data, and this product is a great one to have in your product management toolbox.

Another important concept we learned (and that almost every student learns to some extent) is Hypothesis Testing. A quick and oversimplified reminder: you have your null hypothesis (H0) and you want to reject it with some statistical confidence level. Well, that’s exactly the day to day of a product manager who wants, for example, to improve a product by adding or modifying a specific feature, or to run a simple A/B test. The problem is that most of the time this is done with either vanity metrics or just by a rule of thumb.

Consider the following situation: 150 of the 1000 users that downloaded your app successfully completed your onboarding process. In order to improve it, you changed some screens, removed some buttons, and even allowed a sign-up process without forcing users to give you their email address (amazing!). Now, after 100 new users have downloaded your app, you want to check whether the steps you took actually improved the onboarding process.

Would you be happy with a 5-percentage-point improvement (from 15% to 20%)? I know I would. However, is it statistically significant? Did it happen because of your modifications, or maybe just by random chance? This can be answered with a one-tail hypothesis test comparing two proportions (the fun part begins):

P1 is the old proportion, n1=1000, Y1=150, so our estimate for P1 is 0.15

P2 is the new proportion, n2=100, Y2=20, so our estimate for P2 is 0.2

Step 1 — formulate your hypothesis

H0: P1 - P2 >= 0

H1: P1 - P2 < 0

Our null hypothesis states that the old proportion is greater than or equal to the new one. Of course we want to reject it, which would mean that our new proportion is bigger than the old proportion. Since we are looking specifically for an improvement, we use "greater than or equal to" in our null hypothesis (which implies a one-tail test).

Step 2 — calculate your Z-statistic

Without getting into too much detail, we use the pooled two-proportion formula with the numbers we have. First we get our pooled P-hat: (150 + 20) / (1000 + 100) = 0.1545. Then our Z-statistic is Z = (P1 - P2) / sqrt(P-hat × (1 - P-hat) × (1/n1 + 1/n2)) = (0.15 - 0.2) / 0.0379 = -1.319.
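If you prefer code to a spreadsheet, here is a minimal Python sketch of the same calculation. It assumes the standard pooled two-proportion z-test; the function name two_proportion_z is just an illustrative choice, not something from the class.

```python
from math import sqrt


def two_proportion_z(y1, n1, y2, n2):
    """Pooled z-statistic for the difference P1 - P2 (hypothetical helper)."""
    p1 = y1 / n1                       # old proportion estimate
    p2 = y2 / n2                       # new proportion estimate
    p_pooled = (y1 + y2) / (n1 + n2)   # P-hat: both samples combined
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Numbers from the onboarding example: 150 of 1000 before, 20 of 100 after
z = two_proportion_z(150, 1000, 20, 100)
print(f"Z-statistic: {z:.3f}")
```

With the example numbers this should land on the same Z-statistic of about -1.319.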

Step 3 — calculate your critical values

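This step depends on two things: the confidence level you choose and whether it is a 2-tail or a 1-tail test. In our case this is a 1-tail test and we will use a confidence level of 0.9 (90%), which means a significance level of 0.1 (1 - 0.9). Because H1 only looks in one direction (P1 - P2 < 0), we only need the lower critical value: Excel’s NORM.S.INV function with 0.1 gives -1.2816 (with 0.9 it gives the mirror value, 1.2816). A 2-tail test at the same confidence level would split the 0.1 between the two tails and use NORM.S.INV with 0.05 and 0.95 instead, which gives roughly ±1.645.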
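The same lookup can be done outside Excel. As a small sketch, Python's standard library has statistics.NormalDist, whose inv_cdf method plays the role of NORM.S.INV for the standard normal distribution; the variable names below are my own.

```python
from statistics import NormalDist

confidence = 0.90
alpha = 1 - confidence  # significance level, 0.1

# Left-tail critical value for a one-tail test, like NORM.S.INV(0.1) in Excel
lower_critical = NormalDist().inv_cdf(alpha)
print(f"Lower critical value: {lower_critical:.4f}")
```

NormalDist() with no arguments is the standard normal (mean 0, standard deviation 1), which is what the Z-statistic is compared against.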

Step 4 — decide to reject or not

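Finally, since this is a one-tail test where we are looking for a negative difference, reject the null if the Z-statistic is lower than the lower critical value (in a 2-tail test you would also reject if it were higher than the upper one). In our case -1.319 < -1.2816, so we reject the null. That means that, at the 90% confidence level, there is a statistically significant difference and we improved our onboarding process!!!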
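An equivalent way to make the same decision is to look at the p-value instead of the critical value: reject when the left-tail probability of the Z-statistic is below the significance level. This is a hedged sketch of that variant using the example numbers, not something the steps above require.

```python
from math import sqrt
from statistics import NormalDist

# Numbers from the onboarding example
y1, n1 = 150, 1000   # completed / downloaded, old flow
y2, n2 = 20, 100     # completed / downloaded, new flow

p_pooled = (y1 + y2) / (n1 + n2)
std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (y1 / n1 - y2 / n2) / std_error

# Left-tail p-value: chance of a Z this low or lower if H0 were true
p_value = NormalDist().cdf(z)

alpha = 0.10  # 90% confidence level
reject_null = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

The p-value comes out a little under 0.10, which is why the test rejects at 90% confidence but only barely.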

Any caveats? Well, you can always argue for a 2-tail test or a different confidence level, or just argue that the difference is due to time or other factors we haven’t considered. You will (almost) never be 100% sure; however, using a methodical statistical approach is better than a plain rule of thumb.
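To see how much those choices matter, here is a small sketch that re-runs the decision for the same Z-statistic under a few settings. The helper rejects_null is hypothetical and only covers the left-tail and two-tail cases discussed in this article.

```python
from statistics import NormalDist


def rejects_null(z, confidence, tails):
    """Decide whether to reject H0 for a given Z (hypothetical helper).

    tails=1 is a left-tail test (H1: P1 - P2 < 0); tails=2 is a two-tail test.
    """
    alpha = 1 - confidence
    if tails == 1:
        return z < NormalDist().inv_cdf(alpha)
    # Two-tail: alpha is split evenly between both tails
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


z = -1.319  # Z-statistic from the onboarding example
for confidence, tails in [(0.90, 1), (0.95, 1), (0.90, 2)]:
    decision = "reject" if rejects_null(z, confidence, tails) else "do not reject"
    print(f"{confidence:.0%} confidence, {tails}-tail: {decision} H0")
```

Only the first setting (90%, one-tail) rejects the null; at 95% confidence, or with a 2-tail test at 90%, the same data would not count as significant. That is a good reason to agree on the test and the confidence level before looking at the results.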

You can use this article as a basis for implementing this concept. All you need to do is create a simple Excel template with the right formulas. I also suggest talking with your data science team before running an experiment like this, to make sure you use the right hypothesis test and parameters.

As always — let me know what you think — ik325@cornell.edu