What Sherlock Holmes Taught Me About Data-Driven Decisions

Sherlock Holmes has always been an all-time favourite of mine. Over the long weekend, I had the chance to revisit some of the adventures that held my fascination back in high school.

While the stories were as gripping and fascinating as when I first read them, I could not help but notice how strikingly similar his method of deduction is to my daily work as a product manager.

One phrase in his famous story “A Scandal in Bohemia”, in particular, caught my attention:

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
- Sherlock Holmes, in Arthur Conan Doyle’s “A Scandal in Bohemia”

How true! It’s not uncommon to face a pressing business decision only to find that your app is not instrumented to capture the data you need to make it.

What do you do in this scenario? It’s very tempting to scrape the bottom of the barrel for the “most relevant data available”, build a theory backed by “common sense” to reach a conclusion, and then fill in the gaps with extrapolation or interpolation to justify the theory. As Holmes makes clear, this can be a fatal mistake. After all, decisions are only as good as the quality and correctness of the data on which they are based.

Let’s take the case of a simple content website that makes money through ads. The owner wants to increase his revenue per user and decides to look at his analytics data to choose the next set of features in a “data-driven” manner.
Here are the metrics at his disposal:
- Visits
- Time On Site Per Visit
- Page Views Per Visit
- No. Of Ads Clicked Per Visit
- Revenue Per Visit
Looking at these metrics, the owner sees a positive correlation between “Time On Site Per Visit” and “Revenue Per Visit”. From this correlation, he theorizes that making a user spend more time during a visit should increase the revenue earned from that user.
With the theory established, he decides to heavily cross-link his site to bounce people across pages as much as possible, thereby increasing “Time On Site” in the expectation that it will lead to more “Revenue”.
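If you wanted to sanity-check that first correlation yourself, a minimal sketch might look like the following, assuming your analytics tool can export a per-visit table; the column names and numbers here are entirely hypothetical.

```python
import pandas as pd

# Hypothetical per-visit export; column names and values are illustrative.
visits = pd.DataFrame({
    "time_on_site_min": [2, 4, 5, 7, 9, 12],
    "page_views": [3, 5, 6, 8, 10, 14],
    "ads_clicked": [0, 1, 1, 2, 2, 3],
    "revenue": [0.00, 0.05, 0.06, 0.11, 0.12, 0.18],
})

# Pearson correlation between time on site and revenue, per visit.
corr = visits["time_on_site_min"].corr(visits["revenue"])
print(f"Correlation(time on site, revenue per visit) = {corr:.2f}")
```

On data shaped like this, the per-visit correlation comes out strongly positive, which is exactly what leads the owner to his first theory.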

Everything looks good so far, right? Well, the problem is that the owner did not consider a few data points that could lead to a very different conclusion.

Let’s add the following metrics to the mix now:
- Unique Visitors
- Visits Per Visitor
- Revenue Per Visit
- Revenue Per Visitor
With these metrics added to the analysis, two additional inferences come to the fore:
- “Revenue Per Visitor” is inversely related to “Time On Site”.
- “Revenue Per Visitor” is directly related to “Visits Per Visitor”.
From these two inferences, a second theory surfaces: if the site helps the user quickly find what he is looking for (reducing “Time On Site”) and brings him back often (increasing “Visits Per Visitor”), it should lead to higher “Revenue Per Visitor”.
If this theory is correct, the owner will need to focus on improving content quality, clustering content around similar topics, and perhaps building an internal search engine to help users find what they are looking for quickly.
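Here is a small sketch of what that visitor-level roll-up could look like, again with assumed column names and made-up numbers; the point is only that a relationship that holds per visit can reverse once you aggregate per visitor.

```python
import pandas as pd

# Hypothetical visit log keyed by visitor; all values are made up.
log = pd.DataFrame({
    "visitor_id": ["a", "a", "a", "b", "b", "c"],
    "time_on_site_min": [2, 3, 2, 8, 9, 15],
    "revenue": [0.05, 0.06, 0.05, 0.04, 0.05, 0.03],
})

# Roll the visit log up to one row per visitor.
per_visitor = log.groupby("visitor_id").agg(
    visits=("revenue", "size"),
    avg_time_on_site=("time_on_site_min", "mean"),
    revenue_per_visitor=("revenue", "sum"),
)
print(per_visitor)

# The visit-level relationship can flip sign at the visitor level: here,
# shorter average visits and more frequent visits both go with higher
# revenue per visitor.
print(per_visitor["avg_time_on_site"].corr(per_visitor["revenue_per_visitor"]))
print(per_visitor["visits"].corr(per_visitor["revenue_per_visitor"]))
```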

Now that we have two contradictory theories, how do we decide which one is correct? The answer lies in asking for more relevant data points and doing further analysis. This is what separates good product managers from bad ones.

Digging deeper, let’s start by looking at user cohorts by frequency of visit and the revenue they generate. Let’s say this cut shows that 70% of revenue is generated by the top 40% of users by visit frequency.
Let’s now look at user cohorts by time on site and the revenue they generate. Let’s say this cut shows that 80% of revenue is generated by the 20% of users with the lowest time on site.
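A rough sketch of how these two cohort cuts might be computed from a per-visitor summary follows; the numbers are purely illustrative and the column names are assumptions, not any particular analytics tool’s schema.

```python
import pandas as pd

# Hypothetical per-visitor summary; the numbers are purely illustrative.
users = pd.DataFrame({
    "visitor_id": ["a", "b", "c", "d", "e"],
    "visits": [12, 9, 4, 2, 1],
    "avg_time_on_site_min": [2.1, 2.8, 6.0, 9.5, 14.0],
    "revenue": [0.90, 0.65, 0.25, 0.12, 0.08],
}).set_index("visitor_id")

# Cohort 1: rank by visit frequency, track the cumulative revenue share.
by_freq = users.sort_values("visits", ascending=False)
by_freq["cum_user_share"] = [(i + 1) / len(by_freq) for i in range(len(by_freq))]
by_freq["cum_revenue_share"] = by_freq["revenue"].cumsum() / by_freq["revenue"].sum()
print(by_freq[["visits", "cum_user_share", "cum_revenue_share"]])

# Cohort 2: rank by time on site, lowest first.
by_time = users.sort_values("avg_time_on_site_min")
by_time["cum_revenue_share"] = by_time["revenue"].cumsum() / by_time["revenue"].sum()
print(by_time[["avg_time_on_site_min", "cum_revenue_share"]])
```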
With Pareto-like distributions established by these two cohorts, we can now safely conclude that the second theory is correct and move ahead to increase our users’ repeat rate.
This also reminds us of the harsh reality that averages can be misleading and should be used very cautiously in any analysis.
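As a toy illustration of that point, here is a hypothetical example where the site-wide averages blur together two very different user segments:

```python
import pandas as pd

# Two hypothetical segments that a site-wide average hides completely.
visits = pd.DataFrame({
    "segment": ["quick_finder"] * 4 + ["lost_browser"] * 4,
    "time_on_site_min": [1, 2, 2, 3, 10, 12, 14, 16],
    "revenue": [0.10, 0.12, 0.11, 0.13, 0.02, 0.03, 0.02, 0.01],
})

# The overall averages look unremarkable...
print(visits[["time_on_site_min", "revenue"]].mean())

# ...but the segments tell opposite stories: the quick finders spend far
# less time on site yet earn several times more revenue per visit.
print(visits.groupby("segment")[["time_on_site_min", "revenue"]].mean())
```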

Finally, I’ll leave you with this Dilbert strip as a parting thought.
