Data Analytics — Devil is in the details (Part 1)

In the era of Social Media and Internet Marketing, it comes as no surprise that data geeks are “gods”. It’s often been said that the sexiest job of the 21st century is Data Scientist. More so, these data scientists score a heck of a lot more points for being tech savvy on top of strong data science skills. Add to the broth the ability to innovate new ways of marketing and you might have the perfect concoction for what (yawn!) are called Growth Hackers.

“In God we trust; all others bring data.” W. Edwards Deming

Among these fancy buzzwords (at least they made it easier to explain to our folks what we do for a living) is something that cannot be easily overlooked and, most importantly, cannot be easily learned. We might know that it takes 21 days to form a new habit. Well, it takes years to form a habit of identifying underlying patterns. Sorry, “Analytics Crash Course” websites claiming it’s really easy.

Let’s try this:

If I had 2 children and at least one of them was a girl, what is the probability that the other is a boy?

Before I reveal the answer (bet you did a Google search; if you’d like me to share your right answer on my blog, let me know), let me talk a bit about Analytics. You see, the trick is not to be an expert at number crunching alone using platforms like R, SAS or Python. It’s not even knowing the complicated algorithms and formulas that make Machine Learning an insurmountable headache. The trick is to remember your basic school math. Really!

Coming back to junior school math, most of the principles of analytics are procedures you already have stored away. The new wave is about making predictions from that silly math. E.g. How many credit card holders are likely to default in the next 3 months? Who is most likely to buy a product Amazon recommends? What is the next (cute!) dog video that I am most likely to click on and share? These are full-time careers for geeks who get paid a ton. And you can too.

How? Read this first. And it’s not easy.

If you reflect on school math and bucketize it on a Pareto scale, 80% of the topics hinge on 3 things:

Area
Angle
Error

That’s all you really need to know. Remember this when you are overwhelmed with all the algorithms and analytics thrown at you by data geeks (including myself). Now let’s regress each on an analytical plane.

01. Area: The extent or measurement of..

“Measurement” and “Extent” are super important when understanding the breadth and depth of data. Ask yourself: If you had to decide on purchasing a new home, what questions would you ask? Again, the extent of your decisions should lie on a Pareto scale, i.e. 80% of what matters should be covered by a small number of questions. That could be, let’s say, 4 questions.

What is your budget?
What is the square footage you will accept at minimum?
What is your preferred location/s?
What lifestyle suits you? Etc etc etc

The extent is the web search and personal recommendations you will likely base your decisions on. The measurement is the priority you will place on the final outcomes of search and recommendations.

The key point to note here: each time you increase the extent by 1, the measurements diversify exponentially over time.

Measurement = Measurement at time 0 x (1+extent)^t

This also means that each additional data point added to the pot will affect the measurements across all the data points preceding it.
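To get a feel for how quickly this grows, here is a minimal Python sketch of the formula above. The function name `measurement_at` and the starting numbers (4 measurements, an extent of 1, 3 time steps) are made up purely for illustration.

```python
def measurement_at(m0, extent, t):
    """Measurement = Measurement at time 0 x (1 + extent)^t."""
    return m0 * (1 + extent) ** t


# Hypothetical numbers: start with 4 measurements and an extent of 1.
for t in range(4):
    print(t, measurement_at(4, 1, t))
# The measurements double at every step: 4, 8, 16, 32
```

Even with an extent of just 1, the measurements double at every step, which is why keeping the extent small matters.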

Your keen eye should be aimed not at finding those numbers but at understanding what they are telling you, within a decent extent, a consumable measurement and a pre-defined time.

Hence, the simplicity of Area lies in keeping your extent and measurement simple and within an acceptable time. Now, if you had to capture the essence of extent and measurement in Analytics, you would want to understand the breadth and depth of the “Data”. And that’s where the second part comes in.

Takeaway: Understand the extent (breadth and depth) of data and set up the right unit of measurement so that it is actionable.

02. Angle: A particular way of approaching an issue..

As a consultant, I was always reminded of the advantage of classifying problems into “Key Drivers”. What that really means is figuring out which factors regress most strongly onto the cause of the problem. They could be “Value” drivers, “Business” drivers or even “Risk” drivers. If you think of the home example above, the questions you asked are most likely “Key Drivers”. The correlation between Area and Angle is truly high: each depends on the other. But don’t be confused by correlation itself.

“Correlation can be a bitch!”

Are your questions and subsequent decisions based on the data point that, at a primitive level, is most important? Income! Budget is highly correlated with income. The mortgage you take out will depend on your disposable income. Goddamn! Your wife will ask, “Can we afford this?” So, knowing the difference between cause-and-effect and correlation really helps you identify the right angles.

“Key Drivers” always need to be mutually exclusive yet somewhat conclusive.”

If you have a deep understanding of the business model of your industry, you would perhaps have a heuristic approach to identifying “Key Drivers”. This holds good as long as you question the extent and measurement by adding one more question: What assumptions still hold good, and which have changed, over the last x years? This single question will set you apart from the data geeks. This alone will establish the strategic intent of the problem or opportunity you’re looking to solve. This will help you differentiate between causality and correlation.

Takeaway: Once you have the “Area” understood, establish the right angles by identifying the “Key Drivers” and Assumptions.

03. Error: A mistake..

And it’s just that. Remember high school calculus: after integrating, you had to add a “+c” as a constant in the final answer? What that means is that an alternative answer is still possible if the underlying conditions change. This is true even in life. All our decisions are subject to error. The randomness of outcomes cannot be predicted. Nate Silver, in his fantastic book “The Signal and the Noise”, talks about error in great detail and gives it a high degree of importance.

“Nothing is true until there is an error in it.”

Going back to our Home example, our errors are many, some within and some beyond our control. E.g. Is this the best option or have I overlooked a better one? Is the price right? What if my work location changes in the future, then what? These are what I call “True” error. As long as they are appreciated as part of the underlying assumptions (see Point 2), they don’t really harm the outcome. It’s important to understand the ranges of the error so as not to be surprised when it does occur. The error to worry about and be cognizant of is “False” error. These are errors that we think might have an impact but that do not really change the outcome. E.g. What would happen if her relatives stayed over for long? What would happen if we have a second kid? These are blessings. And basing long-term decisions on them is not right. I shall cover this in a bit more detail in my second post (Part 2), but for now it suffices to maintain and appreciate some error in the data itself as well as in the decisions and outcomes we seek.

Takeaway: Within the scope of Area and Angle, figure out the Error and try to minimize it as much as possible.

By understanding the extent, defining the measurement, identifying the key drivers and figuring out as much of the error as possible, you are all set. And it always pays to remember that the devil is in the details.

Speaking of which, my next post will focus on deeper concepts and discussion around extent and measurement, errors and key drivers. This includes pattern recognition, derived state, experimentation and outlier measurement.

Oh! The answer to the question: If I had 2 children and at least one of them was a girl, what is the probability that the other is a boy?

Possible Outcomes

Girl/Girl
Girl/Boy
Boy/Boy
Boy/Girl

Since at least one of them is a girl, option 3 (Boy/Boy) is out. A boy occurs in 2 (Girl/Boy, Boy/Girl) of the 3 remaining, equally likely options, so the chances are 2/3, or about 67%.
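If you would rather check the counting than trust it, here is a small Python sketch that enumerates the same four outcomes. It assumes that boys and girls are equally likely, that the two births are independent, and that “at least one is a girl” is all we know about the family. The variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

# All four equally likely (first child, second child) combinations.
outcomes = list(product(["Girl", "Boy"], repeat=2))

# Keep only the families with at least one girl (this drops Boy/Boy).
at_least_one_girl = [o for o in outcomes if "Girl" in o]

# Among those, the families where the other child is a boy.
with_a_boy = [o for o in at_least_one_girl if "Boy" in o]

probability = Fraction(len(with_a_boy), len(at_least_one_girl))
print(probability)  # 2/3
```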

What are the basic principles one must know in the area of Analytics? I would love your suggestions.

Please subscribe to my blog www.marbble.com if you found this valuable. :) Better still, please share.

Written by Abi Bhalla 🚀