Data Science is all about Judgement Calls
This blog post is inspired by a question that was posed in one of the Data Science chat rooms that I silently and sincerely stalk. It is a source of cool links and interesting conversation around data and its potential power.
The topic in question is how best to use aggregate functions to display any given data. Is it better to include statistical functions to color the aggregates, or does that make it harder for the end user to interpret the results accurately? The article that was shared along with the dilemma is good at outlining the perils of each kind of aggregation strategy. But it still does not give you the gratification of coming to a conclusion about the perfect solution.
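To make the dilemma concrete, here is a minimal sketch in Python. The numbers are made up for illustration (hypothetical order values, not taken from the article or the chat room), and the choice of mean versus median is just one example of an aggregation decision. It shows how two perfectly valid aggregates can tell very different stories about the same data.

```python
from statistics import mean, median

# Hypothetical order values: most are small, one is very large.
order_values = [12, 15, 14, 13, 16, 15, 14, 500]

# Three common aggregates computed over the same data.
summary = {
    "mean": mean(order_values),      # pulled upward by the single large order
    "median": median(order_values),  # reflects the typical order
    "max": max(order_values),        # exposes the outlier itself
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

None of these numbers is wrong. Which one belongs on the dashboard depends on the decision the reader is trying to make.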
As I begin to dig deeper into this and look at data and its reach not just as an enthusiast but as a practitioner, I think the question is more nuanced than a technical solution. Just as data science is.
So what exactly is data science, and who is a data scientist? This is an all-too-familiar and far too simplistic description that gets thrown at this question. I am (still) an academician at heart, and I do not like simple answers.
I think this best describes the tools and skill set that data scientists are equipped with, but the more distinguishing attributes are:
- Framing the problem.
- Collecting and processing relevant data.
- Exploring and analyzing the data.
- Last but definitely not least: data storytelling.
Data storytelling means communicating the results of the analysis to your stakeholders in a way that is comprehensive and compelling. This skill is critical and, I feel, often underrated. What distinguishes Hans Rosling from other global health experts is not his statistical and data skills (although he has used D3JS spectacularly). One of his TED talks is given using boxes as data props! What makes him smashing is his data narrative. What makes Gapminder cool is the judgement calls he has made to present the data in its most understandable form for the context at hand.
So, coming back to the question at hand:
"We need to understand the many ways a number can be arrived at and what informational compromises have been made to arrive at informational simplicity."
The WE in the quote, in my opinion, is not the end user or the business user. The WE is the data scientist, who needs to understand the business context in which a question or problem is framed. What decision are we facilitating, and enabling the team to arrive at, armed with data that is presented in the right context? Although there are some strategies to highlight the underlying complexities, there can never be a one-size-fits-all solution to this dilemma. This decision and this mantle fall on the data scientist and not the user.
Data science is all about making judgement calls!