As data science tries to establish itself as a profession, data scientists need to develop general methods of becoming subject matter experts.
One of the earliest uses of the phrase ‘data science’ was by Bill Cleveland, in a talk intended to galvanise statisticians to use computers more effectively in order to better listen to what data was saying. Eighteen years on, the phrase has taken on a new lease of life, having been relaunched and repurposed.
In the new paradigm, modelling takes a larger role. Those models are meant to be useful, and to be useful they must be used. Therefore, where the need before was to move from theory-centred data analysis to data-centred analysis, it is now time to put the customer and their problems at the centre of data science.
Data-centred analysis is analysis that lets the data speak for themselves, rather than imposing a structure from outside that forces the message to come from a particular direction. In this sense, it is often identified with non-parametric and visual methods rather than statistics that assume a particular probability distribution as a starting point.
A customer-centred approach starts from what the customer wants, and doesn’t assume any data. This requires the data scientist to move from being an expert on statistics and programming to being an expert on the user and their context. More than that, it requires a change in mindset from being someone who discovers ‘what the data is saying’ to being someone who improves their customer’s life.
To achieve this mindset, there also needs to be a change in attitude about which skills and knowledge matter most for a data scientist. The famous Drew Conway Venn diagram puts data science at the intersection of statistics, computer science and subject matter knowledge. Of course, there is some truth to the idea that different knowledge is required to understand different kinds of businesses and organisations. An energy retailer is not identical to an FMCG company.
What you can teach to a general audience, however, are ways of understanding those organisations, because the kinds of problems people seek to address with data science tools are similar no matter what kind of organisation they occur in. You can learn to understand these common problems without unreasonable effort.
Similarly, the problem of understanding risk drivers and how to use modelling output to better manage risk also generalises across a variety of business domains which use data science to model risk — from credit risk and other financial forms of risk through to safety and industrial risk modelling.
Data science education needs to move beyond the narrow pursuit of the most technologically advanced machine learning techniques possible. At this point in the development of data science, there should be more room to move from the theoretical aspects of statistics, or the technical aspects of programming, to these applied problems.
Knowing the best general approach for data science in similar contexts, and recognising those contexts when they’re in front of you, are crucial skills for data scientists who want to be taken seriously as professionals.
Robert de Graaf’s book, Managing Your Data Science Projects, is out now through Apress.