De-waffling Data Science

Ben Houghton
Published in Data & Waffles
4 min read, May 10, 2019

Putting Science back into Data Science

Before you read what follows, go and ask three Data Scientists (or other data professionals) how they would define Data Science (or even google what Data Science is and look at the top 3 results). I am going to bet that a few things will happen:

  1. Machine Learning and Big Data will be mentioned, or alluded to, at least twice
  2. Programming will be mentioned by at least two people
  3. Some explanation of how data is turned into action will be given
  4. At least one will emphasise the ‘advancedness’ of the work they do
  5. (For bonus points) Someone will present a maths/programming/business Venn diagram as part of their definition

The first thing you may notice is the range of concepts which can form the definition. Data Science may indeed be defined by:

  1. The methods employed
  2. How they are employed
  3. The benefit it brings
  4. The complexity of the work

The second thing you’ll notice (with possibly a few rare exceptions) is that the word ‘data’ will be part of each definition, but the notion of ‘science’ will be at most implicit and likely not referenced at all. Indeed, the most common question I get when discussing my job is ‘where does the science actually come into it?’. In my view, the lack of ‘science’ in these definitions gives the impression that the term ‘data science’ is more of an all-encompassing buzzword than a true discipline.

Before I go on, I want to put on the record that I don’t think there is necessarily a problem with data science being defined differently by different people. Different teams and organisations have different needs for their data, and it’s often helpful to have a single title that can be held by anyone with the base skill set to do the role.

I’m going to argue that it is the scientific method which can best help distinguish Data Science from other technical or data disciplines (analytics, engineering, software development, etc.). The scientific method (forming a hypothesis, collecting the data to test it, analysing the data, and communicating and acting on the conclusions) is central to the success of data science as a discipline.
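To make those four steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the hypothesis, the numbers (which are made up), the 0.05 threshold and the function name `permutation_test` are not taken from any real project. It uses a simple permutation test on the difference in means, which is only one of many ways the analysis step could be done.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed difference (treatment minus control) and an
    estimated p-value. Hypothetical helper written for this example.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# 1. Form a hypothesis: the new process shortens handling time.
# 2. Collect the data to test it (made-up handling times, in minutes).
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
treatment = [11.2, 11.5, 11.0, 11.4, 11.1, 11.6, 11.3, 10.9]

# 3. Analyse the data.
effect, p_value = permutation_test(control, treatment)

# 4. Communicate and act on the conclusion (0.05 is an assumed threshold).
decision = "act on it" if p_value < 0.05 else "collect more data"
print(f"Change in mean handling time: {effect:.2f} minutes, p = {p_value:.4f}: {decision}")
```

The code itself matters less than its shape: the hypothesis comes first, the data is gathered to test it, and the decision at the end follows from the analysis rather than the other way round.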

It just so happens that the definitions we listed earlier are compatible with the scientific method being central to Data Science. Data Science will inherently be ‘advanced’ as it requires a complex mix of skills: gathering data in a rigorous fashion, choosing and executing the right methods for testing hypotheses, and presenting insights to a range of audiences. Turning data into action is a core part of this process. We may also highlight, as most other posts on this topic do, that the sheer amount of data available to us makes programming skill, and even expertise in big data, essential. Machine learning may well have to replace some traditional methodologies to be able to test the hypotheses powerfully enough.

However, I wonder if the scientific method alone is enough to distinguish Data Science from other disciplines — many data professionals who refer to themselves as analysts, for example, certainly follow this process carefully. I expect what follows will differ from person to person and team to team, but my team have added two additional criteria to our data scientist definition to best encapsulate the work that we do.

We first suggest that the work we do is categorised as ‘Change’ rather than ‘Run’: we work on projects which focus on changing our organisation rather than simply contributing to existing processes. This may seem subtle at first, but it ensures that innovation forms a core part of what we do. It becomes an especially good fit when we note that many data scientists come from a research background and so can often offer this element of disruption. In addition, we position ourselves as the builders of products and services. In practice, this draws us to much the same set of projects as the innovation criterion above, but it highlights the fact that the work we do is reusable and sustainable.

So this is our definition:

Data Science is the process of extracting value from data using the scientific method. This is used to form innovative products and services to change how the business operates.

Thus we have a brand new Venn Diagram to rival the traditional maths/programming/business one. The latter Venn Diagram remains an important and natural corollary to our main definition rather than being the definition itself.

Ben Houghton
Principal Data Scientist for Analytical Innovation at Quantexa