Very short history of Data Science: Why now? (and no, it is not the same as Statistics!)

Florian Huber
5 min read · Apr 8, 2024

This blog post is largely based on my draft textbook “Introduction to Data Science (for not-yet-scientists)” which you can find online.

While the term “data science” most likely first appeared in the 1980s, it received little attention for a long time. Twelve years ago, the terms “data scientist” and “data science” were still not very well known to the public, so journalists could attract quite some attention by calling data scientist the “sexiest job of the 21st century”. Those times seem long gone, and data scientist has become an increasingly common job title.

But if you ask 10 practicing data scientists what data science is, you might get very different answers. Worse still, there’s not even a consensus on whether data science is a field of its own, a technical approach, a mindset, or maybe simply a new, fancier term for statistics.

Let’s start with the latter, because it is the easiest to address. No, data science is not the same as statistics! Even though some (e.g., Jeff Wu) suggested data science as a new term for statistics in the 1980s and 1990s, most statisticians and data scientists nowadays seem to agree that the two terms overlap considerably but mean clearly different things (see also [1]).

But what is Data Science?

When teaching, but also when presenting part of my work to various people, I often have to give a simple and brief intuition of what data science is. In a compact way I would say:

Data Science is the art of gaining and communicating insights from complex data through digital techniques.

Many quantitative scientists could also argue that they often do exactly this. They aim to learn new things about the world from data. And the use of digital tools is also clearly no longer a significant difference. However, this does not argue against a field called “Data Science,” but rather only says that many quantitative scientists nowadays are also to some extent data scientists. They even have to be if they want to keep up with the state of the art in their fields, as many research areas are currently undergoing rapid change due to the widespread adoption of new digital techniques such as machine learning approaches.

Beyond this short definition, opinions on what exactly data science is diverge a bit. Frequently this simply depends on the respective application area. Data science in consulting and business often means something different than data science in a more academic environment. However, in most cases, everyone can at least agree on a Venn diagram that is very often used in introductions in this or a slightly modified form: data science as the intersection of Digital Techniques (digital tools/methods), Statistics, and Domain Expertise.

Venn diagram to indicate the intersection of fields for data science. Here with a few examples to illustrate what falls into the different areas.

Data is nothing new. So why data science now?

Data has been a cornerstone of human understanding for millennia — from ancient civilizations keeping records of harvests and astronomy, to modern businesses tracking sales and performance. It’s clear that data in itself is not a new concept. However, the emergence and ascendancy of data science as a discipline is a relatively recent phenomenon. So, why now?

The prominence of data science in today’s world can be attributed to several concurrent developments:

(1) The exponential increase in the volume of data generated. Thanks to digitalization and the rise of the Internet, mobile devices, and IoT (Internet of Things), we are producing data at a previously unimaginable scale. This big data presents both a challenge and an opportunity — the challenge being how to handle and process this vast amount of information, and the opportunity being the valuable insights that can be gleaned from it.

This is accompanied by an increased recognition of the importance of data-driven decision-making across diverse sectors. Various industries, governments, and institutions have realized that leveraging the power of data can lead to increased efficiency, better decision-making, and a competitive advantage.

The concurrent developments leading to Data Science. Images of Thomas Bayes and Carl Friedrich Gauß taken from Wikipedia. For Thomas Bayes it is not even fully certain that the image actually shows him. Image taken from [2], CC-BY 4.0 Florian Huber.

This existence (and appreciation) of larger and larger amounts of data can be seen as a substrate for the rise of data science, but it really needed a combination of several other developments to be able to properly work with such data (see Figure).

(2) The evolution and expansion of statistical methodologies have been a key driver. Statistics provides the foundational principles and techniques for analyzing data, making inferences, and predicting future trends. In the era of big data, classical and modern statistical techniques form the backbone of most analyses in data science.

(3) The strides we’ve made in data handling capabilities have greatly facilitated the rise of data science. This obviously includes the drastic advancements in computational power and storage capabilities that made it possible to collect, store, and analyze these massive datasets. But this also includes many developments from computer science, such as databases. Just a few decades ago, collecting, storing, and analyzing the vast amounts of data we deal with today would have been unimaginable, let alone impractical.

(4) There has been significant progress in the field of algorithms, which also includes machine learning. Algorithms are at the heart of nearly every tool that we use as data scientists for understanding and interpreting data. These range from optimization methods dating back more than 200 years (e.g., the least squares method) all the way to current deep learning approaches. These advancements have opened up new possibilities for predictive analytics, automation, and artificial intelligence.

(5) Lastly, the often-underestimated field of data visualization has seen revolutionary advancements. Effective data visualization makes complex data more comprehensible, accessible, and actionable. The development of powerful visualization tools enables us to present data in a visually compelling manner that fosters understanding and drives informed decisions.
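To make point (4) a bit more tangible, here is a minimal sketch of the least squares method for the simplest case: fitting a straight line through a handful of points. It uses only plain Python. The function name `fit_line` and the example numbers are made up for illustration; in practice one would typically reach for an established library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Uses the closed-form solution for a single predictor, i.e. the line
    that minimizes the sum of squared vertical distances to the points.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example data (invented for this sketch)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```

The same idea of choosing parameters that minimize a squared error still sits underneath many modern methods, only with far more parameters and data.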

So, while data is not new, the volume of data, our ability to process it, and the recognition of its value, are. These changes have given rise to the burgeoning field of data science, marking a new era in our relationship with data.

References:
[1] Hassani et al. 2021, https://www.sciencedirect.com/science/article/abs/pii/S0040162521005448

More + Cite this work:
This blog post is largely based on an introductory chapter in the following textbook I am working on:

[2] Huber, F. (2024).
Introduction to Data Science (for not-yet scientists).
v0.13, 2024, Zenodo. https://doi.org/10.5281/zenodo.11190827

All code to create the book can be found on GitHub.
The rendered version of my book can be found here.

Let me know if this was helpful, or if you have any remarks or suggestions!

Florian Huber

Professor for Data Science at University of Applied Sciences Düsseldorf | research software engineer | former biological physicist | former chocolatier