Data Science vs Statistics?

Amit Kumar Garg
Blog By Abnormal Doctors
3 min read · May 12, 2024

Why do we need data science when we have had statistics for centuries?

Consider this definition of a data scientist:

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.

“Data science has become a fourth approach to scientific discovery, in addition to experimentation, modeling, and computation,” said Provost Martha Pollack.

The website for Data Science Initiatives gives us an idea about data science:

“This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and inter-disciplinary applications.”

Let’s now dive into the definition of statistics:

Statistics:

“Statistics” means the practice or science of collecting and analyzing numerical data in large quantities.

Misconceptions:

  1. Big Data: Big data refers to the large volumes of data now available in the information age, for example through social media. But statisticians were already comfortable handling big data on the scale of a national census.
  2. Skills: Statisticians do not lack the skills to handle big data, nor are they unable to use computer science technology to work with data at a larger scale.

So, how did data science come into being?

In 1962, John Tukey, Professor of Mathematics at Princeton (a university that has produced or been associated with 41 Nobel laureates!), published an article saying that he was more interested in “data analysis” than in inferential statistics, where we draw inferences from data. He defined data analysis as:

“my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data”

Last 50 years:

Over the last 50 years, computational environments for data analysis have been developed. You have probably heard of the famous ones, such as R, Stata, and SPSS; others include S, ISP, and SAS.

These computational environments are game changers. Earlier, the statistics and data analysis in a scientific paper were more theoretical. Now, with the help of these programs, one can easily tweak code, draw many more insights, and further improve the data analysis. Thus, people became convinced that data analysis needs a scientific approach.

So, in summary, Data Science focuses on:

  1. Analysis and prediction, or predictive modeling: Processing the data to predict what future responses would look like given a set of raw data, e.g., how would the incidence of lung cancer change if the taxes on cigarettes were increased by 10%?
  2. Data scientists prioritize prediction and are effectively silent about the underlying mechanism that is generating the data, as long as the predictive accuracy is high.
  3. There have been many data science competitions where a publicly available dataset is provided to make predictions, and the model with the best accuracy wins. This kind of framework is known as the Common Task Framework (CTF).

The CTF paradigm has been so successful that much of the automation we take for granted, such as Google Translate, smartphone Touch ID, and smartphone voice recognition, is derived from it.
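To make the competition idea concrete, here is a minimal Python sketch of a CTF-style contest. Everything in it is an assumption made up for illustration: the synthetic dataset, the two toy "competitors" (`fit_majority` and `fit_threshold`), and the `referee_score` function are hypothetical and do not come from any real competition.

```python
import random


def make_dataset(n, seed):
    """Synthetic binary data (toy assumption): label is 1 when x + noise > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        label = 1 if x + rng.gauss(0.0, 0.2) > 0 else 0
        rows.append((x, label))
    return rows


# A training set shared with every competitor, and a test set used only for scoring.
train = make_dataset(500, seed=0)
hidden_test = make_dataset(500, seed=1)


def fit_majority(train_rows):
    """Competitor 1 (hypothetical): always predict the most common training label."""
    ones = sum(label for _, label in train_rows)
    majority = 1 if ones * 2 >= len(train_rows) else 0
    return lambda x: majority


def fit_threshold(train_rows):
    """Competitor 2 (hypothetical): pick the cut-off on x with the best training accuracy."""
    def correct_on_train(cut):
        return sum((x > cut) == bool(label) for x, label in train_rows)

    best_cut = max((x for x, _ in train_rows), key=correct_on_train)
    return lambda x: 1 if x > best_cut else 0


def referee_score(model, test_rows):
    """Scoring step: fraction of test labels the model predicts correctly."""
    return sum(model(x) == label for x, label in test_rows) / len(test_rows)


entries = {"majority": fit_majority(train), "threshold": fit_threshold(train)}
leaderboard = {name: referee_score(model, hidden_test) for name, model in entries.items()}
winner = max(leaderboard, key=leaderboard.get)
print(leaderboard, "winner:", winner)
```

Notice that the scoring step never asks why a model works. It only compares accuracy on data the model did not train on, which is exactly the prediction-first attitude described above.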

And statistics deals with:

  1. Modeling or generative modeling: How are the numbers behind lung cancer and cigarette taxes linked?
  2. Statisticians believe that there is a true, best model that is generating the data.
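The two mindsets can be shown on the same fitted line. The Python sketch below is a toy under stated assumptions: the data are simulated from a made-up linear mechanism (`TRUE_SLOPE` and `TRUE_INTERCEPT` are invented numbers, not real figures about cigarette taxes or lung cancer), and the names `simulate` and `fit_line` are hypothetical helpers. The statistics-style question is "how are x and y linked?", answered with a slope and an approximate 95% interval. The data-science-style question is "how well do we predict unseen responses?", answered with the error on held-out data.

```python
import random
import statistics

rng = random.Random(42)
TRUE_SLOPE, TRUE_INTERCEPT = -2.0, 50.0  # assumed toy data-generating mechanism


def simulate(n):
    """Draw (x, y) pairs from the assumed linear mechanism plus Gaussian noise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows


train, test = simulate(200), simulate(200)


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x, with the slope's standard error."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    slope_se = (residual_ss / (len(rows) - 2) / sxx) ** 0.5
    return slope, intercept, slope_se


slope, intercept, slope_se = fit_line(train)

# Statistics-style output: an estimate of the link between x and y, with uncertainty.
ci_low, ci_high = slope - 1.96 * slope_se, slope + 1.96 * slope_se
print(f"slope = {slope:.2f}, approx. 95% interval = ({ci_low:.2f}, {ci_high:.2f})")

# Data-science-style output: prediction error on data the model has not seen.
test_rmse = statistics.fmean((y - (intercept + slope * x)) ** 2 for x, y in test) ** 0.5
print(f"held-out RMSE = {test_rmse:.2f}")
```

In this toy setting both answers come from the same model. The difference lies in which number you report and care about.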

The graph below shows how often people search on Google for the terms “Data Science” and “Statistics”. Note that the trend for data science is rising.

Red: Statistics, Blue: Data science

TL;DR

Data science is a continuation of statistics that focuses more on data analysis and predictive modeling, with the help of computer science.

Appreciation:

If you liked our post, please clap for us below and follow our Medium blog.

Amit Kumar Garg
Blog By Abnormal Doctors

Building CARPL.AI | Research Fellow IIT Delhi | MBBS AIIMS, New Delhi | AI enthusiast | Aspiring physician-scientist | Rookie Biodesigner |