Statistics Overview and Data Types Every Data Science Enthusiast Should Know

Atul Sharma
Published in Analytics Vidhya
3 min read · Nov 9, 2020

A working knowledge of basic statistics has always been, and will remain, of utmost importance in data science. Anyone who wants to move into fields such as Machine Learning, Deep Learning and Artificial Intelligence needs a solid foundation in statistics. Partial knowledge of statistics is harmful, and applying it is even worse: ‘a disaster’. Keep a watch on my upcoming and posted blogs, where we will move ahead slowly and steadily in this journey of learning by questioning why so many statistical measures are needed and justifying each with relatable examples.

Let’s begin this journey with an overview of statistics.

Two Branches of Statistics:

1. Descriptive Statistics

2. Inferential Statistics

Descriptive Statistics:

This branch of statistics lets us summarize data compactly and meaningfully so that insights can be drawn at a glance. To explore a given data set, we can use a 1-number summary such as a measure of central tendency (Mean/Median/Mode); a 2-number summary that adds a measure of spread (Range/Variance/Standard Deviation) to the central tendency; or a 5-number summary consisting of the Lower Extreme, 1st Quartile, Median, 3rd Quartile and Upper Extreme, which is what a Box Plot displays. Descriptive statistics also includes many more measures worth knowing, such as Coefficient of Variation, Covariance, Quantiles, Skewness and Kurtosis, which will be discussed in detail in upcoming blogs.
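As a rough illustration of the three kinds of summary described above, here is a minimal sketch using only Python's standard `statistics` module. The data values are made up for the example, and the "inclusive" quartile method is just one of several common conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample: made-up daily page views for a small blog.
data = [12, 15, 15, 18, 21, 24, 30]

# 1-number summary: a measure of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# 2-number summary: central tendency plus a measure of spread.
value_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)      # sample standard deviation

# 5-number summary: the values a box plot is drawn from.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, q2, q3, max(data))

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.2f}, std_dev={std_dev:.2f}")
print(f"five-number summary={five_number}")
```

Each step adds information: the mean alone says nothing about spread, and the 5-number summary also hints at skew through the distances between the quartiles and the extremes.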

Inferential Statistics:

This branch of statistics is all about making estimates about a population from sample data. Many of the conclusions drawn about the population rest on the well-established Central Limit Theorem, which will be discussed in detail in upcoming blogs. Hypothesis testing is one crucial tool of inferential statistics: it is used to reject, or fail to reject, a stated belief, and there are many ways to conduct it depending on the types of variables and the scope. Each result in inferential statistics is an estimate of what we think the population value would be (keeping in mind all the assumptions). Never mistake these results for the actual population parameters, which are usually impractical to measure (only estimates are feasible because of resource constraints such as time and effort). Because the results are based on a sample rather than the whole population, they carry sampling error, which is why the error involved in inferential statistics is usually greater than in descriptive statistics.
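To make the idea of estimating a population value from a sample more concrete, here is a small sketch. Everything in it is an assumption made for illustration: the population is simulated, the sample size of 200 is arbitrary, and the interval uses the normal approximation (z = 1.96 for roughly 95% confidence), which the Central Limit Theorem supports for reasonably large samples.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated values (not real data).
population = [random.gauss(170, 10) for _ in range(100_000)]
population_mean = statistics.mean(population)  # rarely known in practice

# In practice we only get to observe a sample.
sample = random.sample(population, 200)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval for the population mean.
z = 1.96
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"sample mean     = {sample_mean:.2f}")
print(f"95% CI          = ({ci_low:.2f}, {ci_high:.2f})")
print(f"population mean = {population_mean:.2f}")
```

The sample mean is only an estimate, and the width of the interval reflects the sampling error. Drawing a different sample would give a somewhat different estimate and interval.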

Data Types Overview

(Image by author)

Qualitative data is generally analyzed with non-parametric methods, while quantitative data can often be analyzed with parametric methods. One more important point is the ‘Levels of Measurement’, which help us decide which descriptive statistics are feasible to compute for a given data type. Let us now quickly look at the levels (scales) of measurement:

(Image by author)
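Since the image is not reproduced here, the sketch below assumes the standard four levels of measurement (nominal, ordinal, interval, ratio) and lists a few descriptive measures usually considered meaningful at each level. The `LEVELS` mapping and the `is_feasible` helper are hypothetical names used only for this illustration, and the lists of measures are not exhaustive.

```python
# Standard four levels of measurement, each mapped to some descriptive
# measures usually considered meaningful at that level (not exhaustive).
LEVELS = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean", "standard deviation"},
    "ratio": {"mode", "median", "mean", "standard deviation",
              "coefficient of variation"},
}


def is_feasible(level: str, measure: str) -> bool:
    """Return True if `measure` is usually meaningful for data at `level`."""
    return measure in LEVELS[level]


# A nominal variable such as blood group has no meaningful mean.
print(is_feasible("nominal", "mean"))                       # False
# An ordinal variable such as a satisfaction rating has a median.
print(is_feasible("ordinal", "median"))                     # True
# Coefficient of variation needs a true zero, so ratio data only.
print(is_feasible("interval", "coefficient of variation"))  # False
```

Each level allows everything the level before it allows and adds more, which is why the sets grow from nominal to ratio.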

That’s it for this blog. Keep an eye on the upcoming blogs, which will cover all the “must know” aspects of statistics in a simple and easily interpretable form.

Thanks!!!
