Statistics for Data Science 101 Series — An Overview

Published in Analytics Vidhya

3 min read · Apr 10, 2024

Following the Data Analytics 101 Series, the Statistics 101 Series will be a set of articles that explain the use and importance of statistics in Data Science. I will try my best to explain concepts in simple words as and when I learn them! Let’s dive in!

The What

What is statistics?

The practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.

Setting the “official” definition aside, statistics is a way of extracting meaning from a large dataset.

Although “statistics” sounds like a single, general term, it has distinct branches. The two major ones are descriptive and inferential statistics.

Descriptive Statistics:

Descriptive statistics provides methods to summarize a dataset and present a high-level overview of it. These summaries give insight into the type and character of the data we are dealing with.

This branch uses measures of central tendency (mean, median, and mode) and measures of variability (range, variance, and standard deviation) to summarize the data, and charts, graphs, and tables to visualize it.

An example of descriptive statistics is summarizing a country's population over a period of time.
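To make the measures named above concrete, here is a minimal sketch using Python's built-in `statistics` module. The population figures and the `describe` helper are made up for illustration, and the variance shown is the population variance (dividing by N), which is only one of the two common conventions.

```python
import statistics

# Hypothetical yearly population figures (in millions) for an imaginary country.
population_millions = [2, 4, 4, 4, 5, 5, 7, 9]

def describe(values):
    """Return a few descriptive statistics for a list of numbers."""
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Variability
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance (divide by N)
        "std_dev": statistics.pstdev(values),      # population standard deviation
    }

summary = describe(population_millions)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Eight numbers are reduced to six summary values, which is the whole point of descriptive statistics: a compact picture of where the data sits and how spread out it is.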

Inferential Statistics:

Inferential statistics deals with probability and with drawing conclusions about a population from a sample. The idea is to use the characteristics of the sample to reach conclusions about the population, while quantifying how uncertain those conclusions are.

An example of inferential statistics is estimating an asset's returns over a longer period from its returns over a sample timeframe.

Inferential statistics includes numerous techniques, such as one-sample hypothesis tests (also called one-sample tests of difference), confidence intervals, contingency tables, and many more.
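As a rough illustration of one of these techniques, the sketch below builds an approximate confidence interval for a mean return. The monthly returns and the `mean_confidence_interval` helper are hypothetical, and the interval uses a normal (z) approximation; with a sample this small, a t-distribution would usually be the better choice.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error uses the sample standard deviation (n - 1 in the denominator).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical z value for a two-sided interval, about 1.96 at 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical monthly returns (in percent) of an asset over a sample period.
monthly_returns = [1.2, -0.4, 0.8, 2.1, -1.0, 0.5, 1.7, 0.3, -0.2, 1.0, 0.9, 0.6]

low, high = mean_confidence_interval(monthly_returns)
print(f"Sample mean: {statistics.mean(monthly_returns):.2f}%")
print(f"Approximate 95% CI for the mean return: ({low:.2f}%, {high:.2f}%)")
```

The interval says something about the long-run average return, not just about the twelve observed months, and its width shows how much uncertainty remains.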

Types of Data in Statistics

You can come across multiple types of data while working through a dataset.

They can be categorized into:

1. Qualitative Data:

Data that can be sorted into categories or that acts as a label is called qualitative data.

This can further be divided into:

Nominal: Nominal data is represented using names or labels that have no inherent order and no numerical meaning (Example: Blood group).

Ordinal: Ordinal data also uses names to represent values, but the values have a specific order that conveys meaning (Example: Satisfaction rated as low, medium, or high).

2. Quantitative Data:

Quantitative data takes numerical values to represent a data set.

This can further be divided into:

Discrete: Data that is counted and can take only distinct, separate values (Example: Population).
Continuous: Data that is measured and can take any value within a range (Example: Temperature).
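The type of data decides which summaries make sense. The sketch below shows one way to treat each of the four types in Python; the survey records, field names, and the `SATISFACTION_ORDER` list are all invented for illustration.

```python
import statistics

# Hypothetical survey records; every field and value is made up.
records = [
    {"city": "Pune",   "satisfaction": "high",   "children": 2, "height_cm": 171.5},
    {"city": "Delhi",  "satisfaction": "low",    "children": 0, "height_cm": 160.0},
    {"city": "Pune",   "satisfaction": "medium", "children": 1, "height_cm": 182.3},
    {"city": "Mumbai", "satisfaction": "medium", "children": 3, "height_cm": 158.2},
    {"city": "Pune",   "satisfaction": "high",   "children": 1, "height_cm": 175.0},
]

# Nominal: labels without order, so the most frequent label (mode) is a sensible summary.
most_common_city = statistics.mode([r["city"] for r in records])

# Ordinal: labels with an order, so we can rank them and take a median category.
SATISFACTION_ORDER = ["low", "medium", "high"]
ranks = [SATISFACTION_ORDER.index(r["satisfaction"]) for r in records]
median_satisfaction = SATISFACTION_ORDER[statistics.median_low(ranks)]

# Discrete: counted values, so totals and averages of counts are meaningful.
total_children = sum(r["children"] for r in records)

# Continuous: measured values that can fall anywhere in a range.
mean_height = statistics.mean(r["height_cm"] for r in records)

print(most_common_city, median_satisfaction, total_children, round(mean_height, 1))
```

Note that averaging city names would be meaningless, and averaging satisfaction labels only works after imposing an order, which is exactly the nominal versus ordinal distinction.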

Formulae

Below is a table that contains a few basic formulae that can help in the calculation of various statistics.
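For reference, these are the standard textbook definitions of the measures mentioned earlier in this article. The notation is an assumption on my part (N values x_1, ..., x_N for a population, n values for a sample) and may not match the table exactly.

```latex
% Population mean and sample mean
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Population variance and sample variance
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\qquad
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2

% Standard deviation and range
\sigma = \sqrt{\sigma^2}
\qquad
s = \sqrt{s^2}
\qquad
\text{Range} = x_{\max} - x_{\min}
```

The only difference between the population and sample variance is the denominator: N for a full population, n - 1 when the values are a sample used to estimate the population.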

Conclusion

In this article, we covered the basics of statistics, including its types, some formulae, and the types of data we come across in a dataset. In the next one, we will take a deep dive into descriptive statistics and understand what it comprises.

Hope you liked my article! Do share your thoughts and inputs, and I’ll try to address them to the best of my knowledge in future articles!

Happy Learning!

Check out my other articles on Blockchain and Machine Learning/Deep Learning. Let me know about any other topics to cover in the future!
