Statistics: Mean, Median and Mode

Measures of central tendency (mean, median and mode) in statistics: the formulas applied in depth to sample data, with Python programs

Bala Murugan N G
Analytics Vidhya
May 15, 2021


Measure of Central Tendency — Mean, Median and Mode

Why this Blog?

Every blog and tutorial covers its own aspects of statistics and of the mean, median and mode. Some of them can leave you more confused than before, which is something I struggled with a lot myself.

So, I took an interest in writing “Clear Concepts Explained in Statistics”, which will also help me recall these concepts clearly in the future.

Photo by carolyn christine on Unsplash

Introduction to Statistics

Whenever you are learning a concept, try asking yourself:

What is it?

Why is it important?

How does it work? and

How do I implement it?

If you can answer the questions above, you can teach any concept in the simplest way.

Before we get into the mean, median and mode, we need to know what statistics is and why it is important.

What is Statistics: Statistics is a branch of mathematics that transforms your data into useful insights for decision makers.

Image Credit : https://www.pngegg.com/en/png-purmy

Why is Statistics Important for Data Science?

Statistics is one of the most important disciplines for providing tools and methods to find structure in data and gain deeper insight into it, and the most important discipline for analyzing and quantifying uncertainty.

It is the backbone of hypothesis testing, machine learning, deep learning and related concepts.

Measure of Central Tendency

A measure of central tendency is a descriptive statistic that represents the center point of a dataset (i.e. it describes the data with a single value by identifying its central position).

Measures of central tendency are divided into 3 types:

  1. Mean
  2. Median
  3. Mode
Image by Author

Mean

The mean is one of the measures of central tendency. It gives the average of the data (i.e. the sum of all the values divided by the total number of values).

Image by Author
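
Written out as a formula, the definition above looks like this. The symbols are notation chosen here for illustration: x_i is the i-th value, n is the total number of values, and x̄ is the mean.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}
```

For example, for the three values 2, 4 and 9, the mean is (2 + 4 + 9) / 3 = 5.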

Let’s take some sample data and compute the mean with the numpy and statistics packages, and with the formula directly.

Here is the Python code for the mean

# Importing packages
import statistics
import numpy as np
# Sample Data
data = [1,2,4,5,6,76,8,45]
# Using simple mean formula
mean = sum(data)/len(data)

print(mean)
# output of numpy package
print(np.mean(data))
# output of statistics package
print(statistics.mean(data))

Mean Notebook Snippets

Created by Author

As you can see, the formula and both packages give the same result.

Median

The median gives the middle value of the sorted data. It is commonly used in outlier detection/removal and for imputing missing values during data preprocessing.

Let’s take some sample data and find the median in a few simple steps

Image Created by Author
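
As a sketch of those steps in formula form, assume the data has been sorted in ascending order and x_k denotes the k-th sorted value, counting from 1 (this notation is an assumption made here, not taken from the original figure):

```latex
\text{median} =
\begin{cases}
x_{(n+1)/2} & \text{if } n \text{ is odd} \\[4pt]
\dfrac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even}
\end{cases}
```

For the sample data 1, 2, 4, 5, 6, 76, 8, 45, the sorted values are 1, 2, 4, 5, 6, 8, 45, 76. Here n = 8 is even, so the median is the average of the 4th and 5th values: (5 + 6) / 2 = 5.5.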

Here is the Python code for Median

import statistics
import numpy as np

data = [1,2,4,5,6,76,8,45]

# Using the formula without Python packages
"""
Positions in the sorted data, counting from 1:
If number of elements = odd  -> value at position (n+1)/2
If number of elements = even -> average of the values at positions n/2 and (n/2)+1
Python lists are indexed from 0, so the positions shift down by one.
For even n the two middle indices are:
m1 = n // 2
m2 = m1 - 1
For odd n the single middle index is n // 2.
"""
sorted_data_median = sorted(data)
print("Sorted Data:", sorted_data_median)
n = len(sorted_data_median)
m1 = n // 2
if n % 2 == 0:
    # Even number of elements: average the two middle values
    m2 = m1 - 1
    print(f"Position of the data: {m1} and {m2}")
    print(f"Values in the Position: {sorted_data_median[m1]} and {sorted_data_median[m2]}")
    median = (sorted_data_median[m1] + sorted_data_median[m2])/2
else:
    # Odd number of elements: take the single middle value
    print(f"Position of the data: {m1}")
    median = sorted_data_median[m1]
print("Median:", median)

# Using numpy package
print(np.median(data))

# Using statistics package
print(statistics.median(data))

Median Notebook Snippets

Created by Author
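
To illustrate the imputation use mentioned above, here is a minimal sketch that fills missing values with the median of the observed values. The `readings` list and the use of `None` to mark a missing value are assumptions made for this example; real projects often rely on a data-frame library instead.

```python
import statistics

# Hypothetical readings; None marks a missing value (illustrative data only)
readings = [12, 15, None, 14, 90, None, 13]

# Median of the values that are actually present
observed = [value for value in readings if value is not None]
fill_value = statistics.median(observed)

# Replace each missing entry with that median
imputed = [fill_value if value is None else value for value in readings]

print("Median of observed values:", fill_value)
print("Imputed data:", imputed)
```

The median of the observed values is 14, while their mean would be 28.8 because of the single large value 90. This is one reason the median is often preferred for imputation when outliers are present.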

Mode

The mode is the most frequently occurring value in the dataset. It is commonly used for removing the most frequent words in NLP and for imputing missing values during data preprocessing.

# Importing packages
import statistics
from scipy import stats
# Sample data
data = [1,2,3,4,4,5,6,7,7,7,7,6,6,4,3,2,1]
# Using Scipy Package
output = stats.mode(data)
print(f"The Number {output[0]} occurred {output[1]} times")
# Using Statistics package
print(statistics.mode(data))

Mode Notebook Snippets

Created by Author
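
A dataset can have more than one mode when several values tie for the highest count. The sketch below uses a hypothetical sample (not the data above) to find every mode, first by counting with `collections.Counter` and then with `statistics.multimode`, which is available from Python 3.8 onwards.

```python
import statistics
from collections import Counter

# Hypothetical sample in which 2 and 5 both occur three times
data = [1, 2, 2, 2, 3, 5, 5, 5, 6]

# Counter maps each value to the number of times it occurs
counts = Counter(data)
highest_count = max(counts.values())

# Keep every value that reaches the highest count
modes_by_hand = sorted(value for value, count in counts.items() if count == highest_count)
print("Modes by counting:", modes_by_hand)

# statistics.multimode returns all modes in order of first appearance
print("statistics.multimode:", statistics.multimode(data))
```

On Python 3.8 and later, `statistics.mode` returns only the first mode it encounters in a case like this, so `multimode` is the safer choice when ties matter.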

Conclusion

Finally, we have come to the end. We have seen the mean, median and mode and their uses in analysis and data preprocessing. By comparing the mean, median and mode, we can also see whether a distribution is skewed (left-skewed or right-skewed) or not.
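
As a rough illustration of the skewness point, the sketch below compares the mean with the median. The helper `describe_skew` is a hypothetical name introduced here, and the comparison is only a rule of thumb, not a formal skewness measure.

```python
import statistics

def describe_skew(values):
    """Rule of thumb: a mean above the median suggests a longer right tail."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"

# Same sample data as in the mean and median sections
data = [1, 2, 4, 5, 6, 76, 8, 45]
print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
print("Shape:", describe_skew(data))
```

For this sample the mean (18.375) is well above the median (5.5), because the large values 45 and 76 pull the mean upwards, so the rule of thumb reports a right-skewed shape.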

If you have any doubts or suggestions related to this topic, please post them in the comments section.

Please hit the like button to let me know how I am doing and to help me improve my writing skills.

Thank you all,

Bala Murugan N G


Currently exploring data science research. Connect with me on LinkedIn: https://www.linkedin.com/in/ngbala6