Understanding Measures of Central Tendency: Mean, Median, and Mode

Nitesh Addagatla
4 min read · Sep 12, 2023

What are the Measures of Central Tendency? Explain Mean, Median, and Mode, Descriptive Statistics, and When to Use Mean vs. Median vs. Mode. Python Code for Central Tendency, Outliers, Central Tendency, Data Distribution, Statistical Concepts, Quantitative Analysis

In the realm of data analysis and machine learning, understanding the central tendencies of a dataset is crucial. These central tendencies help us summarize data, identify patterns, and make informed decisions. In this blog post, we’ll delve into three key measures of central tendency: Mean, Median, and Mode. I will explain what each of these terms means and when to use them. Additionally, I’ll demonstrate how to calculate them using Python and the Pandas library.

What are Mean, Median, and Mode?

Mean:

The mean, also known as the average, is the sum of all data points divided by the total number of data points. It represents the “typical” or “average” value in a dataset. Mathematically, it can be expressed as:

Mean Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Where:

  • x_i represents each data point.
  • n is the total number of data points.
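As a minimal sketch, the definition above translates almost word for word into Python. The function name and the sample numbers are made up for illustration; they are not from any particular dataset.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


sample = [2, 4, 6, 8]   # hypothetical data points x_1 ... x_n
result = mean(sample)   # (2 + 4 + 6 + 8) / 4
print(result)           # 5.0
```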

Median:

The median is the middle value in a dataset when it is sorted in ascending order. It is far less affected by extreme values (outliers) than the mean, which makes it useful for datasets with skewed distributions. If there is an even number of data points, the median is the average of the two middle values.
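Here is a small sketch of that rule, covering both the odd and the even case. The function name and the numbers are illustrative only; in practice the standard library's statistics.median does the same job.

```python
def median(values):
    """Return the middle value of the sorted data (average of the two middle values if the count is even)."""
    if not values:
        raise ValueError("median requires at least one data point")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5])       # sorted: [1, 5, 7]
even_case = median([7, 1, 5, 3])   # sorted: [1, 3, 5, 7]
print(odd_case)    # 5
print(even_case)   # 4.0
```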

Mode:

The mode is the value that occurs most frequently in a dataset. A dataset can have one mode (unimodal) or multiple modes (multimodal). It is especially useful for categorical data (such as text labels), where a mean or median cannot be calculated.
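To see unimodal and multimodal data side by side, here is a short sketch using multimode from Python's standard statistics module (available since Python 3.8). The sample lists are invented for illustration.

```python
from statistics import multimode

# multimode returns a list of every value tied for the highest frequency
unimodal = multimode([1, 2, 2, 3])        # 2 appears most often
bimodal = multimode([1, 1, 2, 2, 3])      # 1 and 2 are tied
sizes = multimode(["S", "M", "M", "L"])   # also works on text categories

print(unimodal)   # [2]
print(bimodal)    # [1, 2]
print(sizes)      # ['M']
```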

When to Use Each Measure?

The choice of which measure of central tendency to use depends on the nature of the data and the objectives of your analysis:

  • Mean: Use the mean when dealing with numeric data that is roughly symmetric (for example, approximately normally distributed). It provides a good representation of the central value in such cases. However, the mean is sensitive to outliers, so it may not be suitable for datasets with extreme values. If the extreme values are errors or irrelevant to your question, you can remove them first and then use the mean.
  • Median: Use the median when your dataset contains outliers or has a skewed distribution. It is robust against extreme values and provides a better representation of the central value in such cases.
  • Mode: Use the mode for categorical data or when you want to identify the most frequent value in a dataset. It is valuable for understanding the most common category or class.
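The advice about removing outliers before taking the mean can be sketched as follows. This post does not prescribe a specific way to detect outliers, so the sketch assumes the common 1.5 × IQR rule. The remove_outliers_iqr helper is hypothetical, and the value 300 is an invented stand-in for a data-entry error among a list of ages.

```python
from statistics import mean, median, quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles (an assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


raw = [25, 28, 30, 32, 40, 45, 60, 300]   # 300 is a hypothetical bad entry
cleaned = remove_outliers_iqr(raw)

print(f"Mean with outlier:    {mean(raw):.2f}")       # 70.00
print(f"Median with outlier:  {median(raw):.2f}")     # 36.00
print(f"Mean after filtering: {mean(cleaned):.2f}")   # 37.14
```

A single bad value pulls the mean from about 37 up to 70, while the median only moves from 32 to 36. Whether dropping a value is justified depends on your data, so treat the filter as one option rather than a default step.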

Examples:

Mean:

Let’s say we have the following dataset representing the ages of a group of individuals: [25, 28, 30, 32, 40, 45, 60]. To calculate the mean age, we add up all the ages and divide by the total number of individuals (7): (25 + 28 + 30 + 32 + 40 + 45 + 60) / 7 = 260 / 7 ≈ 37.14.

Median:

Consider the dataset of monthly salaries for a small company: [3000, 3500, 4000, 5000, 10000]. To find the median salary, we first sort the data:

Sorted dataset: [3000, 3500, 4000, 5000, 10000]

Since there is an odd number of data points, the median is the middle value, which is 4000.
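To check this result, and to see how much the single 10000 salary affects the mean, here is a quick sketch using Python's standard statistics module on the same salary list.

```python
from statistics import mean, median

salaries = [3000, 3500, 4000, 5000, 10000]

mean_salary = mean(salaries)       # 25500 / 5
median_salary = median(salaries)   # middle value of the sorted list

print(f"Mean salary:   {mean_salary}")     # 5100
print(f"Median salary: {median_salary}")   # 4000
```

The mean (5100) is higher than four of the five salaries, while the median (4000) sits in the middle of the group.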

Mode:

Imagine a dataset representing the favorite colors of a group of people: [“Blue”, “Red”, “Green”, “Blue”, “Blue”, “Yellow”]. In this case, “Blue” is the Mode because it occurs more frequently than any other color.
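One simple way to confirm this is to count each color with collections.Counter from the standard library. This is a sketch of one possible approach, not the only one.

```python
from collections import Counter

colors = ["Blue", "Red", "Green", "Blue", "Blue", "Yellow"]

counts = Counter(colors)   # maps each color to how often it appears
# most_common(1) returns only one (value, count) pair, even if several values tie
mode_color, mode_count = counts.most_common(1)[0]

print(mode_color, mode_count)   # Blue 3
```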

Python Code Example:

Now, let’s create a small Python code snippet using the Pandas library to generate dummy data and calculate the mean, median, and mode.

import pandas as pd

# Create a dummy dataset
data = {'Age': [25, 28, 30, 32, 40, 45, 60]}

# Create a Pandas DataFrame
df = pd.DataFrame(data)

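# Calculate mean, median, and mode
mean_age = df['Age'].mean()
median_age = df['Age'].median()
# mode() returns every most frequent value. All ages here are unique,
# so all seven values tie and are returned (in sorted order).
mode_ages = df['Age'].mode().tolist()

print(f"Mean Age: {mean_age:.2f}")
print(f"Median Age: {median_age}")
print(f"Mode Age(s): {mode_ages}")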

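This code builds a DataFrame of ages and calculates their mean, median, and mode. Note that every age in this dataset occurs exactly once, so there is no single most frequent value: mode() returns all of the tied values in sorted order.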

To Conclude,

Understanding the measures of central tendency — mean, median, and mode — is fundamental for data analysis and machine learning. Choosing the appropriate measure depends on the characteristics of your data and your analytical goals. By utilizing these measures effectively, you can gain valuable insights and make informed decisions based on your data.

〰️〰️〰️ Thank you for reading the post, hope you find it useful! 〰️〰️〰️

😄😄 You can contact me on LinkedIn and follow me on Medium 😄😄