Understanding Central Tendency: Unraveling the Core of Data Analysis

Introduction:

Kmshilpamurali
7 min read · Oct 30, 2023

In the world of data analysis, understanding central tendency is like having a secret key to unravel the mysteries hidden within datasets. These measures are the core of statistics, guiding us to the center of our data, where valuable insights are waiting to be discovered. Join us on a journey through the world of central tendency as we explore measures like the mean, median, mode, and more.

Mean: The mean, also known as the average, is calculated by adding up all the values in a dataset and dividing by the number of data points. It is represented as:

Mean (μ) = (Sum of all values) / (Number of values)

Median: The median is the middle value in a dataset when the values are arranged in ascending or descending order. If there is an even number of data points, the median is the average of the two middle values. It is a robust measure of central tendency: because it depends only on the middle of the ordered data, a few extreme values barely move it.

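For datasets with an odd number of data points (n), the median is the value at this position in the sorted data:

Median position = (n + 1) / 2

For datasets with an even number of data points, the median is the average of the values at these two positions:

n / 2 and (n / 2) + 1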
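The median rule can be sketched in a few lines of Python: sort the data, take the middle value when the count is odd, and average the two middle values when it is even. This is a minimal sketch; the function name and the sample values are made up for illustration.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n / 2 and n / 2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 1, 5]        # hypothetical values
even_sample = [7, 1, 5, 3]    # hypothetical values
print(median(odd_sample))     # 5
print(median(even_sample))    # 4.0
```

In practice, Python's built-in statistics.median does the same job.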

Mode:

The mode is the value that appears most frequently in a dataset.

Example: consider AWS (Amazon Web Services) costs for a set of companies over a few months. We'll calculate the mean, median, and mode for these AWS costs.

Calculation:

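1. Mean (average) AWS Cost for All Companies:

The nine cost values sum to 52,600, so Mean = 52,600 / 9 ≈ 5,844.44.

2. Median AWS Cost for All Companies:

To find the median, arrange the data in either ascending or descending order. Here the values are arranged in ascending order: 4500, 4800, 5000, 5100, 5500, 6000, 7000, 7200, 7500. With nine values, the median is the fifth value, 5500.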

3. Mode

Mode AWS Cost for All Companies: In this dataset, there is no mode because no AWS cost value appears more than once.

From the calculated measures of central tendency, we can conclude that, on average, AWS costs for these companies are around $5844. The median AWS cost, $5500, provides a middle point of reference: half of the values are at or below it and half are at or above it. The mean sits above the median because the larger values (7000 to 7500) pull it upward. Since there is no mode, no specific AWS cost is the most frequent among these companies.
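For readers who prefer code to a spreadsheet, the same three measures can be reproduced with Python's standard library. This is a minimal sketch: the cost values are the nine figures from the sorted list above, and treating "every value appears once" as "no mode" is a convention chosen here to match the article.

```python
import statistics
from collections import Counter

# AWS cost values taken from the sorted list in the article.
aws_costs = [4500, 4800, 5000, 5100, 5500, 6000, 7000, 7200, 7500]

mean_cost = statistics.mean(aws_costs)
median_cost = statistics.median(aws_costs)

# Count occurrences; report a mode only if some value repeats.
counts = Counter(aws_costs)
top_count = max(counts.values())
modes = [value for value, count in counts.items() if count == top_count] if top_count > 1 else []

print(round(mean_cost, 2))  # 5844.44
print(median_cost)          # 5500
print(modes)                # [] (no value repeats)
```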

Other Measures of Central Tendency:

1. Geometric Mean

The geometric mean is a measure of central tendency that is used when dealing with data that involves multiplication or exponential growth, such as investment returns or growth rates.

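For example, if you have annual investment returns of 10%, 5%, and 20% and you want to know the average annual return that would produce the same overall result over those years, you'd use the geometric mean. Apply it to the growth factors (1.10, 1.05, and 1.20) rather than to the percentages themselves, then subtract 1 to get back to a rate.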

Geometric Mean = (x₁ * x₂ * … * xₙ)^(1/n)
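As a rough illustration, the 10%, 5%, and 20% returns mentioned above can be run through Python's statistics.geometric_mean (available from Python 3.8). The sketch assumes the returns are first converted to growth factors; the variable names are illustrative.

```python
import statistics

annual_returns = [0.10, 0.05, 0.20]  # 10%, 5%, 20% from the example in the text

# Convert each return to a growth factor (1 + r) before averaging.
growth_factors = [1 + r for r in annual_returns]

avg_factor = statistics.geometric_mean(growth_factors)
avg_return = avg_factor - 1

print(f"{avg_return:.2%}")  # roughly 11.49%
```

Compounding three years at this average rate reproduces the same overall growth as the three actual years (a factor of about 1.386), which the arithmetic mean of 11.67% would slightly overstate.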

Example:

Geometric mean for investment returns using Excel. In this example, we’ll consider a hypothetical investment with annual returns over a 5-year period.

2. Weighted Mean

The weighted mean is a variation of the mean (average) that takes into account the importance or significance of each data point by assigning specific weights to them. The weighted mean provides a way to calculate a central value that reflects the relative contributions of different data points based on their assigned weights.

For example, suppose you are a sales manager for a company that sells multiple products, and you want to calculate the weighted mean sales price for these products based on their respective sales volumes.

Weighted Mean formula:

Weighted Mean = Σ (w_i * x_i) / Σ w_i

Where:

  • Weighted Mean is the calculated result.
  • Σ represents summation over all data points.
  • x_i represents each data point, and w_i represents the weight associated with it.

To use this formula in Excel, with the values in A2:A5 and their weights in B2:B5:

=SUMPRODUCT(A2:A5, B2:B5) / SUM(B2:B5)

The weighted mean sale price for the products, taking into account their respective sales volumes, is approximately $46.7. In other words, products that sell in higher volumes count proportionally more toward this average price.
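The SUMPRODUCT / SUM pattern translates directly to Python. The sketch below uses made-up prices and volumes purely for illustration, so it does not reproduce the $46.7 figure; the function name is also an assumption.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


prices = [20.0, 50.0, 80.0]   # hypothetical sale prices
volumes = [300, 100, 100]     # hypothetical units sold, used as weights

print(weighted_mean(prices, volumes))  # 38.0
print(sum(prices) / len(prices))       # 50.0 (unweighted mean, for comparison)
```

Because the cheapest product has the highest volume in this made-up data, the weighted mean lands well below the simple average.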

3. Quartiles

Quartiles divide a dataset into four equal parts, each containing 25% of the data. The quartiles include the first quartile (Q1), the median (Q2), and the third quartile (Q3). They are used to understand the spread of the data and are commonly used in box plots.

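Formula:

Q1 (First Quartile):

Position of Q1 in the sorted data = (N + 1) * 0.25, where N is the total number of data points. When the position is not a whole number, interpolate between the two neighboring values.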

Q2 (Second Quartile, Median):

  • If the number of data points (N) is odd, Q2 is the value at the middle position.
  • If the number of data points is even, Q2 is the average of the two middle values.

Q3 (Third Quartile):

Position of Q3 in the sorted data = (N + 1) * 0.75, where N is the total number of data points.

Example: Here are the exam scores:

75, 80, 85, 88, 90, 92, 94, 96, 98, 100, 100, 100

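Excel formula:

=QUARTILE(range, quart)

Note that QUARTILE (like QUARTILE.INC) uses the inclusive method, which for these scores returns 87.25 for Q1 and 98.5 for Q3. The (N + 1) position formulas above correspond to =QUARTILE.EXC(range, quart), which is the method that yields the values interpreted below.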

Interpretation:

  • Q1 (First Quartile) is 85.75. This means that 25% of the students scored 85.75 or lower on the exam.
  • Q2 (Second Quartile) is 93, representing the median. Half of the students scored 93 or lower, and half scored 93 or higher.
  • Q3 (Third Quartile) is 99.5. This means that 75% of the students scored 99.5 or lower on the exam.
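These quartiles can be checked with Python's statistics.quantiles (Python 3.8+). In this sketch, the "exclusive" method follows the (N + 1) position rule and reproduces the values above, while the "inclusive" method gives the slightly different results one would expect from an inclusive spreadsheet function.

```python
import statistics

# Exam scores from the example above.
scores = [75, 80, 85, 88, 90, 92, 94, 96, 98, 100, 100, 100]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
print(q1, q2, q3)  # 85.75 93.0 99.5

inclusive = statistics.quantiles(scores, n=4, method="inclusive")
print(inclusive)   # [87.25, 93.0, 98.5]
```

The median is the same under both methods; only Q1 and Q3 shift, which is worth keeping in mind when comparing results across tools.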

Now, let's calculate different quantiles for this dataset. Excel has no function named QUANTILE; its quantile functions are the PERCENTILE family, which take a fraction k between 0 and 1.

  1. PERCENTILE Function:

This function calculates a quantile without specifying whether it's inclusive or exclusive. It is kept for compatibility with earlier versions and uses the inclusive method.

Example 1: Calculate the Median (50th Percentile) Using PERCENTILE:

=PERCENTILE(A2:A7, 0.5)

2. PERCENTILE.INC (Inclusive Quantile):

This function accepts k from 0 to 1 inclusive, so k = 0 returns the minimum and k = 1 returns the maximum.

Example 2: Calculate the First Quartile (Q1, 25th Percentile) Using PERCENTILE.INC:

=PERCENTILE.INC(A2:A7, 0.25)

3. PERCENTILE.EXC (Exclusive Quantile):

  • This function accepts k strictly between 0 and 1 and positions values using (N + 1) * k, so it returns an error for k = 0 or k = 1.

Example 3: Calculate the Interquartile Range (IQR) Using PERCENTILE.EXC:

=PERCENTILE.EXC(A2:A7, 0.75) - PERCENTILE.EXC(A2:A7, 0.25)

PERCENTILE.EXC will calculate the 75th percentile (Q3) and 25th percentile (Q1); subtracting one from the other gives the interquartile range (IQR).

What is the interquartile range (IQR)?

The interquartile range (IQR) is a measure of statistical dispersion or spread that is used to describe the range within which the middle 50% of the data values fall.

The Interquartile Range (IQR) and the value at the 50th percentile (often referred to as the median) are related measures, but they are not the same. Here’s the key difference:

  1. IQR (Interquartile Range):
  • The IQR is a measure of the spread or variability of the middle 50% of the data.
  • It is calculated as the difference between the third quartile (Q3, the 75th percentile) and the first quartile (Q1, the 25th percentile).
  • The IQR is a measure of statistical dispersion and provides insight into how data is spread out around the median.

2. Median (50th Percentile):

  • The median represents the middle value of a dataset when it is arranged in ascending or descending order.
  • It is the value below which 50% of the data falls.
  • The median is a measure of central tendency and represents the midpoint of the data.
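To make the contrast concrete, the sketch below computes both numbers for the exam scores used earlier: the IQR describes spread, the median describes center. The helper name iqr and the choice of the exclusive method by default are assumptions made for this illustration.

```python
import statistics


def iqr(values, method="exclusive"):
    """Return Q3 - Q1 using statistics.quantiles with the given method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    return q3 - q1


scores = [75, 80, 85, 88, 90, 92, 94, 96, 98, 100, 100, 100]

print(iqr(scores))                # 13.75 (99.5 - 85.75)
print(statistics.median(scores))  # 93.0
```

Here the middle half of the scores spans 13.75 points, centered roughly around the median of 93.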

4. Percentiles

Percentiles divide a dataset into 100 equal parts, each containing 1% of the data. They provide a way to compare a specific data point with the entire distribution. For example, the 90th percentile represents the value below which 90% of the data falls.

Position = (P / 100) * (N + 1)

Where:

  • Position is the rank, in the sorted data, of the percentile you want to calculate (e.g., 25th percentile, 50th percentile, 75th percentile). If it is not a whole number, interpolate between the two neighboring values.
  • P is the percentile rank as a percentage (e.g., for the 25th percentile, P = 25).
  • N is the total number of data points in the dataset.

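Excel formula (k is the percentile as a fraction between 0 and 1, e.g. 0.9 for the 90th percentile):

=PERCENTILE.INC(array, k)

Note that PERCENTILE.INC positions values using (N - 1) * k + 1; the (N + 1) formula above corresponds to =PERCENTILE.EXC(array, k).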
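The (N + 1) position rule can also be written out by hand, which makes the interpolation step explicit. This is a minimal sketch, not a reference implementation: the function name is made up, and clamping to the smallest or largest value when the position falls outside the data is a choice made here (a spreadsheet's exclusive function reports an error instead).

```python
def percentile_value(values, p):
    """Value at the p-th percentile using position = (p / 100) * (N + 1)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty dataset is undefined")

    position = (p / 100) * (n + 1)   # 1-based rank, possibly fractional
    lower = int(position)            # whole part of the rank
    fraction = position - lower      # how far to move toward the next value

    # Assumption: clamp to the extremes when the rank falls outside 1..N.
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]

    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


scores = [75, 80, 85, 88, 90, 92, 94, 96, 98, 100, 100, 100]
print(percentile_value(scores, 25))  # 85.75
print(percentile_value(scores, 50))  # 93.0
print(percentile_value(scores, 90))  # 100.0
```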

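Difference between percentile and quantile: a quantile is the general term for a cut point that splits sorted data at a given fraction between 0 and 1, while a percentile is a quantile expressed on a 0 to 100 scale. The 0.25 quantile, the 25th percentile, and Q1 are therefore the same value.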

5. Midrange

The midrange is a relatively simple measure of central tendency, calculated as the average of the minimum and maximum values in a dataset. It provides a quick way to understand the “middle” of the data in a more intuitive sense.

Here’s an example to illustrate how to calculate the midrange:

Formula:

Midrange = (Minimum + Maximum) / 2

Suppose you have a dataset representing the daily high temperatures (in degrees Fahrenheit) for a particular city over the past week:

So, in this example, the midrange of the daily high temperatures for the week is 74.5 degrees Fahrenheit. This represents the “middle” temperature value in the dataset, as it’s the average of the lowest (67°F) and highest (82°F) temperatures over the week.
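A short Python sketch of the same calculation follows. Only the minimum (67°F) and maximum (82°F) come from the example; the other daily values are hypothetical fillers, and the function name is an assumption.

```python
def midrange(values):
    """Return the average of the smallest and largest value."""
    if not values:
        raise ValueError("midrange of an empty dataset is undefined")
    return (min(values) + max(values)) / 2


# Hypothetical week of highs; only the 67 and 82 extremes are from the article.
daily_highs_f = [67, 70, 72, 75, 78, 80, 82]

print(midrange(daily_highs_f))  # 74.5
```

Since only the two extremes enter the formula, a single unusually hot or cold day shifts the midrange even if every other day stays the same.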

Conclusion:

In summary, central tendency measures offer valuable tools to understand and describe the center of data distributions, and their selection plays a crucial role in effective data analysis and decision-making. Whether you're managing a business, conducting research, or simply interpreting data, central tendency measures are essential to gaining a clear and concise understanding of your dataset.
