Measure of Variability

Ademola Ayobami
3 min read · Oct 14, 2021


As a software engineer, I have had my share of solving problems for organizations, and I became a data science enthusiast after realizing how much more data can help achieve in solving those problems. While searching for a way to get started with data science, I came across 10Alytics, hosted by Efemena Michael, and enrolled.

Let’s go through an introduction to measures of variability.

Descriptive statistics are more or less about summarizing datasets, which normally consist of many data points, using a diagram/graph or just a few statistical measures. One group of such indicators is the measures of central tendency, also called averages. Characterizing a distribution by its average is of course a good thing to do, but it doesn’t tell the full story, and this is where another indicator, the measure of variability, comes in.

The measure of variability describes how points in a given dataset differ from one another. It tells us how far apart the data points fall from the center of the distribution.

In the context of a distribution of values, low dispersion indicates that the data tend to cluster tightly around the center. This is ideal because it means you can make better predictions about the population from the given sample data; low variability equals more consistency in a dataset. On the other hand, high dispersion indicates that the values tend to fall farther away from the center. High variability means the values are less consistent, so it is more difficult to make an accurate prediction.

There are different ways of measuring variability:

  1. Range
  2. The interquartile range
  3. Variance
  4. Standard deviation

Range

This is the simplest measure of variability. It is the difference between the largest and smallest values in your dataset, and it tells us the spread of the data from the lowest to the highest value in the distribution. It is defined as R = H - L, where R is the range, H is the highest value and L is the lowest value.

Range = highest value - lowest value
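
As a quick illustration, here is a minimal Python sketch that computes the range of a small dataset (the values are made up for this example):

```python
# Made-up sample data for illustration
data = [4, 7, 2, 9, 5, 11, 3]

# Range = highest value - lowest value
data_range = max(data) - min(data)
print(data_range)  # 11 - 2 = 9
```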

The interquartile range (IQR)

This is the spread of the middle half of your dataset. It gives us the spread of the middle of your distribution and can be defined as IQR = Q3 - Q1, where IQR is the interquartile range, Q3 is the upper quartile and Q1 is the lower quartile.
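
A minimal Python sketch of this, assuming NumPy is available and using a small made-up dataset, might look like this:

```python
import numpy as np

# Made-up sample data for illustration
data = [4, 7, 2, 9, 5, 11, 3]

q1 = np.percentile(data, 25)  # lower quartile (Q1)
q3 = np.percentile(data, 75)  # upper quartile (Q3)
iqr = q3 - q1                 # interquartile range = Q3 - Q1
print(iqr)
```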

Variance

This is the average squared difference of the values from the mean, and it is the square of the standard deviation. For a population it can be written as:

Variance (σ²) = Σ(x - μ)² / N

where Σ is the summation, x is an individual value in the dataset, μ is the population mean, and N is the population size.
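
As an illustration, here is a short Python sketch of the population variance formula above, again with made-up data:

```python
# Made-up sample data for illustration
data = [4, 7, 2, 9, 5, 11, 3]

mean = sum(data) / len(data)  # μ, the mean of the dataset

# Average of the squared differences from the mean: Σ(x - μ)² / N
variance = sum((x - mean) ** 2 for x in data) / len(data)
print(variance)
```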

Standard Deviation

This is the typical distance between a data point and the mean, or the average amount of variability in your dataset. It is the square root of the variance. For a sample it can be written as:

Standard deviation (s) = √( Σ(x - x̄)² / (n - 1) )

where Σ is the summation, x is an individual value in the dataset, x̄ is the sample mean, and n is the sample size.

The following are the steps for finding the standard deviation, with a short code sketch after the list.

  1. List each score and find their mean.
  2. Subtract the mean from each score to get the deviation from the mean.
  3. Square each of these deviations.
  4. Add up all the squared deviations.
  5. Divide the sum of the squared deviations by n - 1 (for a sample) or N (for a population).
  6. Find the square root of the number you found.
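
A minimal Python sketch that follows these six steps for a sample, using made-up scores, could look like this:

```python
import math

# Made-up scores for illustration
scores = [4, 7, 2, 9, 5, 11, 3]

mean = sum(scores) / len(scores)             # Step 1: find the mean
deviations = [x - mean for x in scores]      # Step 2: deviation from the mean
squared = [d ** 2 for d in deviations]       # Step 3: square each deviation
total = sum(squared)                         # Step 4: add up the squared deviations
sample_variance = total / (len(scores) - 1)  # Step 5: divide by n - 1 (sample)
std_dev = math.sqrt(sample_variance)         # Step 6: take the square root
print(std_dev)
```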
