The ABCs of Summary Stats: Mean, Median, and Mode

Making Sense of Data with Mean, Median, and Mode

Humberto Rendon
Byte-Sized Data
Apr 26, 2023


Data doesn’t always look like a squeaky clean spreadsheet. Most of the time, it reaches us as some form of summary statistics: a few numbers that describe a whole dataset. If you don’t know what that means, here’s a brief example. Let’s say you download a dataset about movies. Instead of looking through the whole dataset and trying to infer things, with summary statistics you could quickly learn that most of last year’s movies sucked. The most common summary statistics are the mean, median, and mode (yes, here we go again).

Mean (average)

The sum of all your numbers divided by how many numbers there are. The mean gives you a sense of what each number would contribute if the total were split evenly across the whole series.

For example, let’s say you have to give a presentation about selling lemonade at your younger brother’s school. You ask your brother how old the kids in the audience will be, and he tells you they’re between 7 and 9. You could consider every age individually, but knowing that on average you’ll be talking to 8-year-olds lets you simplify your presentation.
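If you’d rather see the definition as code, here’s a minimal Python sketch: add everything up, then divide by how many values there are. The list of ages is made up for illustration (the story only tells us they range from 7 to 9).

```python
def mean(values):
    """Sum of all values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

# Hypothetical ages of the kids in the audience (not a real class list)
ages = [7, 8, 9, 8, 7, 9, 8, 8]

print(mean(ages))  # 8.0
```

Python’s standard library also ships statistics.mean, which gives the same value here.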

Median

The median is the midpoint of your data: the value that sits in the middle once everything is sorted. It matters because it’s less sensitive to outliers than the mean. But what does this mean? (ba dum tss) It means that a few extreme values in your data won’t drag it in the wrong direction.

For example, let’s say you’re looking at salaries in a company. If one employee earns far more than everyone else, the average will make you think everybody’s salary is higher than it actually is. The median, on the other hand, barely moves: that mythical high earner is just one value at the top of the sorted list, no matter how big their salary is.

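Remember that to get the median, the data has to be sorted first. If you have an even number of values, there’s no single middle one, so the median is the average of the two middle values.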
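Here’s a small Python sketch of the median (sort first, then pick the middle), run on the salary example. The salaries are hypothetical numbers I made up, with one “mythical high earner” at the end.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)  # the data has to be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical yearly salaries; the last one is the mythical high earner
salaries = [40_000, 42_000, 45_000, 48_000, 50_000, 1_000_000]

print(sum(salaries) / len(salaries))  # mean: roughly 204,167
print(median(salaries))               # median: 46500.0
```

The mean suggests a typical employee makes over 200k, while the median lands right among the five ordinary salaries.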

Mode

The most common value in the dataset. If every value repeats the same number of times, we can say that there is no mode. Knowing the mode can help us identify outliers, and it’s especially useful with categorical data (categories with no particular order), where a mean or a median can’t be computed at all.
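A minimal Python sketch of the mode, following the “no mode” rule above. The function name modes and the survey of favorite genres are hypothetical, just for illustration. It returns a list because two values can tie for most common, and (my assumption) a list with only one distinct value is treated as having that value as its mode.

```python
from collections import Counter

def modes(values):
    """Most common value(s); [] when several distinct values all appear equally often."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    # Every distinct value tied for the top count -> "no mode"
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

# Categorical data: hypothetical favorite movie genres from a survey
genres = ["comedy", "horror", "comedy", "drama", "comedy", "horror"]
print(modes(genres))           # ['comedy']
print(modes([1, 2, 3]))        # [] (every value appears once, so no mode)
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (a tie between two values)
```

Python’s statistics module also has mode and multimode, but multimode returns every value in the all-tied case instead of reporting no mode.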

Common Misconceptions

Even though these are basic statistics, sometimes we are still silly billies. For example, let’s say we have the numbers 1, 2, 3, 3, 3, 2, 1, 2, 1, 3, 100. The mean says our average is 11 (121 divided by 11 values). Not only does that number not appear in our data, it’s far from any value we’d actually find in this dataset. The median, on the other hand, says the midpoint of our data is 2, which makes sense considering almost all of our numbers are 1, 2, or 3. The mode tells us the most common number is 3 (it appears four times), which also makes sense. This is a very simple example, but things get weirder with real data.
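You can check these numbers yourself with Python’s built-in statistics module. The second half of this sketch drops the 100 to show how much a single outlier moves the mean compared with the median.

```python
import statistics

# The numbers from the example above
data = [1, 2, 3, 3, 3, 2, 1, 2, 1, 3, 100]

print(statistics.mean(data))    # mean is 11 (121 / 11)
print(statistics.median(data))  # median is 2
print(statistics.mode(data))    # mode is 3

# Drop the outlier and see which measure moves
without_outlier = [x for x in data if x != 100]
print(statistics.mean(without_outlier))    # mean drops to 2.1
print(statistics.median(without_outlier))  # median stays at 2
```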

What would you take away if I told you that the average number of people in a household is 2.5? No household has exactly 2.5 people. It’s also very common for people to treat the average as the midpoint, I guess because it feels “intuitive” (half of the numbers must be below and half above), but that’s only true when the data is roughly symmetric. When using these measures, always double-check the logic behind your findings.
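To see why the mean isn’t automatically the midpoint, here’s a small Python sketch with made-up household sizes (not real census data). The mean comes out to 2.5 people, a size no household actually has, the median is 2, and more than half of the households sit below the mean.

```python
import statistics

# Hypothetical household sizes (people per household)
households = [1, 1, 2, 2, 2, 3, 4, 5]

avg = statistics.mean(households)    # 2.5, yet no household has 2.5 people
mid = statistics.median(households)  # 2.0
below = sum(1 for h in households if h < avg)

print(avg, mid, below / len(households))  # 2.5 2.0 0.625
```

Five of the eight households are below the mean, so “half below, half above” doesn’t hold for the average here.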
