# An intuitive introduction to Boxplots

## A graphical display to better understand and interpret your dataset

I first met this graphical representation at the beginning of my statistics degree, and I liked it right away: it is simple and packed with information. At the same time, it is easy to take for granted some of the knowledge that is essential to read this plot correctly. For this reason, this tutorial focuses on the details that can escape your attention.

You are probably asking why you should use it. The first thing to know is that it is an efficient way to display the main characteristics of a data distribution, such as its spread, skewness, and symmetry. Moreover, it can show the distribution of a quantitative variable for each level of a qualitative variable, which makes it easy to compare groups side by side.

# How is it built?

A boxplot summarizes a variable's distribution with five values: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

Let’s see the concepts composing this plot one by one. Given a set of observations in ascending order, we can calculate the five measures:

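1. Minimum is the smallest data value that is not an outlier. Low outliers are the points that fall below the lower fence, Q1 - 1.5*(Q3 - Q1) = Q1 - 1.5*IQR, where IQR = Q3 - Q1 is the interquartile range. The minimum shown on the plot is therefore the smallest observation at or above that fence, not the fence value itself.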
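To make the rule for the minimum concrete, here is a minimal sketch in Python. It only uses the standard library, and the function name `lower_whisker` and the sample values are made up for illustration. Keep in mind that quartiles can be computed with several conventions; this sketch assumes the "inclusive" method of `statistics.quantiles`, so other tools may return slightly different values for Q1 and Q3.

```python
from statistics import quantiles


def lower_whisker(values):
    """Return (q1, q3, lower_fence, whisker_min, low_outliers).

    Hypothetical helper for illustration; quartiles use the
    'inclusive' convention, which is only one of several in use.
    """
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Anything below this threshold is treated as a low outlier
    lower_fence = q1 - 1.5 * iqr
    # The plotted minimum is the smallest observation inside the fence
    whisker_min = min(x for x in data if x >= lower_fence)
    low_outliers = [x for x in data if x < lower_fence]
    return q1, q3, lower_fence, whisker_min, low_outliers


# Illustrative sample with one unusually small value
sample = [-20, 3, 4, 5, 6, 7, 8, 9, 10]
q1, q3, fence, whisker_min, outliers = lower_whisker(sample)
print(f"Q1={q1}, Q3={q3}, lower fence={fence}")
print(f"minimum on the plot={whisker_min}, low outliers={outliers}")
```

For this sample, Q1 is 4 and Q3 is 8, so the IQR is 4 and the lower fence sits at -2. The value -20 falls below the fence and is flagged as an outlier, while the minimum drawn on the plot is 3, the smallest observation that is still inside the fence.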

Data Scientist | Top 1500 Writer on Medium | Love to share Data Science articles | https://www.linkedin.com/in/eugenia-anello