8 Best Seaborn Visualizations
Hands-on statistical plots with Seaborn using the penguin dataset.
Before you start a data science project, you first need to understand your data, and visualization is one of the best ways to do that. In Python, Matplotlib and Seaborn are the libraries most commonly used for data visualization.
In this blog post, I’m going to cover the following topics:
- What is Seaborn?
- Scatter plot
- Histogram
- Bar plot
- Box plot
- Violin plot
- Facet grid
- Pair plot
- Heatmap
Let’s dive in!
What is Seaborn?
Seaborn is a Python library for data visualization built on Matplotlib. Matplotlib is used to plot 2D and 3D graphs, while Seaborn is used to plot statistical graphs. Because Seaborn builds on Matplotlib, you can use these two libraries together to create very powerful visualizations.
You can install Seaborn with the following command:
pip install seaborn
If you install Anaconda, Seaborn is installed automatically. After installing Seaborn, we need to import the library to use it. Let’s import Seaborn:
import seaborn as sns
With Seaborn, you can easily load some famous datasets used for data science. In this post, I’m going to use the Palmer penguins dataset, which is also available on Kaggle and is often used as an alternative to the iris dataset.
Let’s load the penguin dataset with Seaborn.
data = sns.load_dataset("penguins")
Let me show the first five rows of the dataset.
data.head()
Let’s see the structure of the dataset.
data.shape
#Output:
(344, 7)
Seaborn has some themes you can use. You can control these themes with the set_theme method. Let’s control themes with the rc parameter.
# Apply the default theme and set the image quality (dpi) and the figure size in one call.
sns.set_theme(rc={"figure.dpi": 300, "figure.figsize": (6, 3)})
Now, let’s go deep into the statistical plots.
1. Scatter Plot
The scatter plot is one of the most direct ways to see the relationship between two numeric variables. Let’s see the scatter plot of bill (culmen) length and depth by penguin species.
sns.scatterplot( x = "bill_length_mm",
y = "bill_depth_mm",
data = data,
hue = "species")
As you can see, the bill length is on the x-axis, and the bill depth is on the y-axis. This scatter plot shows how the species differ from each other.
2. Histogram
The second type of plot I’m going to show is the histogram. A histogram shows the distribution of the data, and you can use it to see the distribution of one or more variables. Now let’s see the histogram of the flipper length using the histplot function.
sns.histplot(x = "flipper_length_mm", data = data)
Note that the histogram counts the number of observations that fall within each interval (bin). You can also flip the plot by passing the variable to the y parameter instead of x.
sns.histplot(data=data, y="flipper_length_mm")
You can control the width of the rectangles in the histogram with the binwidth parameter. Let me show this:
sns.histplot(data=data, x="flipper_length_mm", binwidth=3)
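To make the binning idea concrete, here is a minimal pure-Python sketch of counting observations per interval for a given bin width. The `bin_counts` helper and the flipper lengths are made up for illustration, and Seaborn's exact bin edges may be placed slightly differently.

```python
from collections import Counter

# Made-up flipper lengths in mm, for illustration only (not taken from the dataset).
lengths = [181, 186, 195, 193, 190, 181, 195, 210, 215, 222]


def bin_counts(values, binwidth):
    """Count values per interval [start, start + binwidth), starting at min(values).

    Hypothetical helper; empty intervals are simply left out of the result.
    """
    lo = min(values)
    counts = Counter((v - lo) // binwidth for v in values)
    return {lo + i * binwidth: counts[i] for i in sorted(counts)}


# A small bin width gives many narrow bars, a large one gives a few wide bars.
print(bin_counts(lengths, binwidth=3))
print(bin_counts(lengths, binwidth=10))
```

Each key is the left edge of an interval and each value is the height of the bar drawn for it.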
You can also add a kernel density estimate (KDE), a smoothed curve that estimates the distribution of the data, to the histogram plot. Let me show that.
sns.histplot(data=data, x="flipper_length_mm", kde=True)
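If you are curious what the smoothed curve is, the following is a rough sketch of a Gaussian kernel density estimate in plain Python. The data values and the bandwidth of 5.0 are assumptions chosen by hand. Seaborn picks the bandwidth automatically and, in a histogram, rescales the curve to match the bars, so this is not its exact implementation.

```python
import math

# Made-up flipper lengths in mm, for illustration only.
lengths = [181, 186, 190, 193, 195, 210, 215, 222]


def gaussian_kde(values, bandwidth):
    """Return a density function: the average of Gaussian bumps centred on each value."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


density = gaussian_kde(lengths, bandwidth=5.0)  # bandwidth picked by hand (assumption)

# The curve is a probability density, so the area under it should be close to 1.
step = 0.5
area = sum(density(150 + i * step) * step for i in range(int(100 / step) + 1))
print(round(area, 3))
```

A smaller bandwidth makes the curve follow individual points more closely, while a larger one smooths it out.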
You can use the hue parameter to see the histograms of categories.
sns.histplot(data=data, x="flipper_length_mm", hue="species")
In this plot, you can see the histograms of the categories that show the penguin species.
3. Bar Plot
A bar plot represents an estimate of the central tendency for a numeric variable with the height of each rectangle. Let’s see the bar plot showing the flipper lengths of penguin species.
sns.barplot(x = "species", y = "flipper_length_mm", data = data)
By default, the bars are calculated based on the mean of the values. You can use another statistic instead of the mean using the estimator parameter. Let me use the hue parameter to see the flipper lengths of the species by sex.
sns.barplot(x = "species",
y = "flipper_length_mm",
data = data,
hue = "sex")
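The estimator parameter mentioned above can be sketched like this. The small DataFrame is made up for illustration, and `np.median` is just one possible choice of statistic. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up values for illustration only (not taken from the penguin dataset).
toy = pd.DataFrame({
    "species": ["Adelie"] * 3 + ["Gentoo"] * 3,
    "flipper_length_mm": [185, 190, 210, 210, 215, 235],
})

# What the default bars would show (mean) versus a median estimator.
means = toy.groupby("species")["flipper_length_mm"].mean()
medians = toy.groupby("species")["flipper_length_mm"].median()
print(means.to_dict())
print(medians.to_dict())

# Pass any function that reduces a group of values to a single number.
ax = sns.barplot(x="species", y="flipper_length_mm", data=toy, estimator=np.median)
```

The median is less sensitive to a few unusually large or small values, which is why it can be a useful alternative to the mean.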
4. Box Plot
The box plot is used to compare the distribution of numerical data between levels of a categorical variable. Let’s see the distribution of flipper length by species.
sns.boxplot(x = "species", y = "flipper_length_mm", data = data)
Here, the boxes show the quartiles of the data. The whiskers extend to the rest of the distribution, by default up to 1.5 times the interquartile range beyond the box. Values beyond the whiskers are drawn as individual points, and you can think of them as outliers. You can use the hue parameter to see a box plot of flipper lengths of species by sex.
sns.boxplot(x = "species",
y = "flipper_length_mm",
data = data,
hue = "sex")
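To see what the box, the whiskers, and the outlier points stand for, here is a small sketch using only the standard library. The values are made up, and the 1.5 multiplier is the usual box plot rule, which is also Seaborn's default.

```python
import statistics

# Made-up flipper lengths in mm, for illustration only.
values = sorted([172, 185, 187, 190, 191, 193, 195, 198, 230])

# Quartiles with linear interpolation between data points: the box edges and the middle line.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Points further than 1.5 * IQR from the box are treated as outliers.
low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low_limit or v > high_limit]
inside = [v for v in values if low_limit <= v <= high_limit]

# Whiskers stop at the most extreme data points that are still inside the limits.
whiskers = (min(inside), max(inside))
print(q1, median, q3, whiskers, outliers)
```

So the whiskers do not necessarily reach the minimum and maximum of the data. They stop at the last points inside the limits.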
5. Violin Plot
You can think of the violin plot as a box plot combined with a kernel density estimate. This plot is used to compare the distribution of a numerical variable across the levels of a categorical variable. Let’s see the violin plot of flipper length.
sns.violinplot(x = "species", y = "flipper_length_mm", data = data)
You can also use the hue parameter to see the violin plot of the flipper lengths by sex.
sns.violinplot(x = "species",
y = "flipper_length_mm",
data = data,
hue = "sex")
Thus, the violins are drawn separately for each sex. Isn’t it great? You can draw excellent plots with Seaborn. Let’s see how to draw multiple plots in one figure.
6. Facet Grid
You can use a facet grid to draw the same kind of plot for different subsets of your dataset, arranged in a grid. For example, let me draw the histogram of the penguins’ flipper length according to the island and sex variables. Let’s assign column and row variables to add more subplots to the figure. First, I’m going to specify the variables that will be in the rows and columns.
sns.FacetGrid(data, col="island", row="sex")
When you run this command, 6 empty subplots are created because the island variable has 3 categories and the sex variable has 2 categories (2*3 = 6). Let’s draw a plot on every facet using the map method. For example, let’s see the histograms of flipper length.
sns.FacetGrid(data, col="island", row="sex").map(sns.histplot, "flipper_length_mm")
You can also draw a different kind of plot on every facet. For example, let’s see the kernel density plot of flipper length.
sns.FacetGrid(data, col="island", row="sex").map(sns.kdeplot, "flipper_length_mm")
Awesome! You can easily draw subplots with Seaborn.
7. Pair Plot
Looking at the pairwise relationships between the variables in the dataset is one of the important steps of data analysis. You can use the pairplot function to see them. This function plots each numeric variable in the dataset against every other numeric variable. Let’s see the pairs of numerical variables according to penguin species in the dataset.
sns.pairplot(data, hue="species", height=3)
Because the hue parameter is set, a kernel density estimate is drawn by default on the diagonal of the grid. You can use the diag_kind parameter to draw histograms on the diagonal instead.
sns.pairplot(data, hue="species", diag_kind="hist")
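To get a feeling for how many panels a pair plot contains, here is a small sketch that only enumerates the variable pairs. It assumes the four numeric columns of the penguin dataset. Note that `body_mass_g` does not appear in the code earlier in this post.

```python
from itertools import product

# Numeric columns of the penguin dataset (body_mass_g is an addition not shown earlier).
numeric_cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

panels = list(product(numeric_cols, repeat=2))        # (row variable, column variable)
diagonal = [(r, c) for r, c in panels if r == c]      # one-variable distribution plots
off_diagonal = [(r, c) for r, c in panels if r != c]  # two-variable scatter plots

print(len(panels), len(diagonal), len(off_diagonal))
```

With four numeric variables you get a 4 x 4 grid, where the diagonal shows each variable on its own and every other panel shows one pair.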
8. Heatmap
Finally, let’s look at the heatmap. The heatmap is a very useful visualization technique. You can use it to see correlations between numerical variables. Let’s use the corr method on the numeric columns to see this.
sns.heatmap(data.select_dtypes("number").corr())
You can see the relationship between the numerical variables in this graph. You can also use the annot parameter to see the numeric values in each cell. Let me show you this.
sns.heatmap(data.select_dtypes("number").corr(), annot=True)
Now each cell shows its correlation coefficient.
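Here is a minimal sketch of the table that the heatmap colors. By default, the corr method computes the Pearson correlation, a value between -1 and 1, for each pair of numeric columns. Recent pandas versions may raise an error if text columns such as species are still in the frame, so this sketch selects the numeric columns first. The toy values are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration only (not taken from the penguin dataset).
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "flipper_length_mm": [180, 190, 210, 220],
    "body_mass_g": [3500, 3700, 4100, 4300],  # exactly linear in flipper length here
})

# Keep only the numeric columns, then compute the pairwise Pearson correlations.
corr = toy.select_dtypes("number").corr()
print(corr)
```

Because body mass is an exact linear function of flipper length in this toy data, the off-diagonal cells are 1. Real data gives values somewhere between -1 and 1.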
Conclusion
Data visualization is one of the important steps in data science projects. It is very important to explore the data before you analyze it in depth. In this blog post, I talked about data visualization with Seaborn, one of Python’s most important libraries for data science, which is mainly used for plotting statistical graphs. You can find the notebook and dataset here. Thank you for reading. I hope you enjoyed it.