Basic Data Visualization in Seaborn

Joel Sherman
5 min read · Sep 11, 2020

The famous alias sns in Seaborn (as in import seaborn as sns) stands for Samuel Norman Seaborn, the character from The West Wing.

There is no doubt that data visualization is a vital component in the data professional’s workflow, especially when communicating final insights or data science models to the client. But visualization is also used iteratively in the early stages of a project during exploration, transformation and cleaning. When combined with dataframe manipulation and merging, data visualization can become the backbone of a healthy exploratory data analysis (EDA) process. In a previous article I wrote about some of the basics in visualizing data with Matplotlib. In this article, I will do the same with Seaborn. In particular, I will cover the following:

1. Basic Plotting Functions in Seaborn
2. Adding more than Two Dimensions to a Seaborn Plot
3. Basic Plot Customization in Seaborn

As with my previous blog posts, I’ll be using personal data on my sleep, bicycle training and recovery which is available on my public GitHub account, here.

1. Basic Plot Functions

Seaborn has two primary figure-level plotting functions that return a FacetGrid object; which one you use depends on the nature of your data.

Relational Plots

The first function is .relplot(), which draws relational plots such as scatter plots and line plots. Relational plots show the relationship between two (or more) quantitative variables. When one of those variables is a datetime object, Seaborn’s relplot can produce a line chart. For example, to look at how my average heart rate variability (AvgHRV) varies over time (date), I can write the following single line of code.

sns.relplot(x = 'date', y = 'AvgHRV', data = df, kind = 'line')

For other types of quantitative variables, .relplot() can produce a scatter plot to visualize the relationship. In my example, I’d like to look at the relationship between my AvgHRV and the amount of deep, slow-wave sleep that I get at night (Deep). Again, Seaborn can handle this with the following one-liner.

sns.relplot(x = 'Deep', y = 'AvgHRV', data = df, kind = 'scatter')

Categorical Plots

The second plotting function in Seaborn is .catplot(), which can make categorical plots such as bar plots, box plots, count plots and point plots. These types of plots show the distribution of a quantitative variable within categories defined by a categorical variable. For example, to look at how my AvgHRV varies based on whether I drank 1 (a little), 2 (too much) or 0 (nothing) beers the night before (Beers), I can write the following line of code to produce a boxplot.

sns.catplot(x = 'Beers', y = 'AvgHRV', data = df, kind = 'box')
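
Switching between categorical plot types is just a matter of changing the kind argument. The sketch below assumes a made-up DataFrame with the same Beers and AvgHRV column names; the numbers are placeholders rather than my actual measurements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs as a script

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up placeholder values, NOT the real data from the article
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({"Beers": np.tile([0, 1, 2], n // 3)})
df["AvgHRV"] = 60 - 5 * df["Beers"] + rng.normal(0, 4, size=n)

# Distribution of AvgHRV within each Beers category
box_grid = sns.catplot(x="Beers", y="AvgHRV", data=df, kind="box")

# Number of nights in each Beers category (a count plot only needs x)
count_grid = sns.catplot(x="Beers", data=df, kind="count")
```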

2. Adding Multiple Dimensions to Seaborn Plots

Plotting two variables seems easy enough, but what about more? Here again, Seaborn doesn’t disappoint, and can plot more than 2 dimensions using a variety of different parameters. For example, the hue and size parameters can be used to alter the scatter plot above to add a third dimension (Beers) to see if the relationship between deep sleep and HRV changes with beer consumption. Here’s the code.

sns.relplot(x = 'Deep', y = 'AvgHRV', data = df, kind = 'scatter', hue = 'Beers', size = 'Beers')

Other arguments like col and/or row can create subplots or small multiples along a third or fourth dimension. Seaborn makes this easy, and the result is a powerful set of tools for exploring and visualizing your data.
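
As a rough illustration of hue, size, col and row working together, here is a sketch on made-up data. The Ride column is a hypothetical fourth variable that I invented for this example; it is not in my dataset, and all values are random placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs as a script

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up placeholder values; "Ride" is a hypothetical column for illustration
rng = np.random.default_rng(2)
n = 60
df = pd.DataFrame({
    "Deep": rng.uniform(0.5, 2.5, size=n),
    "Beers": np.tile([0, 1, 2], n // 3),
    "Ride": np.repeat(["rest", "ride"], n // 2),
})
df["AvgHRV"] = 40 + 10 * df["Deep"] - 4 * df["Beers"] + rng.normal(0, 4, size=n)

# Third dimension on a single set of axes: color and marker size follow Beers
hue_grid = sns.relplot(x="Deep", y="AvgHRV", data=df, kind="scatter",
                       hue="Beers", size="Beers")

# Small multiples: one column per Beers value, one row per Ride value
facet_grid = sns.relplot(x="Deep", y="AvgHRV", data=df, kind="scatter",
                         col="Beers", row="Ride")

print(facet_grid.axes.shape)  # (rows, columns) of the subplot grid
```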

3. Basic Plot Customization

Customizing plots in Seaborn is very simple as well. In this final section I’ll cover customizations of plot backgrounds, element color palettes, the scale of Seaborn plots, and how to add titles.

Background

To change the background style of a Seaborn plot, call the set_style() function with the name of a style. Seaborn has five styles for you to choose from: 'white' (plain white background, pictured in Figures 1 to 4 above), 'dark' (grey background), 'whitegrid' (same as 'white' but with grey gridlines), 'darkgrid' (same as 'dark' but with white gridlines) and 'ticks' (same as 'white' but with tick marks on the x and y axes). Note that calling sns.set() with no arguments applies 'darkgrid'.
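
One way to see what each style actually changes is to apply it and inspect matplotlib's settings afterwards. The sketch below only checks whether gridlines are switched on; the grid_on dictionary is a name I made up for the example.

```python
import matplotlib as mpl
import seaborn as sns

styles = ["white", "dark", "whitegrid", "darkgrid", "ticks"]

# Apply each style in turn and record whether it turns gridlines on
grid_on = {}
for style in styles:
    sns.set_style(style)
    grid_on[style] = mpl.rcParams["axes.grid"]
print(grid_on)

# The style stays in effect for every plot created after this call
sns.set_style("whitegrid")
```

Because set_style() changes global settings, call it before creating the plot you want to restyle.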

Color Palettes

To change the color palette of the main plot elements, call Seaborn’s set_palette() function with a palette. Seaborn accepts several kinds of palettes. Diverging palettes like 'RdBu' or 'PRGn' color the elements on scales running from red to blue, or purple to green, respectively. They are typically used when values spread out in two directions from a meaningful central value, like zero or the mean of a variable. In contrast, sequential palettes like 'Greys' or 'Blues' (the names are case-sensitive) color the elements in varying shades of grey or blue, respectively, and are great when working with variables that have continuous scales. And if you want to use a custom palette, Seaborn allows you to pass your own list of color names or hex codes, like ['color/hex 1', 'color/hex 2', ...].
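
Here is a short sketch of the three kinds of palettes in action. The two hex codes are arbitrary colors I picked for illustration, and calling sns.color_palette() with no arguments simply reads back whichever palette is currently active.

```python
import seaborn as sns

# Diverging palette: red through a neutral midpoint to blue
sns.set_palette("RdBu")
diverging = sns.color_palette()  # read back the active palette

# Sequential palette: light to dark shades of blue (note the capital B)
sns.set_palette("Blues")
sequential = sns.color_palette()

# Custom palette from a list of hex codes (arbitrary example colors)
sns.set_palette(["#39A7D0", "#36ADA4"])
custom = sns.color_palette()

print(custom.as_hex())
```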

Scale

Sometimes you need the elements of your plots, such as labels and lines, to be larger or smaller. Seaborn’s set_context() function lets you do just that by passing in the name of a context. From smallest to largest, the four contexts are 'paper' (scaled for a typical paper publication), 'notebook' (the default, great for Jupyter notebooks!), 'talk' (perfect for presentations) and 'poster' (the largest scale of all!).
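
To compare the contexts without drawing anything, you can ask Seaborn which font size each one would apply. This is a small sketch; font_sizes is just a helper name made up for the example.

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]

# plotting_context(name) returns the settings for a context without applying them
font_sizes = {ctx: sns.plotting_context(ctx)["font.size"] for ctx in contexts}
for ctx, size in font_sizes.items():
    print(f"{ctx}: base font size {size}")

# Apply one context to every plot created after this call
sns.set_context("talk")
```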

Titles

Finally, no plot is complete without an appropriate title. To add titles to facet grid objects like relplots and catplots in Seaborn, we first assign the plot to a variable, say g, with g = sns.relplot(...), and then call g.fig.suptitle('Your Plot Title'). For small multiples that use the col or row parameters, you can also title your subplots with g.set_titles('Subtitle of Group {col_name}') (or {row_name} for rows), and Seaborn will fill in the value of each group in your col or row dimension accordingly.
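
Putting both kinds of title together might look like the sketch below. It again assumes made-up placeholder data with the article's column names, and the y=1.03 offset is just a value I chose so the main title sits above the subplot titles.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs as a script

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up placeholder values, NOT the real data from the article
rng = np.random.default_rng(3)
n = 60
df = pd.DataFrame({
    "Deep": rng.uniform(0.5, 2.5, size=n),
    "Beers": np.tile([0, 1, 2], n // 3),
})
df["AvgHRV"] = 40 + 10 * df["Deep"] - 4 * df["Beers"] + rng.normal(0, 4, size=n)

g = sns.relplot(x="Deep", y="AvgHRV", data=df, kind="scatter", col="Beers")

# One title for the whole figure, nudged upward to clear the subplot titles
main_title = g.fig.suptitle("Deep sleep vs. HRV", y=1.03)

# One title per subplot; {col_name} is replaced by each Beers value
g.set_titles("Beers: {col_name}")
```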

I hope you enjoyed reading this blog as much as I enjoyed writing it, and get something out of it as a Seaborn beginner. Now go forth and code!

Joel Sherman

I’m an experienced data professional at the intersection of public policy and economics, trying to make sense of the world, one dataset at a time.