Extracting Beauty from the World of Data Science

Theodore Brandon
Sep 26, 2021

A Color-Loving Data Scientist’s Journey through Seaborn (and Matplotlib!)

It is every data scientist’s job to process data. Large amounts of data. Small amounts of data. Clean data, dirty data, complete data, incomplete data, accurate data, inaccurate data, complex, simple, unspecified, unknown... lots and lots of data. At the end of every job, however, whatever remains (the discoveries made, the conclusions presented, the rejected null hypotheses and the failures to reject them) needs to be digested, summarized, and presented in a palatable form to the powers that be. Many times, a simple pandas summary table will suffice. However, as the saying goes, a picture is worth a thousand words.

A line plot of monthly airline passenger counts is elegant, pleasing to the eye, and tells a clear story: over the 1950s, airline traffic increased, with large peaks in the summer months. That is useful data for justifying planned future growth and spending.

Seaborn: Born in 2012 from the amazing brain of Michael Waskom, PhD, Seaborn is built atop the already powerful Matplotlib and engineered to integrate with pandas. The result is a powerful data visualization library with which every data scientist should be intimately familiar: Matplotlib provides an excellent base, and Seaborn shines on top of it.
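To make the Matplotlib/pandas relationship concrete, here is a small sketch using toy data with hypothetical column names: an axes-level Seaborn function such as `sns.scatterplot` reads columns by name straight from a pandas DataFrame and returns an ordinary Matplotlib `Axes`, which can then be customized with plain Matplotlib calls.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: any DataFrame with named columns works.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score":    [52, 58, 65, 70, 74, 81],
    "group":         ["A", "B", "A", "B", "A", "B"],
})

# Seaborn pulls the columns out of the DataFrame by name...
ax = sns.scatterplot(data=df, x="hours_studied", y="exam_score", hue="group")

# ...and returns a regular Matplotlib Axes, so Matplotlib methods still apply.
ax.set_title("Seaborn on top of Matplotlib")
ax.set_xlabel("Hours studied")
ax.grid(True, linestyle="--", alpha=0.4)
plt.tight_layout()
```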

Color and Customizability:

A small taste of built-in colors, shades, gradients and opacities
Comprehensive list of color palettes available with Seaborn

With an extensive list of built-in palettes (hex codes can be used as well), customizing a plot's colors is only a few keystrokes away.
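A short sketch of the palette tools, assuming Seaborn 0.11 or newer (needed for `as_cmap=True` and the `mako` palette). The three hex codes in the custom palette are arbitrary choices for illustration.

```python
import seaborn as sns

# A named built-in palette, sampled to five colors (RGB tuples in 0-1).
viridis5 = sns.color_palette("viridis", 5)

# The same five colors expressed as hex codes.
viridis5_hex = viridis5.as_hex()

# A custom palette built directly from hex codes (illustrative choices).
custom = sns.color_palette(["#1b9e77", "#d95f02", "#7570b3"])

# A continuous colormap from a built-in palette, handy for heatmaps.
mako_cmap = sns.color_palette("mako", as_cmap=True)

# Make the custom palette the default color cycle for later plots.
sns.set_palette(custom)
```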

Seemingly Endless Options: Beyond the examples above, the flexibility this package offers is, to put it lightly, exquisite. Take your DataFrame, add a few lines of code, et voilà:

Left: Scatterplot matrix displaying attributes of 3 species of penguin. Right: Spreadsheet-style heatmap of the same airline data presented previously.

Templates for creating these “works of art” are a quick Google search away:

Example 3D images displaying both the flexibility and the beauty these libraries are capable of producing.
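The post does not say exactly how its 3D images were produced, and Seaborn itself has no 3D plot types, so the following is only one plausible route: a Matplotlib 3D surface (`projection="3d"`) styled with a Seaborn theme and colormap. The ripple surface is an arbitrary function chosen for illustration, and `mako` with `as_cmap=True` assumes Seaborn 0.11 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older Matplotlib)
import numpy as np
import seaborn as sns

sns.set_theme(style="white")  # Seaborn supplies the styling...

# An arbitrary "ripple" surface on an 80 x 80 grid.
x = np.linspace(-3, 3, 80)
y = np.linspace(-3, 3, 80)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

# ...while Matplotlib supplies the 3D axes.
fig = plt.figure(figsize=(7, 5))
ax = fig.add_subplot(projection="3d")
surf = ax.plot_surface(X, Y, Z,
                       cmap=sns.color_palette("mako", as_cmap=True),
                       linewidth=0, antialiased=True)
fig.colorbar(surf, shrink=0.6, pad=0.1)
fig.savefig("ripple_3d.png", dpi=150, bbox_inches="tight")
```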

By tweaking a few parameters (a linewidth here, a histogram bin count there), amateurs such as myself can gain a better understanding of how to use this library. Plus, let’s be honest: it looks cool too!

The take-home message: if you have big data, have extracted useful, impactful information from it, and need a good way to communicate that information effectively, Seaborn is your best bet.

Aside from the 3D images, all plots originated from the mwaskom/seaborn public GitHub repository. This author has customized each of them to some degree, as a way to build skill with, and understanding of, this delightful package.

Recommended for all beginners: A Seaborn tutorial can be found here.


Theodore Brandon

Budding Data Scientist - Ready to conquer the world's data, one set at a time!