Make Your Data Look Good 🌈

Güldeniz Bektaş
Published in The Startup
6 min read · Jan 5, 2021

Often, raw data alone is not enough to make sense of things at a single glance. By "not enough" I don't mean too little: we can have millions of rows and (I may be exaggerating) millions of features, and the data can still be hard to understand as a single table. Data visualization lets us see what the data has to say in graphic form and understand what is going on overall. We can spot trends and outliers, then handle or interpret them to our advantage.

Photo by Luke Chesser on Unsplash

Looking at those colorful plots makes it all easier to get a grip on.

No matter what kind of data you are dealing with, visualizing it will almost certainly help. Visualization is often the best way to explain the relationships between features to your customers, and how those relationships affect their requests. Even people with no technical background should be able to understand a chart, as long as it is simple and clear. You can get your message across without burdening anyone with technical terms.

What Can You Show With Visualization?

🎇 Changes over time. Most data involves time, and most companies want to see how their work has changed over time and plan for the future by looking at those trends.

🎇 Determining correlations. Knowing the relationships between features, especially with our target value, and how they affect each other is very important for the reliability of our model. Visualization is often the best way to understand them.

🎇 Analyzing risk. Identifying valuable and risky data is difficult when looking at a table. With certain visualizations and colors, we can understand which values are risky and which are valuable.

🎇 Frequency. We have just said that most data involves time. Companies want to know how often the relevant events happen over time. Visualization helps here too.

🎇 Analyzing the market. With the data you receive from other markets, visualizations help you better understand what you need to provide to the customer or which customers you should keep.

I can’t say I’m good at this, but with this article I hope I can learn and give you some insights into this important task. Let’s start!

This is the GitHub link to the code.

We’ll use a classic dataset, iris. First we import the libraries we need, then load the data through the Seaborn library.

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# On Matplotlib 3.6+ this style is named 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')
iris = sns.load_dataset('iris')
iris.head()

After executing this cell, you should see an output like this:

The magic function ‘%matplotlib inline’ displays your plots inside the notebook.

With ‘plt.style.use('seaborn-darkgrid')’, we set the style for our plots. You can see the other options with ‘plt.style.available’.
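Style names depend on your Matplotlib version: releases from 3.6 onward renamed the seaborn styles with a 'seaborn-v0_8-' prefix, so 'seaborn-darkgrid' may not exist on your machine. Here is a minimal sketch that picks whichever name is available. Note that pick_style is a hypothetical helper written for this example, not part of Matplotlib.

```python
import matplotlib.pyplot as plt


def pick_style(candidates):
    """Return the first style name that this Matplotlib install knows about.

    Hypothetical helper for illustration; falls back to 'default'.
    """
    for name in candidates:
        if name in plt.style.available:
            return name
    return "default"


# Try the newer name first, then the older one used in this article.
style = pick_style(["seaborn-v0_8-darkgrid", "seaborn-darkgrid"])
plt.style.use(style)
print("Using style:", style)
```

If neither name exists, the sketch quietly uses Matplotlib's default style instead of raising an error.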

💥 Pairplot

We’ll start with the easiest one. ‘sns.pairplot’ draws the pairwise relationships between all the numeric columns in the dataset. With ‘hue’, we tell it to color the data points by the categories in the column we choose.

sns.pairplot(iris, hue='species', height=3, diag_kind = 'hist');

See how amazing it looks? The colors will change with your plot style.

💥 Plot

pandas DataFrames have a very useful and easy ‘plot’ method, built on Matplotlib. You can specify the x and y axes within this method. With the ‘marker’ argument you can change the way your dots look.

  • See this link for more marker styles.
iris.plot(kind = 'scatter', x = 'sepal_width', y = 'sepal_length', color = 'red', marker = '1')
iris.plot(kind = 'scatter', x = 'petal_width', y = 'petal_length', color = 'cyan', marker = '>');
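The two calls above produce two separate figures. If you would rather compare them on one set of axes, a possible sketch is to pass the Axes returned by the first call to the second through ‘ax’. The sample rows are an illustrative stand-in with iris-like values (an assumption), and the axis labels are my own choice.

```python
import pandas as pd

# Illustrative stand-in rows with iris-like values (assumption);
# in the notebook, use the `iris` DataFrame loaded earlier.
rows = [
    ("setosa", 5.1, 3.5, 1.4, 0.2), ("setosa", 4.9, 3.0, 1.4, 0.2),
    ("setosa", 4.7, 3.2, 1.3, 0.2), ("versicolor", 7.0, 3.2, 4.7, 1.4),
    ("versicolor", 6.4, 3.2, 4.5, 1.5), ("versicolor", 6.9, 3.1, 4.9, 1.5),
    ("virginica", 6.3, 3.3, 6.0, 2.5), ("virginica", 5.8, 2.7, 5.1, 1.9),
    ("virginica", 7.1, 3.0, 5.9, 2.1),
]
sample = pd.DataFrame(rows, columns=["species", "sepal_length", "sepal_width",
                                     "petal_length", "petal_width"])

# The first call creates the Axes; the second call reuses it via `ax`.
ax = sample.plot(kind="scatter", x="sepal_width", y="sepal_length",
                 color="red", marker="1", label="sepal")
sample.plot(kind="scatter", x="petal_width", y="petal_length",
            color="cyan", marker=">", label="petal", ax=ax)

# Both series share the axes, so give them neutral labels.
ax.set_xlabel("width (cm)")
ax.set_ylabel("length (cm)")
```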

💥 FacetGrid

I discovered Seaborn’s .FacetGrid only recently, but I loved it. It’s really useful, and fun to use. With it, we can set certain conditions and draw relationships accordingly. We then need to map a plotting function onto the grid.

sns.FacetGrid(iris, hue="species", palette="Set2", height=4.5) \
.map(sns.scatterplot, "sepal_length", "sepal_width") \
.add_legend();
sns.FacetGrid(iris, hue="species", palette="husl", height=4.5) \
.map(sns.scatterplot, "petal_length", "petal_width") \
.add_legend();

Let’s interpret this:

  • The setosa species has low petal_length and petal_width but higher sepal_width.
  • The virginica species has high petal_length and petal_width.
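A reading like this can be double-checked with numbers. One simple way, sketched here, is to compare the per-species means with a pandas groupby. The sample rows are an illustrative stand-in with iris-like values (an assumption); run the same two lines on the full iris DataFrame for the real figures.

```python
import pandas as pd

# Illustrative stand-in rows with iris-like values (assumption);
# in the notebook, use the `iris` DataFrame loaded earlier.
rows = [
    ("setosa", 5.1, 3.5, 1.4, 0.2), ("setosa", 4.9, 3.0, 1.4, 0.2),
    ("setosa", 4.7, 3.2, 1.3, 0.2), ("versicolor", 7.0, 3.2, 4.7, 1.4),
    ("versicolor", 6.4, 3.2, 4.5, 1.5), ("versicolor", 6.9, 3.1, 4.9, 1.5),
    ("virginica", 6.3, 3.3, 6.0, 2.5), ("virginica", 5.8, 2.7, 5.1, 1.9),
    ("virginica", 7.1, 3.0, 5.9, 2.1),
]
sample = pd.DataFrame(rows, columns=["species", "sepal_length", "sepal_width",
                                     "petal_length", "petal_width"])

# Mean of every measurement, one row per species.
means = sample.groupby("species").mean()
print(means.round(2))
```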

💥 Subplot

Sometimes we may want to draw more than one plot. Instead of doing this in individual cells, we can see multiple relationships in a single figure with the .subplots method.

  • First, we need to indicate how many rows and columns we want. We’ll draw 4 plots, so we need 2 rows and 2 columns.
  • fig is the whole figure; ax is the array that holds every plot in this figure.
  • We can draw different kinds of plots in this figure. They don’t have to be the same.
  • The ax argument in each call specifies which plot to draw on.
  • We can change the x-axis label of every plot with the .set_xlabel method (.set_title sets the title).
fig, ax = plt.subplots(2, 2, figsize = (18, 10))
fig.suptitle('Vertically Plots', fontsize = 30)
sns.histplot(ax = ax[0, 0], data=iris, x="petal_width", hue="species",
             multiple="stack", palette = 'nipy_spectral')
ax[0,0].set_xlabel('Petal Width', fontsize = 18)
sns.histplot(ax = ax[0, 1], data=iris, x="petal_length", hue="species",
             multiple="stack", palette = 'icefire')
ax[0,1].set_xlabel('Petal Length', fontsize = 18)
sns.histplot(ax = ax[1, 0], data=iris, x="sepal_width", hue="species",
             multiple="stack", palette = 'gist_stern_r')
ax[1,0].set_xlabel('Sepal Width', fontsize = 18)
sns.histplot(ax = ax[1, 1], data=iris, x="sepal_length", hue="species",
             multiple="stack", palette = 'CMRmap_r')
ax[1,1].set_xlabel('Sepal Length', fontsize = 18);

💥 Boxplot

A boxplot lets us see the distribution of the data clearly. With this plot, we can see and interpret outlier values.

plt.figure(figsize = (12, 8))
sns.boxplot(data = iris, x = 'sepal_length', y = 'sepal_width',
            hue = 'species', palette = 'rainbow')

💥 Jointplot

A jointplot draws two variables together, with a bivariate graph in the center and univariate graphs on the margins.

sns.jointplot(data = iris, x = 'sepal_length',
              y = 'sepal_width', hue = 'species');
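Going back to the boxplot for a moment: by default, seaborn marks a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the box. The same rule can be sketched in plain pandas. The values below are made up for illustration, and iqr_outliers is a hypothetical helper, not a seaborn function.

```python
import pandas as pd


def iqr_outliers(values, whis=1.5):
    """Return the values outside [Q1 - whis*IQR, Q3 + whis*IQR].

    Hypothetical helper mirroring the default boxplot whisker rule.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return values[(values < lower) | (values > upper)]


# Made-up widths for illustration; try iris['sepal_width'] in the notebook.
widths = pd.Series([2.0, 2.9, 3.0, 3.0, 3.1, 3.2, 3.4, 4.4])
outliers = iqr_outliers(widths)
print(outliers.tolist())
```

On these made-up values the two extremes, 2.0 and 4.4, fall outside the whiskers.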

💥 Heatmap

A heatmap shows the values of a table as colors, which helps you understand the relationship between features.

iris2 = iris.drop('species', axis = 1)
sns.heatmap(iris2, cmap = 'YlGnBu');
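The call above colors the raw measurements, one row per flower. If what you want to see is how strongly the features move together, a common variation is to plot the correlation matrix instead. A minimal sketch, using illustrative stand-in rows with iris-like values (an assumption) so it runs on its own:

```python
import pandas as pd
import seaborn as sns

# Illustrative stand-in rows with iris-like values (assumption);
# in the notebook, use the `iris` DataFrame loaded earlier.
rows = [
    ("setosa", 5.1, 3.5, 1.4, 0.2), ("setosa", 4.9, 3.0, 1.4, 0.2),
    ("setosa", 4.7, 3.2, 1.3, 0.2), ("versicolor", 7.0, 3.2, 4.7, 1.4),
    ("versicolor", 6.4, 3.2, 4.5, 1.5), ("versicolor", 6.9, 3.1, 4.9, 1.5),
    ("virginica", 6.3, 3.3, 6.0, 2.5), ("virginica", 5.8, 2.7, 5.1, 1.9),
    ("virginica", 7.1, 3.0, 5.9, 2.1),
]
sample = pd.DataFrame(rows, columns=["species", "sepal_length", "sepal_width",
                                     "petal_length", "petal_width"])

# Keep only the numeric columns, then compute pairwise Pearson correlations.
features = sample.drop("species", axis=1)
corr = features.corr()

# annot=True writes each coefficient in its cell; vmin/vmax fix the color scale.
ax = sns.heatmap(corr, cmap="YlGnBu", annot=True, vmin=-1, vmax=1)
```

Fixing the color scale to the range from -1 to 1 keeps the colors comparable between datasets.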

