Become a Pro in Scatter Plot Visualization

A Matplotlib Guide for Creating Scatter Plots

Amsavalli Mylasalam
Variablz Academy
4 min read · Sep 28, 2022


Become a Pro in Scatter Plot Visualization (Credits: Aatomz)

The primary purpose of a scatter plot is to show the relationship between two variables (bivariate analysis). Scatter plots play an important role in regression problems, where they help reveal how two variables relate. In classification problems, scatter plots also help to identify correlations between features.

Here I am gonna explain how to create a basic scatter plot and how to customize it with labels, markers, and colors using Matplotlib.

You can download the diamond data set from this link to reproduce this code.

Creating a Scatter Plot with the Diamond Dataset

Import the necessary libraries

Load the data file into a pandas DataFrame
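A minimal sketch of these two steps could look like the following. The inline CSV text is a hypothetical stand-in so the snippet runs on its own; the file name in the comment and the column names are assumptions about the downloaded file, so adjust them to match your copy.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt  # used for the plots later in the article

# Hypothetical stand-in for the downloaded file (illustrative rows only).
# With the real data set, pass its path instead, for example
# pd.read_csv("diamonds.csv"), where the file name is an assumption.
csv_text = """carat,cut,color,clarity,price
0.23,Ideal,E,SI2,326
0.21,Premium,E,SI1,326
0.23,Good,E,VS1,327
0.29,Premium,I,VS2,334
0.31,Good,J,SI2,335
"""

diamonds = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks: first rows and (rows, columns)
print(diamonds.head())
print(diamonds.shape)
```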

Create a scatter plot.

Let's see whether carat and price are correlated, using a scatter plot of a 620-row sample from the Diamond dataset.

A scatter plot is a tool for bivariate analysis: here, it lets us visualize the relationship between the carat and the price of a diamond.
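Here is a rough sketch of that plot. The data below is synthetic (a made-up carat/price relationship) and only stands in for the real diamonds file so the snippet runs by itself; the random_state value is also just an assumption to make the sample reproducible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data: synthetic values, NOT the real diamonds file.
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 2.5, size=2000)
price = 4000 * carat**1.5 + rng.normal(0, 500, size=2000)  # made-up relationship
diamonds = pd.DataFrame({"carat": carat, "price": np.clip(price, 300, None)})

# Take a random sample of 620 rows, as described in the text.
sample = diamonds.sample(n=620, random_state=42)

# Draw carat on the x-axis and price on the y-axis.
fig, ax = plt.subplots(figsize=(8, 5))
points = ax.scatter(sample["carat"], sample["price"])

# A single number to go with the picture: the Pearson correlation.
print(sample["carat"].corr(sample["price"]))

# In a script, call plt.show() to display the figure.
```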

Scatter Plot

Wow! We’ve created a scatter plot to explore the relation between the carat and the price of the diamonds. Yet it feels like something is missing.

Yeah! Without axis labels, it can be hard to understand the relationship. So let’s see how to add them.

Setting Label

Here we set the x-axis and y-axis labels using the set_xlabel and set_ylabel methods, respectively. The fontdict parameter lets us customize the font of the labels.
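A small sketch of the idea, using a few illustrative values in place of the diamonds sample. The font settings in label_font are only example choices, not the ones from the original plot.

```python
import matplotlib.pyplot as plt

# Illustrative values standing in for the carat and price columns.
carat = [0.23, 0.31, 0.52, 0.70, 1.01, 1.50]
price = [326, 335, 1200, 2400, 4800, 9500]

fig, ax = plt.subplots()
ax.scatter(carat, price)

# fontdict takes text properties; these particular values are just examples.
label_font = {"fontsize": 12, "fontweight": "bold", "color": "darkblue"}
ax.set_xlabel("Carat", fontdict=label_font)
ax.set_ylabel("Price", fontdict=label_font)
```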

Code: Setting Labels in a Scatter Plot

Scatter Plot Between Carat and Price of Diamonds

Now it looks cool; however, the first thing a client will ask is what this plot is for. If you don’t want to answer that every time, just give the plot a title. Indeed, it’s a mandatory step 😅

Setting Title

Using the set_title method, we can set the title for the plot. Here, too, we can customize the font of the title with the fontdict parameter.
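As a sketch, again with illustrative values rather than the real data. The title text reuses the caption from earlier in the article, and the font settings are example choices.

```python
import matplotlib.pyplot as plt

# Illustrative values standing in for the carat and price columns.
carat = [0.23, 0.31, 0.52, 0.70, 1.01, 1.50]
price = [326, 335, 1200, 2400, 4800, 9500]

fig, ax = plt.subplots()
ax.scatter(carat, price)
ax.set_xlabel("Carat")
ax.set_ylabel("Price")

# Example font settings for the title (assumed, not from the original plot).
title_font = {"fontsize": 14, "fontweight": "bold"}
ax.set_title("Scatter Plot Between Carat and Price of Diamonds", fontdict=title_font)
```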

Code: Setting Title

So far we have covered the basic scatter plot. Let me explain some more advanced options for scatter plots.

Setting Marker: size, color, shape, and edge color

We can also customize the marker in the scatter plot. Here I am gonna explain how we can resize and change the shape and color of the markers in the scatter plot.

Use the s parameter to change the size of the marker.

Use the c parameter to change the color of the marker.

Use the marker parameter to change the type of marker. For more marker styles, check out this link.

Use the edgecolor parameter (documented as edgecolors in scatter; both spellings are accepted) to change the color of the edge of the marker.
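Putting the four parameters together might look like this. The specific size, color, and marker shape are arbitrary example choices, and the data is illustrative.

```python
import matplotlib.pyplot as plt

# Illustrative values standing in for the carat and price columns.
carat = [0.23, 0.31, 0.52, 0.70, 1.01, 1.50]
price = [326, 335, 1200, 2400, 4800, 9500]

fig, ax = plt.subplots()
points = ax.scatter(
    carat,
    price,
    s=60,                # marker size, in points squared
    c="gold",            # marker fill color
    marker="D",          # diamond-shaped marker
    edgecolors="black",  # marker edge color (the singular edgecolor also works)
)
ax.set_xlabel("Carat")
ax.set_ylabel("Price")
```

Passing a list of numbers to s or c instead of a single value gives each point its own size or color.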

Code: Marker Type, Size, and Color

Adding Annotations

To add annotations to the scatter plot, follow these steps.

⚽️ Store all the annotations in a list, in the same order as the points to be plotted.
⚽️ Draw the scatter plot.
⚽️ Using a for loop, annotate each point.

Consider the following example. For this, I filtered the premium diamonds from the diamonds data frame.

Step 1: In the premium diamond dataset, I chose the clarity feature for the annotations and stored it as a list.
Step 2: Draw the scatter plot, with carat on the x-axis and price on the y-axis.
Step 3: Create a for loop to annotate each point with its clarity value.

Input:
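A minimal sketch of the three steps, assuming a tiny hypothetical data frame in place of the real diamonds data. The column names, the "Premium" value in the cut column, and the label offset are assumptions for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the diamonds data frame (illustrative rows only).
diamonds = pd.DataFrame({
    "carat":   [0.21, 0.23, 0.29, 0.31, 0.22, 0.24],
    "cut":     ["Premium", "Good", "Premium", "Good", "Premium", "Very Good"],
    "clarity": ["SI1", "VS1", "VS2", "SI2", "SI1", "VVS2"],
    "price":   [326, 327, 334, 335, 342, 336],
})

# Keep only the premium diamonds (assumes the cut column uses "Premium").
premium = diamonds[diamonds["cut"] == "Premium"]

# Step 1: store the annotations in a list, in the same order as the points.
annotations = premium["clarity"].tolist()

# Step 2: draw the scatter plot with carat on x and price on y.
fig, ax = plt.subplots()
ax.scatter(premium["carat"], premium["price"])
ax.set_xlabel("Carat")
ax.set_ylabel("Price")

# Step 3: annotate each point, nudging the text 5 points up and to the right.
for label, x, y in zip(annotations, premium["carat"], premium["price"]):
    ax.annotate(label, (x, y), xytext=(5, 5), textcoords="offset points")
```

Annotating every point gets crowded quickly, so this works best on a small filtered subset like the one here.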

Output:

So, I hope next time you plot your scatter plot, you will search for my article. If you like this, don’t forget to clap.

If you want to learn more options, let me know in the comments, and I will update this article to cover them.

Follow me on LinkedIn for more insights on Data Visualization

https://www.linkedin.com/in/amsavalli-datascientist/

Thanks & Regards

Amsavalli
