Matplotlib Tutorial — 7

Vivekawasthi
Published in CodeX
4 min read · Nov 30, 2022

This tutorial will cover Scatter plots with Matplotlib.

A scatter plot uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.

Let's start with a basic scatter plot example, then go through the customization options.

import pandas as pd
from matplotlib import pyplot as plt

# On Matplotlib 3.6+ this style is named 'seaborn-v0_8'; use 'seaborn' on older releases.
plt.style.use('seaborn-v0_8')

x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]

plt.scatter(x,y)
plt.tight_layout()

plt.show()

Here, we have used random x and y values and created a scatter plot using the scatter() method.

In the scatter plot, each (x, y) pair is drawn as a dot. Because the data is random, the plot shows no clear relationship, but we will look at real-world data in a later example.
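One way to back up the "no relationship" reading is to compute the Pearson correlation coefficient for the same two lists. This is only a minimal sketch: it assumes NumPy is installed (Matplotlib depends on it), and the variable name r is just illustrative.

```python
import numpy as np

x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

# A value near 0 means little linear relationship; values near -1 or 1
# mean a strong one.
print(f"Pearson r = {r:.2f}")
```

For these lists the coefficient comes out close to zero, which matches what the plot suggests.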

Now, let's try some of the customization options available for scatter plots. I will also point to the customization documentation.

plt.scatter(x, y, s=100, c='green', marker='X')

Here, we set the size of the dots with ‘s’, the color with ‘c’, and the marker shape with ‘marker’ (here ‘X’). Let’s run the whole code and check the graph.
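To see what the marker argument changes, the same data can be drawn with a few different shapes side by side. This is a sketch rather than the code used for the original figure: the choice of 'o', 'X' and '^' and the figure size are assumptions made for illustration.

```python
from matplotlib import pyplot as plt

x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]

# Illustrative marker shapes: circle, filled X, and triangle.
markers = ['o', 'X', '^']

# One subplot per marker shape, sharing axes so the panels are comparable.
fig, axes = plt.subplots(1, 3, figsize=(9, 3), sharex=True, sharey=True)
for ax, marker in zip(axes, markers):
    ax.scatter(x, y, s=100, c='green', marker=marker)
    ax.set_title(f"marker='{marker}'")

fig.tight_layout()
```

Call plt.show() afterwards to display the figure.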
Instead of using green for every marker, we can give each dot a different color. Let’s look at an example.

colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

plt.scatter(x, y, s=100, c=colors, cmap='Greens')

Here, colors is a list of numbers, one for each (x, y) point. We pass it through the ‘c’ argument, and the cmap argument maps each number to a shade from the chosen colormap.
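The mapping from numbers to shades can be reproduced by hand, which may make the role of cmap clearer. The sketch below assumes the default behavior, where the smallest value goes to the start of the colormap and the largest to the end. The names norm, lightest and darkest are illustrative.

```python
from matplotlib import pyplot as plt
from matplotlib.colors import Normalize

colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

# Linearly rescale the values so that min(colors) -> 0.0 and max(colors) -> 1.0.
norm = Normalize(vmin=min(colors), vmax=max(colors))
cmap = plt.get_cmap('Greens')

# Calling the colormap with a number in [0, 1] returns an RGBA tuple.
lightest = cmap(norm(min(colors)))  # color used for the smallest value
darkest = cmap(norm(max(colors)))   # color used for the largest value

print("smallest value ->", lightest)
print("largest value  ->", darkest)
```

With 'Greens', small values come out pale and large values come out dark green.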

We can also use edgecolors and linewidths to outline each marker, which makes the dots easier to tell apart.

colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

plt.scatter(x, y, s=100, c=colors, cmap='Greens',
            edgecolors='black', linewidths=1)

We can also give each marker a different size. To do that, we create a list of sizes and pass it through the ‘s’ argument.

sizes = [209, 486, 381, 255, 191, 315, 185, 228, 174,
         538, 239, 394, 399, 153, 273, 293, 436, 501, 397, 539]

plt.scatter(x, y, s=sizes, c=colors, cmap='Greens',
            edgecolors='black', linewidths=1)
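When choosing values for ‘s’, it helps to know that Matplotlib documents it as the marker area in points squared, not the width. A rough sketch of the consequence, using made-up widths: to make a marker about twice as wide, its ‘s’ value has to be about four times larger.

```python
from matplotlib import pyplot as plt

# Desired marker widths in points (illustrative values, not from the tutorial data).
widths = [10, 20, 30]

# s is an area, so square the widths to get the sizes.
sizes = [w ** 2 for w in widths]

fig, ax = plt.subplots()
points = ax.scatter([1, 2, 3], [1, 1, 1], s=sizes, c='green',
                    edgecolors='black', linewidths=1)
ax.set_xlim(0, 4)
```

The same reasoning applies when sizes come from data: a value that is twice as large produces a marker with twice the area, which looks only about 1.4 times as wide.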

We can also add a labeled color bar, so the graph is easier to understand when we present it. Suppose the color values represent customer satisfaction; let’s add a label for that.

plt.scatter(x, y, s=sizes, c=colors, cmap='Greens',
            edgecolors='black', linewidths=1)
cbar = plt.colorbar()
cbar.set_label('customer satisfaction')
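The same color bar can be built with Matplotlib's object-oriented interface, which makes it explicit which scatter plot the bar describes. This is a sketch under a few assumptions: a fixed marker size is used to keep it short, and the names fig, ax and points are illustrative.

```python
from matplotlib import pyplot as plt

x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]
colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

fig, ax = plt.subplots()

# scatter() returns the collection of markers; the color bar reads its
# colormap and value range from this object.
points = ax.scatter(x, y, s=100, c=colors, cmap='Greens',
                    edgecolors='black', linewidths=1)

cbar = fig.colorbar(points, ax=ax)
cbar.set_label('customer satisfaction')

fig.tight_layout()
```

Passing the scatter result to fig.colorbar() is useful when a figure has several plots, because plt.colorbar() on its own uses whichever plot was drawn last.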

That covers customization on sample data. Please refer to the Matplotlib documentation for scatter() for the full list of options.

Now, let’s look at an example with real-world data for a better understanding. I have a CSV file with data on trending YouTube videos; let’s plot it.

import pandas as pd
from matplotlib import pyplot as plt

# On Matplotlib 3.6+ this style is named 'seaborn-v0_8'; use 'seaborn' on older releases.
plt.style.use('seaborn-v0_8')

data = pd.read_csv('youtube_data.csv')
view_count = data['view_count']
likes = data['likes']
ratio = data['ratio']
plt.scatter(view_count, likes, c=ratio, cmap='summer',
            edgecolors='black', linewidths=1)
plt.xscale('log')
plt.yscale('log')
cbar = plt.colorbar()
cbar.set_label("Youtube like/dislike ratio")
plt.title('Trending YouTube Videos')
plt.xlabel('View Count')
plt.ylabel('Total Likes')

plt.tight_layout()

plt.show()

Here, I read the data from a CSV file. I use view_count and likes for the x and y axes, and ratio for the color.

I have also set cmap to ‘summer’ and used a log scale for both the x and y axes.
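A log scale is usually chosen when values span several orders of magnitude, which is a plausible reason for using it here. The sketch below compares linear and log axes on synthetic data. It does not use the YouTube dataset: views and likes are made-up log-normal values, and the seed and parameters are arbitrary assumptions.

```python
import numpy as np
from matplotlib import pyplot as plt

# Synthetic stand-in data (NOT the YouTube CSV): log-normal values spread
# over several orders of magnitude.
rng = np.random.default_rng(0)
views = rng.lognormal(mean=12, sigma=2, size=200)
likes = views * rng.uniform(0.01, 0.1, size=200)

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(9, 4))

# Linear axes: most points end up squeezed into one corner.
ax_linear.scatter(views, likes, edgecolors='black', linewidths=1)
ax_linear.set_title('Linear axes')

# Log axes: the same points spread out evenly. Values must be positive.
ax_log.scatter(views, likes, edgecolors='black', linewidths=1)
ax_log.set_xscale('log')
ax_log.set_yscale('log')
ax_log.set_title('Log axes')

fig.tight_layout()
```

Call plt.show() afterwards to display both panels. Note that zero or negative values cannot be placed on a log axis, so real data may need filtering first.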

This gives us a good representation of the real-world data. Please find the GitHub link for the code and the data file.

The next tutorial will cover plotting time series data.
