What is a Scatter plot?
A scatter plot is a type of chart in which each data point is drawn as a dot positioned by its values on the horizontal and vertical axes. The graph therefore uses the Cartesian coordinate system to place its data points. It is usually used to display and observe the relationship between two variables and to check whether they are correlated. The scatter plot is also known as a scatter chart, scatter graph, or scattergram.
When to Use Scatter Charts
A scatter plot has many uses. The most common are to check for correlation and to find trends or relationships between variables. It is also useful for checking dependencies, that is, how changes in one variable affect the other.
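As a minimal sketch of the correlation use case, the snippet below plots two variables against each other and reports their Pearson correlation coefficient. The data is synthetic and the variable names are made up for illustration; they are not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(seed=0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Pearson r = {r:.2f}")
```

A tight upward-sloping cloud of points together with a coefficient close to 1 suggests a strong positive linear relationship.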
Exploring the Scatter plot attributes
The scatter plot has many attributes to make it more understandable as well as more appealing visually. Let us explore each attribute one by one and the changes it makes on the graph.
matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)
1. x, y (axis)
The values plotted along the x-axis and y-axis. Both must have the same length.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
plt.scatter(iris.index,iris['sepal_width'])
2. s (size)
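This attribute sets the size of the data points. The value is the marker area in points squared, and it can be a single number or an array with one value per point.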
plt.scatter(iris.index,iris['sepal_width'],s=75)
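Because s also accepts an array, each point can be given its own size. The following is a small sketch on synthetic data (the names x, y, and sizes are assumptions for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=50)
y = rng.uniform(0, 10, size=50)

# One size per point; values are marker areas in points squared.
sizes = rng.uniform(20, 300, size=50)

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes)
```

This is the basis of a bubble chart, where the marker area carries a third variable.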
3. c (color)
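It sets the color of the data points. It accepts a single color given as a name or hex string, a list of colors, a 2D array with one RGB or RGBA row per point, or a sequence of numbers that is mapped to colors through cmap and norm.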
plt.scatter(iris.index,iris['sepal_width'], s = 75,c = "red")
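The different input forms for c can be compared side by side. This is a sketch on a tiny made-up dataset, not the iris data used above:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)
y = np.array([2.0, 3.5, 1.0, 4.0, 2.5])

fig, axes = plt.subplots(1, 4, figsize=(12, 3))

# A single named color applied to every point.
axes[0].scatter(x, y, c="red")

# A single hex color string.
axes[1].scatter(x, y, c="#1f77b4")

# A 2D array with one RGB row per point.
rgb = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float
)
by_rgb = axes[2].scatter(x, y, c=rgb)

# A sequence of numbers, mapped to colors through the colormap.
mapped = axes[3].scatter(x, y, c=y)
```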
4. marker
Markers change the shape of the data points. Options include '.', 'o', 'v', '^', '<', '>', '1', '2', '3', '4', '8', 's', and many more.
plt.scatter(iris.index,iris['sepal_width'],s=75,c ="purple",marker = '*')
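To compare marker shapes quickly, one option is to draw a row of points per marker. This is only an illustrative sketch; the list of markers is an arbitrary selection:

```python
import matplotlib.pyplot as plt

markers = [".", "o", "v", "^", "<", ">", "s", "*"]

fig, ax = plt.subplots()
for row, marker in enumerate(markers):
    # Each row of five points uses a different marker shape.
    ax.scatter(range(5), [row] * 5, s=75, marker=marker, label=repr(marker))

# Place the legend outside the axes so it does not cover the points.
ax.legend(title="marker", loc="center left", bbox_to_anchor=(1, 0.5))
```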
5. cmap
A Colormap instance or registered colormap name that maps scalar data to colors using a predefined theme. It is only used when c is an array of numbers.
Example themes: 'Accent', 'bwr_r', 'Blues', 'Blues_r', 'BrBG', 'binary', 'spring_r', 'ocean', 'BuPu', 'BuPu_r', 'CMRmap', 'autumn', 'Dark2', 'Dark2_r', 'GnBu', 'GnBu_r', 'Greens'
t = iris.index
plt.scatter(iris.index,iris['sepal_width'],c = t,cmap = 'turbo')
6. norm
The normalization scales the data to the range 0 to 1 (inclusive) before it is mapped to colors. Besides a Normalize instance, recent Matplotlib versions also accept a scale name such as log, symlog, or logit. By default, the data is scaled linearly.
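# BoundaryNorm bins the index into 30-60, 60-90, and 90-120; clip=True maps values outside that range to the nearest bin
cmap = matplotlib.cm.viridis_r
norm = matplotlib.colors.BoundaryNorm([30, 60, 90, 120], cmap.N, clip=True)
# Pass cmap as well; otherwise scatter falls back to the default colormap
plt.scatter(iris.index, iris['sepal_width'], c=t, cmap=cmap, norm=norm)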
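To see what a norm does to the data, a Normalize object can be called directly on an array. The sketch below compares linear and logarithmic scaling on three made-up values, then passes the logarithmic norm to a scatter plot:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, LogNorm

values = np.array([1.0, 10.0, 100.0])

linear = Normalize(vmin=1, vmax=100)
log = LogNorm(vmin=1, vmax=100)

# Calling a norm returns the data scaled to the range 0 to 1.
linear_scaled = linear(values)  # 10 lands near the bottom of the range
log_scaled = log(values)        # 10 lands in the middle of the range

fig, ax = plt.subplots()
points = ax.scatter(values, values, c=values, cmap="viridis", norm=log)
fig.colorbar(points, ax=ax)
```

A logarithmic norm can help when the color values span several orders of magnitude, since a linear norm would push most points into one end of the colormap.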
7. alpha
alpha is the transparency value, ranging from 0 to 1.
0: fully transparent
1: fully opaque
plt.scatter(iris.index,iris['sepal_width'],s=200,c = t,cmap = 'turbo',alpha= 0.4)
8. linewidths, edgecolors
linewidths sets the thickness of the marker border (linewidth is accepted as an alias), and edgecolors sets the color of the border.
plt.scatter(iris.index,iris['sepal_width'],c = t,cmap = 'spring',s=100,edgecolors='black',linewidth=3)
Pros and Cons of Scatter plot
Pros:
1. Easy to understand.
2. Shows the range of the data (minimum and maximum).
3. Makes outliers easy to detect.
Cons:
1. Struggles with large datasets, because overlapping points hide the pattern.
2. Limited to two variables on the axes.
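The large-dataset limitation can often be softened with the s and alpha attributes covered earlier. The following is a sketch on synthetic data; the specific values s=2 and alpha=0.1 are assumptions to tune per dataset, not recommended defaults:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=2)
n = 50_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

# Default markers: the points merge into a solid blob.
left.scatter(x, y)
left.set_title("Default markers")

# Small, mostly transparent markers: dense regions appear darker.
dense = right.scatter(x, y, s=2, alpha=0.1, linewidths=0)
right.set_title("s=2, alpha=0.1")
```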