All about Scatter Plots!

Urjadd
3 min read · Nov 13, 2022

What is a Scatter plot?

A scatter plot is a type of chart in which each data point is drawn as a dot, positioned by its values on the horizontal and vertical axes. In other words, the graph uses the Cartesian coordinate system to place its data points. It is usually used to observe the relationship between two variables and to check whether they are correlated. The scatter plot is also known as a scatter chart, scatter graph, or scattergram.

When to Use Scatter Charts

A scatter plot has many uses. The most common are spotting correlation and finding trends or relationships between variables. It is also useful for checking dependencies, that is, how a change in one variable affects the other.
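A scatter plot pairs naturally with a numeric check of correlation. The sketch below is only an illustration: the data is synthetic and the names hours and score are made up, not taken from any dataset in this article. It computes the Pearson correlation coefficient with NumPy and puts it in the plot title.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data, assumed for illustration: score rises with hours, plus noise.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=100)
score = 5 * hours + rng.normal(0, 4, size=100)

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 matrix.
r = np.corrcoef(hours, score)[0, 1]

fig, ax = plt.subplots()
ax.scatter(hours, score)
ax.set_xlabel("hours")
ax.set_ylabel("score")
ax.set_title(f"Pearson r = {r:.2f}")
# Call plt.show() to display the figure when running this as a script.
```

A value of r close to 1 or -1 should match a cloud of points that hugs a straight line, while a value near 0 should look like a shapeless blob.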

Exploring the Scatter plot attributes

The scatter function has many attributes (its parameters) that make the plot easier to understand and more visually appealing. Let us explore them one by one and see how each changes the graph.

matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)

1. x, y (axis)

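The data positions: x holds the values plotted along the x-axis and y holds the values plotted along the y-axis. Both must have the same length.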

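import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

# Load the iris sample dataset as a pandas DataFrame (downloaded on first use)
iris = sns.load_dataset("iris")

# Row index on the x-axis, sepal width on the y-axis
plt.scatter(iris.index, iris['sepal_width'])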

2. s (size)

This attribute sets the size of the data points. The value is an area measured in points squared, and it can be a single number or one value per point.

plt.scatter(iris.index,iris['sepal_width'],s=75)
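Because s also accepts one value per point, marker size can carry an extra variable. The following is a minimal sketch on synthetic data; the weight variable and the scale factor of 40 are arbitrary choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = rng.uniform(0, 10, size=50)
weight = rng.uniform(1, 5, size=50)  # hypothetical third variable

# s is an area in points squared, so scaling it linearly makes the
# marker area (not the diameter) proportional to the value.
sizes = 40 * weight

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes)
# Call plt.show() to display the figure when running this as a script.
```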

3. c (color)

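It sets the color of the data points. It accepts a single color given as a name or hex string, a list of colors with one entry per point, a 2D array whose rows are RGB or RGBA values, or a sequence of numbers that is mapped to colors through cmap and norm.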

plt.scatter(iris.index, iris['sepal_width'], s=75, c="red")
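The different forms that c accepts can be compared side by side. This is a small sketch with made-up values, not the iris data used above.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)
y = np.array([2.0, 3.5, 1.0, 4.0, 2.5])

fig, axes = plt.subplots(1, 3, figsize=(9, 3))

# 1) One hex string (or color name) applied to every point
single = axes[0].scatter(x, y, c="#1f77b4")

# 2) One color per point, as a list of color names
per_point = axes[1].scatter(x, y, c=["red", "green", "blue", "orange", "purple"])

# 3) One color per point, as a 2D array with one RGB row per point
rgb = np.column_stack([x / 4, np.zeros(5), 1 - x / 4])  # fades from blue to red
rgb_points = axes[2].scatter(x, y, c=rgb)
# Call plt.show() to display the figure when running this as a script.
```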

4. marker

Markers change the shape of the data points. Options include ".", "o", "v", "^", "<", ">", "1", "2", "3", "4", "8", "s", and many more.

plt.scatter(iris.index,iris['sepal_width'],s=75,c ="purple",marker = '*')
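Each call to scatter draws all of its points with a single marker, so a common pattern for distinguishing groups is one call per group. The sketch below assumes three made-up groups on synthetic data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Hypothetical groups, each mapped to its own marker shape
markers = {"group A": "o", "group B": "^", "group C": "s"}

fig, ax = plt.subplots()
for offset, (label, marker) in enumerate(markers.items()):
    # Shift each group so the clusters are visibly separate
    x = rng.normal(loc=offset * 3, scale=1.0, size=30)
    y = rng.normal(loc=offset, scale=1.0, size=30)
    ax.scatter(x, y, marker=marker, label=label)
ax.legend()
# Call plt.show() to display the figure when running this as a script.
```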

5. cmap

cmap is a Colormap instance or a registered colormap name: a pre-defined color theme used to map scalar data to colors. It only takes effect when c is a sequence of numbers.

Theme examples: ‘Accent’, ‘bwr_r’, ‘Blues’, ‘Blues_r’, ‘BrBG’, ‘binary’, ‘spring_r’, ‘ocean’, ‘BuPu’, ‘BuPu_r’, ‘CMRmap’, ‘autumn’, ‘Dark2’, ‘Dark2_r’, ‘GnBu’, ‘GnBu_r’, ‘Greens’

t = iris.index
plt.scatter(iris.index,iris['sepal_width'],c = t,cmap = 'turbo')
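When colors come from a colormap, a colorbar tells the reader which value each color stands for. Here is a minimal sketch on synthetic data; the encoded quantity x + y is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=80)
y = rng.uniform(0, 10, size=80)
value = x + y  # hypothetical scalar to encode as color

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=value, cmap="viridis")

# The colorbar reads its colormap and value range from the scatter result
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("x + y")
# Call plt.show() to display the figure when running this as a script.
```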

6. norm

The normalization method scales the data to the range 0 to 1 (inclusive) before it is mapped to colors. It can be a Normalize instance or the name of a scale such as log, symlog, or logit; by default, the data is scaled linearly.

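# Reversed viridis colormap, split into discrete bands at the given boundaries
cmap = matplotlib.cm.viridis_r
norm = matplotlib.colors.BoundaryNorm([30, 60, 90, 120], cmap.N, clip=True)
# Pass cmap as well so the plot uses the colormap the norm was built for
plt.scatter(iris.index, iris['sepal_width'], c=t, cmap=cmap, norm=norm)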
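To see what a norm actually does, it can be called directly on some numbers. The sketch below compares the linear Normalize with LogNorm on made-up values spanning several orders of magnitude, then passes the logarithmic norm to scatter.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm, Normalize

values = np.array([1.0, 10.0, 100.0, 1000.0])

linear = Normalize(vmin=1, vmax=1000)
log = LogNorm(vmin=1, vmax=1000)

# Linear scaling crowds the three smallest values near 0
linear_scaled = linear(values)
# Log scaling spaces them evenly: approximately [0, 1/3, 2/3, 1]
log_scaled = log(values)

fig, ax = plt.subplots()
points = ax.scatter(np.arange(4), np.zeros(4), c=values, cmap="viridis",
                    norm=log, s=200)
# Call plt.show() to display the figure when running this as a script.
```

With the linear norm, the first three points would get almost the same color; with the log norm, each step of a factor of ten moves the same distance along the colormap.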

7. alpha

alpha is the transparency value, ranging from 0 to 1.

0: fully transparent (invisible)

1: fully opaque

plt.scatter(iris.index,iris['sepal_width'],s=200,c = t,cmap = 'turbo',alpha= 0.4)
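Transparency is most useful when many points overlap, because dense regions then appear darker than sparse ones. The following sketch draws the same synthetic cloud of 2000 points twice; the alpha value of 0.2 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.normal(size=2000)
y = rng.normal(size=2000)

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 4), sharex=True, sharey=True)

# Opaque markers: the dense center becomes one solid patch
opaque = left.scatter(x, y, s=20)
left.set_title("opaque (default)")

# Semi-transparent markers: overlapping points add up to darker areas
faded = right.scatter(x, y, s=20, alpha=0.2)
right.set_title("alpha = 0.2")
# Call plt.show() to display the figure when running this as a script.
```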

8. linewidths, edgecolors

linewidths sets the thickness of the marker border, and edgecolors sets the color of that border.

plt.scatter(iris.index, iris['sepal_width'], c=t, cmap='spring', s=100, edgecolors='black', linewidths=3)
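The border settings also make hollow markers possible, which some readers find easier to read when points overlap. This sketch relies on the facecolors keyword, which scatter accepts in addition to the parameters listed in the signature above; the data is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=40)
y = rng.uniform(0, 10, size=40)

fig, ax = plt.subplots()
# facecolors="none" leaves the interior empty, so only the border is drawn
rings = ax.scatter(x, y, s=120, facecolors="none", edgecolors="black",
                   linewidths=1.5)
# Call plt.show() to display the figure when running this as a script.
```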

Pros and Cons of Scatter plot

Pros:

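1. Easy to understand.

2. Shows the range of the data (minimum and maximum).

3. Makes outliers easy to detect.

Cons:

1. Handles large datasets poorly, because overlapping points hide the pattern.

2. Limited to two variables on the axes; further variables can only be suggested through color or size.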
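The point about outliers can be made concrete by flagging them and drawing them differently. The sketch below uses synthetic data with two planted outliers, and the cutoff of 3 standard deviations is an assumed rule of thumb rather than anything prescribed by matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = np.arange(60)
y = rng.normal(loc=3.0, scale=0.3, size=60)
y[[10, 42]] = [6.0, 0.2]  # two planted outliers, for illustration only

# Flag points more than 3 standard deviations from the mean (assumed cutoff)
z = (y - y.mean()) / y.std()
is_outlier = np.abs(z) > 3

fig, ax = plt.subplots()
ax.scatter(x[~is_outlier], y[~is_outlier], label="typical")
ax.scatter(x[is_outlier], y[is_outlier], c="red", marker="x", label="outlier")
ax.legend()
# Call plt.show() to display the figure when running this as a script.
```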
