What Is a Scatter Plot, and How Do You Create One with Python?

Ömer Faruk ÇELİK
Jul 14, 2020

Data visualization is a major part of a data scientist's workflow. Visualizations let you present findings to people who cannot understand the data just by looking at the raw numbers, so plots and charts should be clear and attractive, making the data easy to analyze and interpret.

A scatter plot is a great way to see the relationship between two variables. The picture above shows an example of a scatter plot.

As the picture above shows, you can adjust point sizes and shapes, group points by color, and set a custom title and custom x-axis and y-axis labels.
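As a rough sketch of those options in matplotlib, assuming the points come as plain lists plus a parallel list of group names, one `scatter()` call per group gives each group its own color from matplotlib's default color cycle. The function name `styled_scatter` and the marker size and shape are illustrative choices, not taken from the picture.

```python
import matplotlib.pyplot as plt


def styled_scatter(x, y, groups, title, x_label, y_label):
    """Draw one scatter series per group so each group gets its own color.

    The marker size (s) and marker shape below are illustrative choices.
    """
    fig, ax = plt.subplots()
    for group in sorted(set(groups)):
        # Keep only the points that belong to this group.
        xs = [xi for xi, g in zip(x, groups) if g == group]
        ys = [yi for yi, g in zip(y, groups) if g == group]
        # s sets the marker area in points^2; marker sets the shape.
        # Each call takes the next color from the default color cycle.
        ax.scatter(xs, ys, s=40, marker="o", label=str(group))
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.legend(title="group")
    return fig, ax
```

Passing `c=` with a list of numbers and a colormap is another way to color points, but a separate call per group makes it easy to add a legend.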

In short, a scatter plot is a diagram in which each observation in the data set is represented by a dot, positioned by its values for the two variables.
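A minimal sketch of that idea, using toy values rather than real data: every (x, y) pair passed to `scatter()` becomes exactly one dot.

```python
import matplotlib.pyplot as plt

# Toy values: each (x[i], y[i]) pair is one observation.
x = [1, 2, 3, 4]
y = [2, 4, 1, 3]

fig, ax = plt.subplots()
points = ax.scatter(x, y)

# One dot per pair: the collection stores one (x, y) position per observation.
print(len(points.get_offsets()))
```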

Okay, let's create a scatter plot.

How to Create a Scatter Plot

To create a scatter plot, we will use Python's matplotlib.pyplot module. To load the data set file, we will use the pandas library. Let's import both libraries.

Here I imported matplotlib.pyplot under the alias "plt" and pandas under the alias "pd" so they are quicker to type. From now on, "plt" refers to matplotlib.pyplot and "pd" refers to pandas.
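A minimal sketch of those imports, assuming the standard aliases described above:

```python
# Conventional aliases: "plt" for matplotlib's pyplot module, "pd" for pandas.
import matplotlib.pyplot as plt
import pandas as pd
```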

For this example, I want to plot "area_mean" on the x-axis and "mean size of the core tumor" on the y-axis to see how the two are correlated. Correlation describes the relationship between two variables: how strongly, and in which direction, one tends to change as the other changes.
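To put a number on the correlation alongside the plot, pandas can compute the Pearson correlation coefficient. This is a sketch that assumes the data is already in a DataFrame; the helper name `pearson_correlation` is hypothetical.

```python
import pandas as pd


def pearson_correlation(df: pd.DataFrame, x_col: str, y_col: str) -> float:
    """Return the Pearson correlation coefficient between two columns.

    The result lies in [-1, 1]: values near 1 mean the columns rise together,
    values near -1 mean one falls as the other rises, and values near 0 mean
    there is little linear relationship.
    """
    # Series.corr uses the Pearson method by default.
    return float(df[x_col].corr(df[y_col]))
```

Pearson correlation only captures linear relationships, which is one reason to look at the scatter plot as well as the number.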

I am going to use the Breast Cancer data set that I found on Kaggle.com. Because the data set is a CSV file, I load it with pandas' read_csv() function.
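A sketch of the loading step, using a helper of my own naming, `load_columns`, that keeps only the two plotted columns and drops rows with missing values (dropping those rows is a design choice, not something the data set is known to require):

```python
import pandas as pd


def load_columns(csv_path, x_col: str, y_col: str) -> pd.DataFrame:
    """Read a CSV file and keep only the two columns we want to plot."""
    df = pd.read_csv(csv_path)
    missing = [c for c in (x_col, y_col) if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found in {csv_path}: {missing}")
    # Drop rows where either value is missing, since they cannot be plotted.
    return df[[x_col, y_col]].dropna()
```

For example, `load_columns("data.csv", "area_mean", y_col)`, where `data.csv` is a hypothetical name for the downloaded file and `y_col` is whichever column holds the mean size of the core tumor.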

Next, I add a title, an x-axis label, and a y-axis label to the plot.
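With pyplot's state-based functions, these three labels map onto `plt.title()`, `plt.xlabel()`, and `plt.ylabel()`, which act on the current figure. A sketch, with `label_plot` as a hypothetical helper name:

```python
import matplotlib.pyplot as plt


def label_plot(title: str, x_label: str, y_label: str) -> None:
    """Set the title and axis labels on the current pyplot axes."""
    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
```

For example, `label_plot("Breast Cancer Data Set", "area_mean", "mean size of the core tumor")`, where the title is only a placeholder.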

Finally, I call the scatter() function with two arguments: the x-axis values ("area_mean") and the y-axis values ("mean size of the core tumor"). To display the plot, call the show() function.
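Putting the steps together, a sketch of the whole plotting routine might look like this. `scatter_columns` is a hypothetical helper, and the axis labels fall back to the column names when none are given.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_columns(df: pd.DataFrame, x_col: str, y_col: str, title: str,
                    x_label=None, y_label=None):
    """Scatter df[y_col] against df[x_col] with pyplot and return the axes."""
    plt.figure()
    # One dot per row: x from x_col, y from y_col.
    plt.scatter(df[x_col], df[y_col])
    plt.title(title)
    # Fall back to the column names when no explicit label is given.
    plt.xlabel(x_label or x_col)
    plt.ylabel(y_label or y_col)
    return plt.gca()
```

Typical use, assuming the file is saved as `data.csv` (a hypothetical name): `df = pd.read_csv("data.csv")`, then `scatter_columns(df, "area_mean", y_col, "Breast Cancer Data Set", y_label="mean size of the core tumor")`, then `plt.show()` to open the plot window. Calling `show()` outside the helper keeps the function usable in scripts that save figures instead of displaying them.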

Here is the scatter plot I just created:

Final Plot

From this plot, you can observe the correlation between these two variables.

