Data Visualization and Matplotlib

Dilara Şahan
Analytics Vidhya

Data, often called the gold mine of our time, keeps growing in importance. This growth has made fields such as data analysis, data cleaning and data visualization increasingly common. The recent popularity of data science has also increased the use of certain programming languages and libraries; among the best known are Python libraries such as Pandas, NumPy, Scikit-learn and Matplotlib.

A data scientist, who needs to ask the right questions when analyzing data, must also analyze and report the results well. One of the most important elements of this work is data visualization. Visualization helps us understand the data better, notice details we missed, and make our analysis concrete.

In Python there are many libraries for visualization; the best known are Matplotlib, Seaborn and Plotly. Each of them has its own strengths. Now let’s look at the features and uses of Matplotlib.

Matplotlib

Among the many Python visualization libraries, Matplotlib is perhaps the simplest to get started with while still offering a wide range of features. Matplotlib is a plotting library for 2D charts, with support for 3D plots, that helps analysts visualize data. It provides popular charts such as bar plots, scatter plots, pie charts and histograms.

Let’s create these charts to understand and apply the Matplotlib library. Before going through the examples, you can download the sample dataset used in this article from Kaggle, a platform that hosts datasets for machine learning. In the examples below, we analyze the “Iris” dataset.

First, import Matplotlib and Pandas.

import matplotlib.pyplot as plt
import pandas as pd

Then load the dataset into a variable, in this example “df”.

df = pd.read_csv('Iris.csv')
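Every example below refers to columns by name, so a quick check right after loading can catch a file whose headers differ (the Kaggle CSV uses names such as SepalLengthCm, while other copies of Iris name their columns differently). This is a minimal sketch; EXPECTED_COLUMNS and missing_columns are hypothetical helpers, not part of pandas:

```python
import pandas as pd

# Hypothetical helper: column names used by the examples in this article
# (the Kaggle copy of Iris); other copies may name these columns differently
EXPECTED_COLUMNS = {'SepalLengthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'}


def missing_columns(df: pd.DataFrame) -> set:
    """Return the expected columns that df lacks (an empty set means all are present)."""
    return EXPECTED_COLUMNS - set(df.columns)
```

After df = pd.read_csv('Iris.csv'), calling print(missing_columns(df)) and df.head() gives a quick look at whether the file matches what the plots expect.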

Bar Plot:

A bar plot compares a numeric value across categories, drawing one bar per category.

In this example, I compare sepal length (cm) across species.

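# One bar per species: average the rows first, otherwise all 150 rows are
# drawn as overlapping bars and only the tallest per species is visible
mean_sepal = df.groupby('Species')['SepalLengthCm'].mean()
plt.bar(mean_sepal.index, mean_sepal.values,
        color='lightcoral', width=0.3)
plt.ylabel('Mean Sepal Length (cm)')
plt.show()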

Output: [Figure: bar plot of sepal length by species]
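To show both the typical sepal length of each species and how much it varies, one option is to draw the mean per species with a ±1 standard deviation error bar. The sketch below uses the yerr and capsize parameters of Matplotlib's bar; the function name plot_mean_sepal_length is hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_mean_sepal_length(df: pd.DataFrame):
    """Draw one bar per species: mean SepalLengthCm with a ±1 std error bar."""
    # Aggregate first so each species gets exactly one bar
    stats = df.groupby('Species')['SepalLengthCm'].agg(['mean', 'std'])
    fig, ax = plt.subplots()
    ax.bar(stats.index.tolist(), stats['mean'].to_numpy(),
           yerr=stats['std'].to_numpy(),  # sample standard deviation (ddof=1)
           capsize=4, color='lightcoral', width=0.3)
    ax.set_xlabel('Species')
    ax.set_ylabel('Sepal Length (cm)')
    return ax, stats
```

Usage: ax, stats = plot_mean_sepal_length(df) followed by plt.show(). Returning stats as well makes the plotted numbers easy to print or check.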

Scatter Plot:

A scatter plot shows the relationship between two numeric variables, drawing one dot per observation. We use it to spot correlation and clusters in a dataset.
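A scatter plot shows correlation visually; pandas can also put a number on it. The sketch below computes the Pearson correlation coefficient with Series.corr, which ranges from -1 to 1; values close to 1 mean the dots lie near a rising straight line. The helper name petal_correlation is hypothetical:

```python
import pandas as pd


def petal_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between PetalLengthCm and PetalWidthCm."""
    # Series.corr defaults to method='pearson'; rows where either value
    # is missing are left out of the calculation
    return df['PetalLengthCm'].corr(df['PetalWidthCm'])
```

Calling print(petal_correlation(df)) on the Iris data gives a single number to compare against the impression from the plot.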

Let’s examine the relationship between the width and length of the petals of the iris flower with a scatter plot.

plt.scatter(df['PetalLengthCm'], df['PetalWidthCm'], color='orchid')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.title('Petal Length/Width Relation')
plt.show()

Output: [Figure: “Petal Length/Width Relation” scatter plot]
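Since scatter plots are also used to spot clusters, coloring the dots by species makes the groups in the Iris data easier to see. A minimal sketch (scatter_by_species is a hypothetical name) that draws each species as its own series so Matplotlib assigns a different color and legend entry to each:

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_species(df: pd.DataFrame):
    """Scatter petal length vs. width, one color and legend entry per species."""
    fig, ax = plt.subplots()
    # groupby yields (species name, sub-DataFrame) pairs; each ax.scatter call
    # without an explicit color takes the next color from the default cycle
    for species, group in df.groupby('Species'):
        ax.scatter(group['PetalLengthCm'], group['PetalWidthCm'], label=species)
    ax.set_xlabel('Petal Length (cm)')
    ax.set_ylabel('Petal Width (cm)')
    ax.set_title('Petal Length/Width Relation')
    ax.legend()
    return ax
```

Usage: scatter_by_species(df) followed by plt.show().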

Pie Chart:

A pie chart is a circle, reminiscent of a pizza, divided into slices whose sizes are proportional to the frequency of each category in the data.

In this section, we pass the counts of the Species column (via value_counts()) as the first parameter and the species names as the labels.

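# value_counts() orders species by count, while unique() uses order of
# appearance; taking labels from the counts' index keeps them aligned
species_counts = df['Species'].value_counts()
plt.pie(species_counts,
        labels=species_counts.index,
        colors=['darkolivegreen', 'yellowgreen', 'greenyellow'])
plt.show()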

Output: [Figure: pie chart of species counts]
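A pie chart is easier to read when each slice also shows its share of the total. A sketch using the autopct parameter of Matplotlib's pie, which formats each slice's percentage (here with one decimal place); the labels come from the value_counts() index so each one stays aligned with its slice. The function name species_pie is hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd


def species_pie(df: pd.DataFrame):
    """Pie chart of row counts per species, with each slice's percentage."""
    counts = df['Species'].value_counts()
    fig, ax = plt.subplots()
    # autopct is applied to 100 * (slice / total), e.g. '%1.1f%%' -> '33.3%'
    ax.pie(counts, labels=counts.index, autopct='%1.1f%%')
    return ax, counts
```

Usage: species_pie(df) followed by plt.show().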

Histogram:

A histogram shows the distribution of data as adjacent bars: it groups values into bins and counts how many fall into each bin. It helps us see how frequently values occur and discover the shape of the dataset's distribution.
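To make the idea of bins concrete, here is a sketch with NumPy's np.histogram, which computes the bar heights a histogram would draw. The six values are made up for illustration and are not from the Iris data:

```python
import numpy as np

# Six illustrative values (not from the Iris dataset)
values = [1, 2, 2, 3, 3, 3]

# Split the range [1, 3] into 3 equal-width bins and count the values in each;
# every bin is half-open [left, right) except the last, which also includes 3
counts, edges = np.histogram(values, bins=3)

print(counts)  # → [1 2 3]
print(edges)   # the 4 bin edges, from 1 to 3
```

plt.hist performs this kind of binning internally before drawing the bars.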

In this section, I visualize the distribution of iris species in the dataset as a histogram.

plt.hist(df['Species'], label='Species', color='tomato')
plt.show()

Output: [Figure: histogram of species]
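Histograms are most informative on numeric columns. As a sketch, the same idea applied to PetalLengthCm, with one semi-transparent histogram per species drawn on shared bin edges so the bars line up; the function name petal_length_histograms is hypothetical:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def petal_length_histograms(df: pd.DataFrame, bins: int = 10):
    """Overlay one PetalLengthCm histogram per species on shared bin edges."""
    # Compute the edges once from the whole column so every species
    # is counted in the same bins
    edges = np.histogram_bin_edges(df['PetalLengthCm'], bins=bins)
    fig, ax = plt.subplots()
    counts = {}
    for species, group in df.groupby('Species'):
        # alpha < 1 keeps overlapping bars visible
        n, _, _ = ax.hist(group['PetalLengthCm'], bins=edges,
                          alpha=0.5, label=species)
        counts[species] = n
    ax.set_xlabel('Petal Length (cm)')
    ax.set_ylabel('Frequency')
    ax.legend()
    return ax, counts
```

Usage: petal_length_histograms(df) followed by plt.show(). Sharing the edges matters because separate automatic binning per species would give each histogram bars of different widths and positions.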

In this article, I introduced data visualization, the Matplotlib library and a few basic charts. If you want to go further with visualization, the Matplotlib documentation is a good place to continue.
