Most Useful Data Visualization Techniques in Machine Learning
In this article, we look at different data visualization techniques used in various domains of machine learning. Data visualization is a basic step in building a powerful and efficient machine learning model. It helps us understand the data better, generate better insights for feature engineering, and, finally, make better decisions during modeling and training.
We will use the seaborn and matplotlib libraries to generate the visualizations, and we will explore different statistical graphical techniques that help us interpret and understand the data effectively. Although every plot built with seaborn can also be built with matplotlib, we usually prefer seaborn because of its ability to work directly with DataFrames.
Histogram :
A histogram is used to summarize discrete or continuous data. It provides a visual interpretation of numerical data by showing the number of data points that fall within a specified range of values (called “bins”). It is similar to a vertical bar graph. However, a histogram, unlike a vertical bar graph, shows no gaps between the bars.
Creating a histogram provides a visual representation of the data distribution. Histograms can display a large amount of data along with the frequency of the data values. The approximate median and the shape of the distribution can be read from a histogram. In addition, it can reveal outliers or gaps in the data.
Scatter Plot :
Scatter plots are used to understand the relationship between two numerical variables. Each member of the dataset is plotted as a point whose (x, y) coordinates correspond to its values for the two variables.
For example, consider a scatter plot that shows the shoe sizes and quiz scores for students in a class:
( Each data point is a student whose x-coordinate gives their shoe size and y-coordinate gives their quiz score. )
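A scatter plot like the one described above could be sketched as follows. The shoe sizes and quiz scores here are invented purely for illustration, and the column names `shoe_size` and `quiz_score` are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical class data: one row per student, values invented for illustration.
students = pd.DataFrame({
    "shoe_size": [6, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11],
    "quiz_score": [72, 85, 64, 90, 78, 69, 88, 75, 81, 93],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Each student becomes one point: x is the shoe size, y is the quiz score.
sns.scatterplot(data=students, x="shoe_size", y="quiz_score", ax=ax)

ax.set_title("Shoe size vs. quiz score (made-up data)")
ax.set_xlabel("Shoe size")
ax.set_ylabel("Quiz score")
plt.show()
```

If the points show no clear upward or downward pattern, that suggests the two variables are not strongly related.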
Bar Chart :
Bar charts are great when we want to track the development of one or two variables over time.
For example, one of the most frequent applications of bar charts in corporate presentations is to show how a company’s total revenues have developed during a given period.
A bar chart can be used to make both a year-on-year comparison and a monthly breakdown. Bar charts can be pretty intuitive when we compare the development of two numerical variables over time. Let’s say we would like to compare the revenues of two companies in the timeframe between 2014 and 2018.
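The two-company comparison could be sketched as a grouped bar chart, with one pair of bars per year. The company names and revenue figures below are hypothetical placeholders, not real financial data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical revenues (in million USD) for two made-up companies, 2014 to 2018.
revenue = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017, 2018] * 2,
    "company": ["Company A"] * 5 + ["Company B"] * 5,
    "revenue_musd": [10, 12, 15, 14, 18, 8, 11, 13, 17, 20],
})

fig, ax = plt.subplots(figsize=(7, 4))

# hue="company" draws the two companies side by side within each year.
sns.barplot(data=revenue, x="year", y="revenue_musd", hue="company", ax=ax)

ax.set_title("Yearly revenue by company (made-up data)")
ax.set_xlabel("Year")
ax.set_ylabel("Revenue (million USD)")
plt.show()
```

Keeping the data in this "long" layout (one row per year and company) is a design choice that lets seaborn handle the grouping through the `hue` argument.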
Box Plot :
A box plot gives statistical information about the distribution of numeric data divided into different groups. It is useful for detecting outliers within each group.
A box plot (also called a box-and-whisker plot) is useful for visualizing the distribution of data and finding outliers. This plot displays the five-number summary: minimum, first quartile, median, third quartile, and maximum.
We can also visualize the relationship between two continuous variables with a scatter plot, or between one categorical and one continuous variable with multiple box plots, one for each category of the categorical variable.
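To make the five-number summary and the per-category box plots concrete, here is a small sketch on synthetic data. The `group` and `value` columns and the three groups are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: 50 synthetic values for each of three made-up groups.
rng = np.random.default_rng(seed=1)
scores = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 50),
    "value": np.concatenate([
        rng.normal(50, 5, 50),
        rng.normal(60, 8, 50),
        rng.normal(55, 3, 50),
    ]),
})

# Five-number summary per group: minimum, first quartile, median,
# third quartile, and maximum.
summary = scores.groupby("group")["value"].describe()[
    ["min", "25%", "50%", "75%", "max"]
]
print(summary)

fig, ax = plt.subplots(figsize=(6, 4))

# One box per category of the categorical variable.
sns.boxplot(data=scores, x="group", y="value", ax=ax)

ax.set_title("Value by group (synthetic data)")
plt.show()
```

Note that seaborn's default whiskers extend at most 1.5 times the interquartile range beyond the box, so points past them are drawn individually as outliers. When outliers are present, the whisker ends can therefore differ from the true minimum and maximum shown in the printed summary.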
Line Plot :
A line plot is useful for visualizing the trend in a numerical value over a continuous time interval.
( In this figure, we can see an increasing trend in the number of food orders over the weeks and months, though the trend is not very strong. )