Basic understanding of data visualization and the Matplotlib library
“A picture is worth a thousand words”: plots and graphs can be a very effective way to convey a clear description of data to an audience or to share it with fellow data scientists. Data visualization presents complex data in graphical form to make it understandable. It is especially useful when you are exploring a dataset and getting familiar with it. In a corporate setting, it can also be very valuable for supporting recommendations to clients, managers, or other decision-makers.
Darkhorse Analytics is a company that has run a research lab at the University of Alberta since 2008. They have done fascinating work on data visualization. Their approach to visualizing data rests on three key points: less is more effective, less is more attractive, and less is more impactful. In other words, any feature added to a plot to make it attractive and pleasing must support the message the plot is meant to get across, not distract from it.
Matplotlib
Matplotlib is one of the most popular data visualization libraries in Python. It was created by the neurobiologist John Hunter (1968–2012). Matplotlib’s architecture is composed of three layers:
- Backend layer
The backend layer has three built-in abstract interface classes:
A. FigureCanvas: matplotlib.backend_bases.FigureCanvasBase
It defines and encompasses the area onto which the figure is drawn.
B. Renderer: matplotlib.backend_bases.RendererBase
An instance of the renderer class knows how to draw on the FigureCanvas.
C. Event: matplotlib.backend_bases.Event
It handles user input such as keystrokes and mouse clicks.
- Artist layer
It is composed of one main object, the Artist. The Artist is the object that knows how to use the Renderer to draw on the canvas. Everything we see in a Matplotlib figure is an Artist instance. There are two types of Artist objects:
A. Primitive: Line2D, Rectangle, Circle, and Text.
B. Composite: Axis, Tick, Axes, and Figure.
Each composite artist can contain other composite artists as well as primitive artists. For example, a Figure artist would contain an Axes artist as well as a Text artist or a Rectangle artist.
- Scripting layer
It was developed for scientists who are not professional programmers. The goal of this layer is to support quick exploratory analysis of data. It is essentially the matplotlib.pyplot interface. It automates the process of defining a canvas, defining a figure artist, and connecting them, which makes things much easier for data analysts. For that reason, most data scientists prefer the scripting layer for visualizing their data.
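A minimal sketch of the scripting layer in use, assuming NumPy as the source of the random numbers and the non-interactive Agg backend so that the script runs without a display (the seed and the bin count are arbitrary choices):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI window needed
import matplotlib.pyplot as plt
import numpy as np

# A hundred random numbers drawn from a standard normal distribution.
# The seed is an arbitrary choice that makes the run repeatable.
rng = np.random.default_rng(0)
x = rng.normal(size=100)

# pyplot creates the canvas, the Figure, and the Axes behind the scenes.
counts, bin_edges, patches = plt.hist(x, bins=10)
plt.title("Histogram of 100 random numbers")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Write the figure to disk as a PNG file.
plt.savefig("matplotlib_histogram.png")
```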
The above code plots a histogram of a hundred random numbers and saves the histogram as matplotlib_histogram.png.
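For comparison, roughly the same histogram can be built without pyplot by working with the backend and artist layers directly. The sketch below is illustrative rather than definitive: it assumes the Agg backend's FigureCanvasAgg as the canvas, and the output file name artist_layer_histogram.png is made up for this example.

```python
import numpy as np
from matplotlib.artist import Artist
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure
from matplotlib.patches import Rectangle

# Artist layer: create the Figure (a composite artist) by hand.
fig = Figure()

# Backend layer: attach a canvas so the figure has something to draw on.
canvas = FigureCanvasAgg(fig)

# Add an Axes (another composite artist) to the Figure.
ax = fig.add_subplot(111)

# A hundred random numbers; the seed is an arbitrary choice.
values = np.random.default_rng(0).normal(size=100)
counts, bin_edges, bars = ax.hist(values, bins=10)
ax.set_title("Histogram built with the artist layer")

# Both the Figure and the Axes are Artist instances, and each
# histogram bar is a primitive Rectangle artist.
print(isinstance(fig, Artist), isinstance(ax, Artist))
print(all(isinstance(bar, Rectangle) for bar in bars))

fig.savefig("artist_layer_histogram.png")
```

The extra steps here (creating the Figure, attaching the canvas, adding the Axes) are the ones the scripting layer performs automatically.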
Matplotlib is versatile enough to produce many types of visualization:
- Scatter Plots
- Bar charts and Histograms
- Line plots
- Pie charts
- Stem plots
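As a rough illustration of the plot types listed above, the following sketch draws one of each on a grid of subplots. The data is randomly generated purely for demonstration, and the labels and the output file name plot_types.png are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI window needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up demonstration data.
rng = np.random.default_rng(42)
x = np.arange(1, 6)
y = rng.integers(1, 10, size=5)
samples = rng.normal(size=100)

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

axes[0, 0].scatter(x, y)
axes[0, 0].set_title("Scatter plot")

axes[0, 1].bar(x, y)
axes[0, 1].set_title("Bar chart")

axes[0, 2].hist(samples, bins=10)
axes[0, 2].set_title("Histogram")

axes[1, 0].plot(x, y, marker="o")
axes[1, 0].set_title("Line plot")

axes[1, 1].pie(y, labels=["A", "B", "C", "D", "E"])
axes[1, 1].set_title("Pie chart")

axes[1, 2].stem(x, y)
axes[1, 2].set_title("Stem plot")

fig.tight_layout()
fig.savefig("plot_types.png")
```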