Basic understanding of data visualization and matplotlib library

Saroj Humagain
Published in ml.careers
Sep 16, 2018 · 3 min read
Photo by Carlos Muza on Unsplash

“A picture is worth a thousand words.” Plots and graphs can be a very effective way to convey a clear description of data to an audience or to share it with fellow data scientists. Data visualization is a way of presenting complex data in graphical form to make it understandable. It is used when you are exploring a dataset and getting familiar with it. In a business setting, it can be very valuable for supporting recommendations to clients, managers, or decision-makers.

Darkhorse Analytics is a company that has run a research lab at the University of Alberta since 2008. They have done fascinating work on data visualization. Their approach to visualizing data rests on three key points: less is more effective, less is more attractive, and less is more impactful. In other words, any feature added to a plot to make it attractive and pleasing must support the message the plot is meant to get across rather than distract from it.

Matplotlib

Matplotlib is one of the most popular data visualization libraries in Python. It was created by John Hunter (1968–2012), a neurobiologist. Matplotlib’s architecture is composed of three layers:

Architecture of matplotlib
  1. Backend layer
    The backend layer has three built-in abstract interface classes:
    A. FigureCanvas: matplotlib.backend_bases.FigureCanvasBase
    It defines and encompasses the area onto which the figure is drawn.
    B. Renderer: matplotlib.backend_bases.RendererBase
    An instance of the renderer class knows how to draw on the FigureCanvas.
    C. Event: matplotlib.backend_bases.Event
    It handles user input such as keystrokes and mouse clicks.
  2. Artist layer
    It is built around one main object, the Artist. The Artist is the object that knows how to use the renderer to draw on the canvas. Everything we see in a Matplotlib figure is an Artist instance. There are two types of Artist objects:
    A. Primitive: Line2D, Rectangle, Circle, and Text.
    B. Composite: Axis, Tick, Axes, and Figure.
    Each composite artist can contain other composite artists as well as primitive artists. For example, a Figure artist would contain an Axes artist as well as a Text artist or a Rectangle artist.
  3. Scripting layer
    It was developed for scientists who are not professional programmers. The goal of this layer is to make quick exploratory analysis of data easy. It is essentially the matplotlib.pyplot interface. It automates the process of defining a canvas, defining a figure artist, and connecting the two, which spares data analysts that setup work. For this reason, many data scientists prefer the scripting layer for visualizing their data.
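
To make the three layers concrete, here is a minimal sketch that wires them together by hand instead of going through pyplot. It assumes the Agg backend (FigureCanvasAgg) and NumPy for the sample data; the output file name, the seed, and the bin count are illustrative choices and do not come from the article.

```python
from pathlib import Path

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Artist layer: create the top-level Figure artist explicitly.
fig = Figure()

# Backend layer: attach an Agg canvas, the area the figure is drawn onto.
canvas = FigureCanvasAgg(fig)

# Composite artist: an Axes that lives inside the Figure.
ax = fig.add_subplot(111)

# Sample data (assumption: 100 draws from a standard normal distribution).
rng = np.random.default_rng(0)
x = rng.standard_normal(100)

# hist() creates the Rectangle primitives that make up the bars.
counts, bin_edges, patches = ax.hist(x, bins=10)
ax.set_title("Histogram built with the artist layer")

# Hypothetical output file name, used only for this illustration.
out_path = Path("artist_layer_histogram.png")
fig.savefig(out_path)
```

Every step spelled out here (creating the figure, attaching a canvas, adding an axes) is what the scripting layer does implicitly.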

The example for this section plots a histogram of a hundred random numbers and saves it as matplotlib_histogram.png.
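
A minimal sketch of a script that does this with the pyplot interface might look like the following. The use of NumPy, the fixed seed, the 10 bins, the axis labels, and the non-interactive Agg backend are all assumptions made for illustration; only the output file name comes from the text above.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# One hundred random numbers (assumption: standard normal, fixed seed).
rng = np.random.default_rng(seed=42)
x = rng.standard_normal(100)

# pyplot creates the figure, canvas, and axes behind the scenes.
counts, bin_edges, _ = plt.hist(x, bins=10)
plt.title("Histogram of 100 random numbers")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Save the histogram under the file name mentioned in the text.
out_path = Path("matplotlib_histogram.png")
plt.savefig(out_path)
plt.close()
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() before closing the figure to view it in a window.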

Matplotlib is versatile enough to produce many visualization types:

  • Scatter plots
  • Bar charts and histograms
  • Line plots
  • Pie charts
  • Stem plots
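
Each of the plot types listed above is available as a single method call on an Axes. The sketch below draws all of them in one figure; the data values, labels, seed, and output file name are made up for illustration, and the Agg backend is assumed so the script runs without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only; none of these values come from the article.
rng = np.random.default_rng(0)
x = np.arange(1, 6)
y = np.array([3, 7, 4, 8, 5])
labels = ["A", "B", "C", "D", "E"]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

axes[0, 0].scatter(rng.random(50), rng.random(50))
axes[0, 0].set_title("Scatter plot")

axes[0, 1].bar(labels, y)
axes[0, 1].set_title("Bar chart")

axes[0, 2].hist(rng.standard_normal(100), bins=10)
axes[0, 2].set_title("Histogram")

axes[1, 0].plot(x, y, marker="o")
axes[1, 0].set_title("Line plot")

axes[1, 1].pie(y, labels=labels, autopct="%1.0f%%")
axes[1, 1].set_title("Pie chart")

axes[1, 2].stem(x, y)
axes[1, 2].set_title("Stem plot")

fig.tight_layout()
out_path = Path("plot_types.png")  # hypothetical file name
fig.savefig(out_path)
plt.close(fig)
```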
