Data Visualization Part I | from Data Science Handbook

borandabak
7 min read · Jul 5, 2022


Hello everyone. Today I will talk about the data visualization part of the Data Science Handbook, one of the most important books in the world of data science.

If you are interested in data, the chances that you have not heard of this book are very slim. It is a valuable book for everyone: it contains a lot of information, illustrates it with many examples, and explains it very fluently.

Today I'm going to walk you through a part of Chapter 4, which covers data visualization.

Let’s get started.

First we import the libraries.
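A minimal sketch of the imports used in the examples that follow. The short aliases are the usual community convention, and the 'classic' style sheet is an optional choice I am assuming here, not a requirement.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# Optional: select one of Matplotlib's built-in style sheets.
plt.style.use('classic')
```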

If you are using Matplotlib from within a script, the function plt.show() is your friend.
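As a rough illustration, a small script might look like the sketch below. The file name myplot.py and the helper build_figure() are hypothetical names I made up for this example.

```python
# myplot.py (hypothetical file name)
import matplotlib.pyplot as plt
import numpy as np


def build_figure():
    """Draw a sine and a cosine curve and return the figure."""
    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.plot(x, np.cos(x))
    return fig


if __name__ == "__main__":
    build_figure()
    # Opens the figure window(s); usually called once, at the end of a script.
    plt.show()
```

You would then run it from the command line with python myplot.py.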

One nice feature of Matplotlib is the ability to save figures in a wide variety of formats. You can save a figure using the savefig() command.

You can find the list of supported file types for your system by using the following method of the figure canvas object.
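Here is a minimal sketch of saving a figure. The output name my_figure.png is just an example; the format is inferred from the file extension.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '--')

# Save into the current working directory; ".png" selects the PNG format.
out = Path("my_figure.png")
fig.savefig(out)
print(out.stat().st_size, "bytes written")
```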
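The method in question is get_supported_filetypes(). A short sketch (the exact list depends on your backend and installation):

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# Dictionary that maps a file extension to a short description of the format.
filetypes = fig.canvas.get_supported_filetypes()
for ext, description in sorted(filetypes.items()):
    print(ext, "->", description)
```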

Simple Line Plots

Perhaps the simplest of all plots is the visualization of a single function y = f(x). For all Matplotlib plots, we start by creating a figure and an axes. In their simplest form, a figure and axes can be created as follows.

Once we have created an axes, we can use the ax.plot function to plot some data. Let’s start with a simple sinusoid.

Alternatively, we can use the pyplot interface: plt.plot() draws on the current axes (creating a figure and axes in the background if none exist), so it gives the same result as ax.plot().
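In its simplest form, that looks something like this:

```python
import matplotlib.pyplot as plt

fig = plt.figure()  # the container for axes, graphics, text, and labels
ax = plt.axes()     # a bounding box with ticks and labels, added to the current figure
```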
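A minimal sketch of the sinusoid; the range 0 to 10 and the 1000 sample points are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = plt.axes()

x = np.linspace(0, 10, 1000)
line, = ax.plot(x, np.sin(x))  # ax.plot returns a list of Line2D objects
```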

Adjusting the Plot: Line Colors and Styles

The first adjustment you might wish to make to a plot is to control the line colors and styles. The plt.plot() function takes additional arguments that can be used to specify these. To adjust the color, you can use the color keyword, which accepts a string argument representing virtually any imaginable color.

If no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple lines. Similarly, you can adjust the line style using the linestyle keyword.

If you would like to be extremely terse, these linestyle and color codes can be combined into a single nonkeyword argument to the plt.plot() function.
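The sketch below shows several ways of specifying a color. The phase shifts are only there to keep the curves apart.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, np.sin(x - 0), color='blue')           # color by name
plt.plot(x, np.sin(x - 1), color='g')              # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75')           # grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44')        # hex code (RRGGBB)
plt.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))  # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse')     # HTML color name
```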
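For instance, a quick sketch with the four basic line styles (each also has a short code, given in the comments):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, x + 0, linestyle='solid')    # same as '-'
plt.plot(x, x + 1, linestyle='dashed')   # same as '--'
plt.plot(x, x + 2, linestyle='dashdot')  # same as '-.'
plt.plot(x, x + 3, linestyle='dotted')   # same as ':'
```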
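A short sketch of the combined form, where the single-character color codes stand for green, cyan, black, and red:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, x + 0, '-g')   # solid green
plt.plot(x, x + 1, '--c')  # dashed cyan
plt.plot(x, x + 2, '-.k')  # dash-dot black
plt.plot(x, x + 3, ':r')   # dotted red
```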

Adjusting the Plot: Axes Limits

Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it’s nice to have finer control. The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() methods.

If for some reason you’d like either axis to be displayed in reverse, you can simply reverse the order of the arguments.
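For example (the limit values here are arbitrary):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, np.sin(x))

plt.xlim(-1, 11)      # x axis runs from -1 to 11
plt.ylim(-1.5, 1.5)   # y axis runs from -1.5 to 1.5
```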
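A small sketch of reversed axes, again with arbitrary limits:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, np.sin(x))

plt.xlim(10, 0)       # larger value first: the x axis now runs right to left
plt.ylim(1.2, -1.2)   # the y axis now runs top to bottom
```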

Labeling Plots

As the last piece of this section, we’ll briefly look at the labeling of plots: titles, axis labels, and simple legends. Titles and axis labels are the simplest such labels, and there are methods that can be used to quickly set them.
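A minimal sketch; the label texts are placeholders of my own.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, np.sin(x))

plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)")
```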

When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type. Again, Matplotlib has a built-in way of quickly creating such a legend. It is done via the (you guessed it) plt.legend() method.
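A short sketch: each line gets a label keyword, and plt.legend() picks those labels up.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.axis('equal')  # same scale on both axes

legend = plt.legend()  # builds the legend from the label of each line
```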

Simple Scatter Plots

Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape.

Scatter Plots with plt.plot

In the previous section, we looked at plt.plot/ax.plot to produce line plots. It turns out that this same function can produce scatter plots as well.

The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as you can specify options such as '-' and '--' to control the line style, the marker style has its own set of short string codes. The full list of available symbols can be seen in the documentation of plt.plot, or in Matplotlib’s online documentation. Most of the possibilities are fairly intuitive, and we’ll show a number of the more common ones here.
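Here is a minimal sketch with 30 points on a sine curve; passing 'o' as the third argument draws circle markers and no connecting line.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.figure()
line, = plt.plot(x, y, 'o', color='black')  # markers only
```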
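To get a feel for the marker codes, the sketch below plots a few random points with each of several common markers. The data is random and has no meaning beyond showing the symbols.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the picture is reproducible
plt.figure()
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))

plt.legend(numpoints=1)
plt.xlim(0, 1.8)  # leave room on the right for the legend
```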

Scatter Plots with plt.scatter

A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very similarly to the plt.plot function.

The primary difference between plt.scatter and plt.plot is that plt.scatter can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.

Let’s show this by creating a random scatter plot with points of many colors and sizes. In order to better see the overlapping results, we’ll also use the alpha keyword to adjust the transparency level.
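A sketch of such a plot. The random data, the size scaling factor, and the 'viridis' colormap are my own illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)         # one value per point, mapped through the colormap
sizes = 1000 * rng.rand(100)   # marker area per point, in points squared

plt.figure()
points = plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar()  # show the color scale
```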

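Because color and size can be mapped to data, a single scatter plot can show more than two dimensions at once. For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of flowers that has had the size of its petals and sepals carefully measured.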

Visualizing a Three-Dimensional Function

We’ll start by demonstrating a contour plot using a function z = f(x, y).

First, we write our function.
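The sketch below uses the function I recall from the handbook's example; treat the exact formula and the grid sizes as illustrative. np.meshgrid builds the two-dimensional grids that plt.contour expects.

```python
import matplotlib.pyplot as plt
import numpy as np


def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)


x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)  # 2D coordinate grids built from the 1D arrays
Z = f(X, Y)

plt.figure()
contours = plt.contour(X, Y, Z, colors='black')
```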

Notice that by default when a single color is used, negative values are represented by dashed lines, and positive values by solid lines. Alternatively, you can color-code the lines by specifying a colormap with the cmap argument. Here, we’ll also specify that we want more lines to be drawn: 20 equally spaced intervals within the data range.
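A sketch of the color-coded version, assuming the same example function as in the handbook. The fourth positional argument asks for roughly 20 contour intervals.

```python
import matplotlib.pyplot as plt
import numpy as np


def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)


X, Y = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 40))
Z = f(X, Y)

plt.figure()
contours = plt.contour(X, Y, Z, 20, cmap='RdGy')
```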

Here we chose the RdGy (short for Red-Gray) colormap, which is a good choice for centered data.

Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can change this by switching to a filled contour plot using the plt.contourf() function (notice the f at the end), which uses largely the same syntax as plt.contour().

Histograms, Binnings, and Density
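A sketch of the filled version, with a colorbar added so the colors can be read as values. The function is again the assumed handbook example.

```python
import matplotlib.pyplot as plt
import numpy as np


def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)


X, Y = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 40))
Z = f(X, Y)

plt.figure()
filled = plt.contourf(X, Y, Z, 20, cmap='RdGy')
plt.colorbar()  # labeled color scale next to the plot
```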

A simple histogram can be a great first step in understanding a dataset.
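A one-line histogram of 1000 normally distributed random values; the data is synthetic and only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(1000)

plt.figure()
# Returns the bin counts, the bin edges, and the drawn patches.
counts, bin_edges, patches = plt.hist(data)
```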

The hist() function has many options to tune both the calculation and the display; here’s an example of a more customized histogram.
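A sketch of a more customized call. Note that recent Matplotlib versions use density=True for normalization (older material may show a normed argument); the color choice is just an example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(1000)

plt.figure()
counts, bin_edges, patches = plt.hist(
    data, bins=30, density=True, alpha=0.5,
    histtype='stepfilled', color='steelblue', edgecolor='none')
```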

The plt.hist docstring has more information on other customization options available. I find this combination of histtype='stepfilled' along with some transparency alpha to be very useful when comparing histograms of several distributions.
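For example, three overlapping distributions can be compared like this. The means and standard deviations are arbitrary values picked for the sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x1 = rng.normal(0, 0.8, 1000)
x2 = rng.normal(-2, 1, 1000)
x3 = rng.normal(3, 2, 1000)

# Shared settings, so the three histograms are drawn the same way.
kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)

plt.figure()
results = [plt.hist(x, **kwargs) for x in (x1, x2, x3)]
```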

Conclusions

In this article, I had the opportunity to go through many examples of data visualization. I tried to show how important and fun data visualization is, thanks to the beautiful and simple examples given in the Data Science Handbook. Thank you for reading. I hope it was useful for you.

Contact me

Linkedin: https://www.linkedin.com/in/boran-oktay-dabak-5377bb176/

Upwork Profile: https://www.upwork.com/freelancers/~01619bd76c75349fd3

Youtube: https://www.youtube.com/channel/UCsGwZ3006CuJWcA5J3UPVWw

Github: https://github.com/oktaydbk54
