Day (7) — Data Visualization — How to use Plotly and Cufflinks for Interactive Data Visualizations

Mar 7, 2018

This article covers work from the Python for Data Science and Machine Learning Bootcamp course on Udemy by Jose Portilla and helpful tips along the way. This course was very helpful in gaining a base understanding of the topic.

When learning new techniques and skills, there are many ways to attack the task. Some people are visual learners, in that they prefer the use of graphics and images to understand new information. Some of us absorb information best via reading and taking notes. Others may have a natural gravitation towards understanding new content through listening and speaking environments. And, some learn best through hands-on practice. Today, we are going to combine the visual and kinesthetic learning styles. We have been reviewing seaborn and built-in pandas data visualizations, but let’s take it up a notch.

Introducing the one, the only… Plotly and Cufflinks. Plotly is an open source tool for creating interactive data visualizations. Cufflinks may sound like the title of a scene from “Fifty Shades of Grey”, but it is the library that connects Plotly with pandas so that DataFrames can produce interactive visualizations directly. Well, let’s get into it.

Topics:

How to set up the environment to use plotly and cufflinks
How to generate line plots
How to generate scatter plots -> used to display data points on horizontal and vertical axes
How to generate bar plots -> really powerful when calling aggregate functions on a dataset
How to generate box plots -> used to display distribution, central value and variability of data
How to generate surface plots -> used for generating three dimensional plots
How to generate histogram plots -> used to display the distribution of a numerical dataset
How to generate spread plots -> often used for stock comparison
How to generate bubble plots -> typically used to display world GDP graphics…similar to scatter plots
How to generate a scatter matrix -> similar to the seaborn pairplot

The Setup:
* The examples use Python 3.6 within a Jupyter notebook with the dependencies below.

Matplotlib 2.1.2
Numpy 1.14.1
Pandas 0.20.3
Plotly 2.4.1
Cufflinks 0.12.1

Note: If the notebook reports that the IOPub data rate was exceeded, start it from the terminal with a higher limit:

jupyter notebook --NotebookApp.iopub_data_rate_limit=214748364

Warning:
* Feel free to review the docs for additional arguments for the methods.

The Install
To start, install plotly and cufflinks if they are not already installed. We can do this from the terminal with the commands below.

pip install plotly
pip install cufflinks

Next, we import the dependencies. We will use download_plotlyjs, init_notebook_mode, plot and iplot from plotly.offline, plus the .go_offline() method from cufflinks, so that the interactive visualizations work offline.

download_plotlyjs -> Allows us to work with the visualizations offline
init_notebook_mode -> Allows us to plot graphs offline inside a Jupyter Notebook environment

Line plots
~
Use the .iplot() method to generate a line plot with the dataset. This plot allows us to click on the elements in the legend to hide and display content, which is pretty neat. Move the cursor to the top right of the plot to see the various tools available. We can also zoom in on specific areas of the plot.

Scatter plots
~
Use the .iplot() method with the arguments kind (plot type), x (x-axis variable), y (y-axis variable), and mode. Setting mode to 'markers' removes the connecting lines that plotly draws by default. The plot can be zoomed in or out as needed.

Bar plots
~
Use the .iplot() method with the argument kind (defines plot type) on data that has been summarized with an aggregate method. When we hover over the bars, we can view the underlying values.

Box plots
~
Use the .iplot() method with the argument kind (defines plot type). Turn individual boxes on and off by clicking the corresponding element in the legend.

Surface plots
~
Use the .iplot() method with arguments kind (defines plot type), and colorscale to alter the plot color.

Histogram plots
~
Use the .iplot() method with the arguments kind (defines plot type) and bins to specify the number of bins the values are grouped into.

Spread plots
~
Use the .iplot() method with the argument kind (defines plot type) after selecting the columns to compare.

Bubble plots
~
Use the .iplot() method with the arguments kind (defines plot type), x (x-axis variable), y (y-axis variable) and size (the column that determines the size of each data point).

Scatter Matrix plots
~
Use the .scatter_matrix() method on the DataFrame. Use it with caution on large datasets, as it could crash your notebook kernel.

Overall, this was an exciting exercise in combining visually engaging plots with hands-on interaction to learn from the data. Until next time…

“When you arise in the morning, think of what a precious privilege it is to be alive — to breathe, to think, to enjoy, to love.” ~ Marcus Aurelius

Written by Keith Brooks