Plotly and Cufflinks: An Interactive Python Visualization Tool for EDA and Presentations

Israel Aminu
Published in Analytics Vidhya
7 min read · Nov 12, 2019

At PyCon Nigeria 2019, I spoke on how to build interactive and beautiful plots with a Python library called Cufflinks and how to host the plots with Plotly. In this article I will walk you through how you can make very insightful plots with them.

JAMB applicant statistics for each state in Nigeria (2017–2018) plotted with Cufflinks

Before we jump right into visualization with Cufflinks, we need to understand the concept of EDA and why it’s so important in the fields of data science and data analytics.

What is EDA?

Exploratory Data Analysis (EDA) is the process of visualizing and analyzing data to extract insights from it. In other words, EDA is the process of summarizing important characteristics of data in order to gain better understanding of the data set. — Code Heroku

EDA is very useful in data science because it helps us understand the data better. With it we can uncover trends and relationships among features, which ultimately leads to generating and selecting useful features that directly affect model performance or support statistical inference. It lets us examine the data for insights instead of relying on mere assumptions and hypotheses about its contents. This matters before diving into machine learning or statistical modeling: we need to make sure the data really are what they are claimed to be and that the dataset has no obvious anomalies or ambiguities. EDA should be a core part of every data scientist’s workflow when drawing statistical inferences from data.

Now that we understand the concept of EDA and why it’s important, let’s dive into data visualization using two very interactive Python visualization tools: Plotly and Cufflinks. I’m sure many data scientists who code in Python are more conversant with the Matplotlib and Seaborn visualization libraries. Some might not even have heard of Cufflinks or Plotly before, so let me give a quick overview of what Plotly and Cufflinks are all about.

Plotly is a technical computing company headquartered in Montreal, Quebec, that develops online data analytics and visualization tools. It provides online graphing, analytics, and statistics tools for individuals and teams, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST. Its charting library is built on top of d3.js and can be used directly with pandas DataFrames, thanks to another library named Cufflinks.

Cufflinks connects Plotly with pandas so that you can create graphs and charts directly from DataFrames. It’s a Python library for designing graphs, especially interactive ones. It can plot many kinds of charts, such as histograms, bar plots, box plots, spread plots and many more. It is mainly used in data analysis as well as financial analysis. Cufflinks is an interactive visualization library which you can use to blow your audience away.

For our visualizations we’ll use Cufflinks, a wrapper on Plotly designed to work with pandas DataFrames. Our entire stack is cufflinks > plotly > plotly.js > d3.js, which means we get the efficiency of coding in Python together with incredible interactive graphics capabilities.

Advantages of Cufflinks over Other Plotting Libraries

I know you might be used to other Python plotting libraries, but Plotly and Cufflinks offer great benefits such as:

  1. Dynamic Plots
  2. Hosting Service
  3. Requires only a single line of code to make plots
  4. It works 100% offline

Getting Started

pip install plotly

pip install cufflinks

Cufflinks is designed for use in a Jupyter Notebook, so all the examples in this article are run in one.

Before we proceed, we need to import the necessary libraries.
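A minimal sketch of such a setup cell, assuming pandas and Cufflinks are already installed. The wrapper function and its name `setup_offline` are my own; they are not part of either library and only keep the imports from running until you call it.

```python
def setup_offline():
    """Import pandas and Cufflinks, then switch Cufflinks to offline mode."""
    import pandas as pd      # DataFrames hold the data we plot
    import cufflinks as cf   # importing it adds .iplot() to DataFrames

    cf.go_offline()          # plots render in the notebook, nothing is uploaded
    return pd, cf
```

In a notebook cell, `pd, cf = setup_offline()` gives you the two modules under their usual aliases.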

Pandas 0.20.3
Plotly 2.4.1
Cufflinks 0.12.1
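These are the versions used in this article. To see what is installed in your own environment, here is a small sketch that uses only the standard library; the helper name `installed_version` is my own and not part of any of these packages.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the three libraries used in this article.
for name in ("pandas", "plotly", "cufflinks"):
    print(name, installed_version(name) or "not installed")
```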

Note: The cf.go_offline() function lets you make your plots offline, which means they are not saved online to your Plotly account. The benefit of working offline is that you can make changes to your plots in your notebook before you choose to save them. To save your plots online, you need to create a Plotly account. You can create one here.

To switch back to online mode you can use the syntax below:
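A minimal sketch of that switch, assuming the `go_online()` and `is_offline()` helpers that Cufflinks exposes next to `go_offline()`. The wrapper name `use_online_mode` is my own. With newer Plotly releases (version 4 and later), saving plots online also needs the separate `chart_studio` package, so treat this as matching the library versions listed above.

```python
def use_online_mode():
    """Switch Cufflinks from offline rendering back to online mode."""
    import cufflinks as cf

    cf.go_online()
    # is_offline() reports the current mode; it should be False after the switch.
    return cf.is_offline()
```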

Data

Most of the data I collected came from web scraping and from the Nigeria Bureau of Statistics. The data sets I used were:

  • Nigeria electricity consumption (1971–2014)
  • JAMB Applicant Data for 2017–2018
  • Dangote Cement stock from March to July 2019

Next, we import the data into the notebook.
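A sketch of the loading step, assuming each data set has been saved as a CSV file. The helper name `load_table`, the file name and the `Date` column in the usage note are placeholders of mine, not the actual files from this project.

```python
def load_table(csv_path, date_column=None):
    """Read one CSV file into a DataFrame, optionally indexed by a date column."""
    import pandas as pd

    if date_column is None:
        return pd.read_csv(csv_path)
    # Parse the dates and use them as the index so plots get a time axis.
    return pd.read_csv(csv_path, parse_dates=[date_column], index_col=date_column)
```

For example, `stock = load_table("dangcem.csv", date_column="Date")` followed by `stock.head()` shows the first few rows.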

Creating Plots

Line Plot

DANGCEM Stock (Mar-July) 2019:

I made a line plot with the Dangote Cement stock data using the code block below.
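A minimal sketch of such a line plot, assuming a DataFrame indexed by date with one column per series. The function name `plot_lines` and its parameters are my own choices, not the original code; the only Cufflinks call that matters is `df.iplot(...)`.

```python
def plot_lines(df, title, x_title="", y_title=""):
    """Draw every column of df as an interactive line in the notebook."""
    import cufflinks as cf   # importing it adds .iplot() to DataFrames

    cf.go_offline()
    # kind="scatter" with mode="lines" connects the points of each column.
    df.iplot(kind="scatter", mode="lines", title=title,
             xTitle=x_title, yTitle=y_title)
```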

The output:

From the result you can see that the graph is responsive when you hover over it. You can also zoom in to the plot and click on the legend entries to get more insight into your data. The plots are not static, and it only requires one line of code to make them. Isn’t this awesome?!

JAMB Data Line plot:

Output:

Nigeria Electricity consumption line plot:

Output:

Note: You can use any theme you want; it doesn’t always have to be black. You can list the themes available in Cufflinks using the code block below:
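A sketch of listing the themes and choosing one, assuming Cufflinks’ `getThemes()` and `set_config_file()` functions. The wrapper name `list_and_set_theme` and its validation step are my own additions.

```python
def list_and_set_theme(theme=None):
    """Return the available Cufflinks themes; optionally make one the default."""
    import cufflinks as cf

    themes = cf.getThemes()
    if theme is not None:
        if theme not in themes:
            raise ValueError(f"unknown theme: {theme!r}")
        # Stores the choice in the Cufflinks config so later plots use it.
        cf.set_config_file(theme=theme)
    return themes
```

A theme can also be passed to a single plot, for example `df.iplot(theme="pearl")`.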

Barplots

JAMB Data Bar plot

Output:

You can also plot stacked bar plots by setting barmode to “stack”.
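A minimal sketch covering both the grouped and the stacked variant, assuming a DataFrame with one row per category and one numeric column per series. The function name `plot_bars` and the `stacked` flag are my own; `kind` and `barmode` are the `iplot` arguments being illustrated.

```python
def plot_bars(df, title, stacked=False):
    """Draw df as an interactive bar chart, side by side or stacked."""
    import cufflinks as cf   # importing it adds .iplot() to DataFrames

    cf.go_offline()
    # "group" places the bars of each row next to each other,
    # "stack" piles them on top of each other.
    barmode = "stack" if stacked else "group"
    df.iplot(kind="bar", barmode=barmode, title=title)
```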

Output:

The interactive version of this plot can be seen at the beginning of the post.

Boxplots

Box plot on electricity consumption
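A minimal sketch of a box plot, assuming a DataFrame whose numeric columns should each get their own box. The function name `plot_boxes` is my own; the relevant part is `kind="box"`.

```python
def plot_boxes(df, title):
    """Draw one interactive box per column of df."""
    import cufflinks as cf   # importing it adds .iplot() to DataFrames

    cf.go_offline()
    df.iplot(kind="box", title=title)
```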

The output:

Box plot on DANGCEM Stock (Mar-July) 2019

The output:

There are some other cool plots you could try out, such as the scatter plot, area plot, bubble chart, etc. You can check the documentation here.
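As one example, a scatter plot follows the same pattern. This is a sketch under the assumption that the DataFrame has two numeric columns to compare; the function name `plot_scatter` and its parameters are my own.

```python
def plot_scatter(df, x, y, title=""):
    """Plot column y against column x as interactive markers."""
    import cufflinks as cf   # importing it adds .iplot() to DataFrames

    cf.go_offline()
    # mode="markers" draws individual points instead of connected lines.
    df.iplot(kind="scatter", mode="markers", x=x, y=y, title=title)
```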

Hosting Plot

Yes, you can host any of your plots and share them with anyone around the world, thanks to Plotly. I will walk you through the steps to host them.

I am going to host this particular plot because it was chosen by the audience at PyCon Nigeria 2019 😃

Step 1:

After you have made your plot in your Jupyter Notebook, click on the “Export to plot.ly” link.

It will redirect you to a dashboard.

Step 2:

Click on save

A pop-up will appear asking how you want to save your plot.

You can set the plot to public or private, and likewise the grid, which represents your data. It depends on what you’re working on; here we make both the plot and the grid public. After that, click on Save.

Step 3:

You’ll see a Share button. Click it and a page like this will pop up.

There you go: your plot and your data have been successfully hosted. You can copy the link “https://plot.ly/~aminuisrael/21” and paste it into your web browser.

When you paste the URL you’ll be able to see your hosted plot, your data and your code, and you can set the plot to full screen for presentation purposes.

This project is available on GitHub here. Thanks for reading and happy analyzing.
