Interactive Plots with Plotly and Cufflinks on Pandas Dataframes

A simple and easy introduction to interactive visualisation with Plotly in Python.

Ozan Bulum
4 min read · Oct 8, 2018

Pandas is one of the most popular and widely used Python tools for data analysis. It also has its own simple built-in plot function. However, when it comes to interactive visualization, Python users face some difficulties if they lack front-end engineering skills, since many libraries, such as D3 and Chart.js, require some JavaScript background. This is where Plotly and Cufflinks come in handy.

Plotly is a charting library built on top of d3.js, and it can be used directly with Pandas dataframes thanks to another library named Cufflinks.

In this short introduction we will show how to use Plotly interactive plots directly with Pandas dataframes. To keep it simple, we will use Jupyter Notebook (installed with the Anaconda Distribution, Python 3.6.4) and the famous Titanic dataset.

Plotly Version

At the time of writing, Plotly’s latest release was 3.3.0 and Cufflinks’ was 0.14.5. It is important to update both packages together or to find compatible versions, since older Cufflinks versions do not support newly released Plotly versions. You can install both with the following commands in Anaconda Prompt (or in a terminal if you use macOS or Ubuntu):

conda install -c plotly plotly
conda install -c conda-forge cufflinks-py

Or, if you have already installed them, you can upgrade both with pip:

pip install plotly --upgrade
pip install cufflinks --upgrade
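
Since the two packages need to be compatible, it can help to print the installed versions before plotting anything. The snippet below is a minimal sketch that uses only the standard library. The helper name `installed_version` is made up for this illustration, and `importlib.metadata` needs Python 3.8 or newer, so it assumes a more recent interpreter than the 3.6.4 mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report both packages side by side before plotting anything.
versions = {name: installed_version(name) for name in ("plotly", "cufflinks")}
for name, found in versions.items():
    print(f"{name}: {found or 'not installed'}")
```

If one of the two lines says "not installed", or the versions are far apart in release date, that is a hint to rerun the install or upgrade commands above.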

Loading Libraries

First we will load the Pandas, Plotly and Cufflinks libraries. Since Plotly is also an online platform, login credentials must be supplied in order to use it in online mode. In this article we will use offline mode, which is sufficient for Jupyter Notebook usage.

#importing Pandas
import pandas as pd
#importing plotly and cufflinks in offline mode
import cufflinks as cf
import plotly.offline
cf.go_offline()
#offline=True keeps the saved configuration consistent with offline mode
cf.set_config_file(offline=True, world_readable=True)

Loading Dataset

As mentioned, we will use the Titanic dataset, which you can download directly from Kaggle. We will only use the train.csv file.

df=pd.read_csv("train.csv")
df.head()

Histogram

Histograms can be used to check the distribution of a feature, “Age” in this example. We simply select the column with the dataframe[“columnname”] syntax and call the .iplot function on it. We can specify the bin size, theme, title and axis names as in the following example. You can check all parameters of iplot with the “help(df.iplot)” command.

df["Age"].iplot(kind="histogram", bins=20, theme="white", title="Passenger's Ages",xTitle='Ages', yTitle='Count')
Histogram with plotly

If you want to compare two different distributions, you can plot them as two separate columns. For example, we will show female and male passengers’ ages in the same plot.

df["male_age"]=df[df["Sex"]=="male"]["Age"]
df["female_age"]=df[df["Sex"]=="female"]["Age"]
df[["male_age","female_age"]].iplot(kind="histogram", bins=20, theme="white", title="Passenger's Ages",
xTitle='Ages', yTitle='Count')
Histogram of different features by plotly
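
The two new columns work because Pandas aligns on the row index when a filtered Series is assigned back to the dataframe. A small sketch makes this visible. The four-row `sample` frame is a made-up stand-in for train.csv, not real Titanic data.

```python
import pandas as pd

# Hypothetical four-row stand-in for train.csv (values are made up).
sample = pd.DataFrame({
    "Sex": ["male", "female", "male", "female"],
    "Age": [22.0, 38.0, 35.0, 27.0],
})

# Filtering keeps the original index, so assignment aligns row by row
# and fills the rows of the other sex with NaN.
sample["male_age"] = sample[sample["Sex"] == "male"]["Age"]
sample["female_age"] = sample[sample["Sex"] == "female"]["Age"]

print(sample)
```

Each row ends up with a value in only one of the two new columns, so the histogram draws each group from its own column.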

Heatmap

Heatmaps have many uses; as an example, we will check the correlation between the features in the dataset.

df.corr().iplot(kind='heatmap', colorscale="Blues", title="Feature Correlation Matrix")
Heatmap by plotly
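
One caveat that depends on your Pandas version: older releases silently skipped text columns in corr(), while recent ones (2.0 and later, as far as I know) raise an error when columns such as “Name” or “Sex” are present. A hedged workaround is to select the numeric columns first. The `sample` frame below is a made-up stand-in for train.csv.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (values are made up).
sample = pd.DataFrame({
    "Sex": ["male", "female", "male", "female"],
    "Age": [22.0, 38.0, 35.0, 27.0],
    "Fare": [7.25, 71.28, 8.05, 11.13],
    "Pclass": [3, 1, 3, 3],
})

# Keep only numeric columns so that corr() never sees text data.
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()

print(corr.round(2))
# corr.iplot(kind="heatmap", colorscale="Blues")  # needs cufflinks, as in the article
```

The resulting matrix is square, with one row and one column per numeric feature, which is the shape the heatmap expects.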

Boxplot

Boxplots are quite handy for quickly interpreting skewness, outliers or quartile ranges in data. We will now use a boxplot to show the “Fare” distribution for every class on the Titanic.

#we use a pivot table to put the Fare values of each class in a separate column
df[['Pclass', 'Fare']].pivot(columns='Pclass', values='Fare').iplot(kind='box')
Boxplot by plotly
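
To see what the pivot step produces before it reaches the plot, here is a small sketch on a made-up four-row `sample` frame (not real Titanic data). The name `wide` is only used for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Pclass and Fare columns (values are made up).
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Fare": [7.25, 71.28, 8.05, 13.0],
})

# One column per class; each row keeps its fare only under its own class
# and holds NaN in the other columns.
wide = sample.pivot(columns="Pclass", values="Fare")

print(wide)
# wide.iplot(kind="box")  # needs cufflinks, as in the article
```

Each column of the pivoted frame then becomes one box in the plot, and the NaN cells are ignored.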

Scatter Plots

Scatter plots are mostly used to see the relationship between two quantitative variables. We will draw a scatter plot of the “Fare” and “Age” variables. The “categories” parameter shows the values of a selected feature in different colors (the sex of passengers in this case).

df.iplot(kind="scatter", theme="white",x="Age",y="Fare",
categories="Sex")
Scatter plot by plotly

A quick note: the “categories” parameter must be given a column of string or float64 type. For example, you have to convert the integer “Survived” column to float64 or string, as done in the Bubble Chart example below.
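
If you prefer the string route, one option is to add a separate text column and leave the original integers untouched. This is only a sketch on a made-up `sample` frame. The column name `Survived_label` is hypothetical, and the labels reuse the “Survived” and “Dead” wording from the Bar Graph section.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (values are made up).
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Copy the integer flag into a text column so the original stays numeric.
sample["Survived_label"] = sample["Survived"].map({0: "Dead", 1: "Survived"})

print(sample.dtypes)
# sample.iplot(kind="scatter", x="Age", y="Age", categories="Survived_label")  # needs cufflinks
```

Keeping the integer column means later filters such as df['Survived']==1 keep working as written.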

Bubble Chart

Bubble charts help us see the relationship between multiple variables at the same time. In Plotly we can easily set the color and size groupings with the “categories” and “size” parameters. We can also specify the column used for hover text with the “text” parameter.

#converting Survived column to float64 to be able to use in plotly
df[['Survived']] = df[['Survived']].astype('float64', copy=False)
df.iplot(kind='bubble', x="Fare",y="Age",categories="Survived", size='Pclass', text='Name', xTitle='Fare', yTitle='Age')
Bubble chart by plotly

Bar Graph

Bar graphs are good for presenting data from different groups that are being compared with each other. They can also be stacked to show the effect of different variables. We will make a bar graph to show passenger survival counts by sex.

survived_sex = df[df['Survived']==1]['Sex'].value_counts()
dead_sex = df[df['Survived']==0]['Sex'].value_counts()
df1 = pd.DataFrame([survived_sex,dead_sex])
df1.index = ['Survived','Dead']
df1.iplot(kind='bar',barmode='stack', title='Survival by the Sex')
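
The same count table can also be built in one step with pd.crosstab, which is part of Pandas itself. This is an alternative sketch, not the approach used above, and the `sample` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (values are made up).
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 0, 1],
})

# Rows are the Survived values, columns are the sexes, cells are counts.
counts = pd.crosstab(sample["Survived"], sample["Sex"])
counts.index = ["Dead", "Survived"]  # rows come out sorted as 0, then 1

print(counts)
# counts.iplot(kind="bar", barmode="stack")  # needs cufflinks, as in the article
```

Note that the rows come out with “Dead” first here, the reverse of the order in df1, so rename the index accordingly.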

“That’s all Folks!”, at least for now. I have tried to explain things as simply as possible. I hope it helps beginners adopt Plotly easily.

Ozan Bulum — Data Analytics, Digital Transformation