Interactive Exploratory Data Analysis (EDA) of Sensor Data With Pandas: The Plotting API and Plotting Backends
The pandas plotting API
Pandas lets you plot DataFrames and Series with an almost identical API and set of plot types. The one exception is that DataFrames offer
pandas.DataFrame.boxplot() in addition to
pandas.DataFrame.plot.box(). This exception is of little relevance because, as far as I know, the latter provides the same functionality as the former. In the sensor data context, with DataFrames one usually plots a single column (univariate data) or several columns (sensor fusion of several univariate signals, multivariate data). With Series one usually plots time series: timestamped data with datetime timestamps as the index and univariate data as the values. Depending on what data the DataFrame or Series contains and how it is represented, some plot types do not really make sense. We’ll dive deeper into this in a later post. For a deep dive you’ll probably want to have a look at the plotting API source code.
The pandas DataFrame plotting API
Data from Pandas DataFrames may be plotted with
- pandas.DataFrame.plot: Make plots of DataFrame. The default type is a line plot.
- pandas.DataFrame.plot.area: Draw a stacked area plot.
- pandas.DataFrame.plot.bar: Vertical bar plot.
- pandas.DataFrame.plot.barh: Make a horizontal bar plot.
- pandas.DataFrame.plot.box: Make a box plot of the DataFrame columns.
- pandas.DataFrame.plot.density: Generate Kernel Density Estimate plot using Gaussian kernels.
- pandas.DataFrame.plot.hexbin: Generate a hexagonal binning plot.
- pandas.DataFrame.plot.hist: Draw one histogram of the DataFrame’s columns.
- pandas.DataFrame.plot.kde: Generate Kernel Density Estimate plot using Gaussian kernels.
- pandas.DataFrame.plot.line: Plot DataFrame as lines.
- pandas.DataFrame.plot.pie: Generate a pie plot.
- pandas.DataFrame.plot.scatter: Create a scatter plot with varying marker point size and color.
- pandas.DataFrame.boxplot: Make a box plot from DataFrame columns.
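To make the list above a bit more concrete, here is a minimal sketch using the default matplotlib backend. The sensor names, the sampling interval and the random-walk data are made up for illustration only; they are not taken from a real data set.

```python
import matplotlib

matplotlib.use("Agg")  # headless rendering, so the sketch also runs outside a notebook

import numpy as np
import pandas as pd

# Synthetic readings of two hypothetical sensors, one sample per minute.
rng = np.random.default_rng(seed=0)
index = pd.date_range("2021-01-01", periods=120, freq="min")
df = pd.DataFrame(
    {
        "temperature": 20 + rng.normal(0, 0.5, size=len(index)).cumsum(),
        "humidity": 50 + rng.normal(0, 1.0, size=len(index)).cumsum(),
    },
    index=index,
)

ax_line = df.plot()                          # default kind: one line per column
ax_box = df.plot.box()                       # one box per column
ax_hist = df.plot.hist(bins=20, alpha=0.5)   # overlaid histograms of both columns
ax_scatter = df.plot.scatter(x="temperature", y="humidity")
```

Each call returns a matplotlib Axes object here, which you can style further or save via its figure.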
The pandas Series plotting API
Data from Pandas Series may be plotted similarly to DataFrames with
- pandas.Series.plot: Make plots of Series. The default type is a line plot.
- pandas.Series.plot.area: Draw a stacked area plot.
- pandas.Series.plot.bar: Vertical bar plot.
- pandas.Series.plot.barh: Make a horizontal bar plot.
- pandas.Series.plot.box: Make a box plot of the Series.
- pandas.Series.plot.density: Generate Kernel Density Estimate plot using Gaussian kernels.
- pandas.Series.plot.hist: Draw one histogram of the Series.
- pandas.Series.plot.kde: Generate Kernel Density Estimate plot using Gaussian kernels.
- pandas.Series.plot.line: Plot Series as line.
- pandas.Series.plot.pie: Generate a pie plot.
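The Series variant looks almost the same. The following sketch assumes a synthetic, timestamped temperature series (made-up values) and passes explicit matplotlib axes via the ax parameter, so that both plots end up side by side in one figure.

```python
import matplotlib

matplotlib.use("Agg")  # headless rendering, so the sketch also runs outside a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic time series: datetime index, univariate values.
rng = np.random.default_rng(seed=1)
index = pd.date_range("2021-01-01", periods=200, freq="min")
series = pd.Series(
    20 + rng.normal(0, 0.2, size=len(index)).cumsum(),
    index=index,
    name="temperature",
)

fig, (ax_line, ax_hist) = plt.subplots(ncols=2, figsize=(10, 3))
series.plot(ax=ax_line, title="line")                      # default kind: line plot
series.plot.hist(ax=ax_hist, bins=25, title="histogram")   # distribution of the values
```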
Before Pandas version 0.25 the builtin plotting functionality for DataFrames and Series used matplotlib as backend, with support for static, non-interactive plots only. Beginning with Pandas version 0.25 it’s possible to use other, potentially interactive plotting frameworks for plotting. To change the plotting backend put either
import pandas as pd
pd.options.plotting.backend = '<BACKEND-NAME>'
or
import pandas as pd
pd.set_option('plotting.backend', '<BACKEND-NAME>')
into a Notebook cell and execute the cell.
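If you are not sure whether a backend is installed in the current environment, a small guard can help. The helper below is a hypothetical convenience function (not part of pandas). It assumes that pandas raises a ValueError or ImportError when the requested backend cannot be loaded, and falls back to matplotlib in that case.

```python
import pandas as pd


def use_plotting_backend(name, fallback="matplotlib"):
    """Try to activate a plotting backend; use the fallback if it cannot be loaded."""
    try:
        pd.set_option("plotting.backend", name)
    except (ImportError, ValueError):
        # Assumption: an unavailable backend is reported with one of these errors.
        pd.set_option("plotting.backend", fallback)
    return pd.get_option("plotting.backend")


# A deliberately bogus backend name to demonstrate the fallback.
active = use_plotting_backend("not_an_installed_backend")
print(active)
```

The function returns the name of the backend that is actually active, so a notebook can report which backend it ended up with.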
Some of the visualization backends supported are listed on the visualization docs page. Others can be found on Stack Overflow (Change pandas plotting backend to get interactive plots instead of matplotlib static plots). One thing to point out is that the plot types supported by a backend do not necessarily all work with pandas DataFrames or Series. Which plot types are supported, and to what degree, depends on the extent to which the backend implements the Pandas plotting API.
- altair_pandas via Altair (backend name: altair): Supported interactive plot types.
- hvplot via Bokeh (backend name: holoviews, beginning with version 0.5.1): Supported interactive plot types.
- pandas-bokeh via Bokeh (backend name: pandas_bokeh): Supported interactive plot types.
- plotly (backend name: plotly, beginning with version 4.8): Supported interactive plot types.
We’ll be using the following versions of the visualization integration packages (taken from
# plotting backends
plotly==4.14.1 # requires additional setup
When using Altair, the Pandas backend
altair_pandas has to be installed as a dependency directly from GitHub. At the moment it cannot be pinned to a specific version tag. It is installed as part of
pip install -r requirements.txt or via
pip install git+https://github.com/altair-viz/altair_pandas
This can cause trouble w.r.t. plot styling reproducibility (in case the implementation changes over time).
When using altair, pandas-bokeh or hvplot, interactive plots work out of the box. With plotly you have to consider additional setup steps, described here for plain Jupyter Notebooks and here for JupyterLab. Because I host the examples on Binder and did not want to add this extra setup step, I have not included plotly plots at the time of writing.
Plot type compatibility
hvplot depends on
scipy to be able to use
pandas.DataFrame.plot.density(). If the
scipy dependency is missing you’ll get an import error:
ImportError: univariate_kde operation requires SciPy to be installed.
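To avoid running into this error in the middle of a notebook, you could check for the dependency up front. The following is a minimal sketch using only the standard library; has_module is a hypothetical helper name, not something provided by hvplot or pandas.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if has_module("scipy"):
    print("SciPy found: density/KDE plots should be usable.")
else:
    print("SciPy missing: install it (e.g. `pip install scipy`) before calling plot.density().")
```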
The following table summarizes which backend supports which plot types.
The pandas.DataFrame.plot.density() and pandas.DataFrame.plot.kde() methods are supported by
hvplot only. The
pandas.DataFrame.plot.pie() method is supported by
pandas_bokeh only. In the context of sensor data, pie plots are rather unimportant. Density and KDE plots, however, can be important. I’d recommend starting with
hvplot. The interactive features, such as the information shown on mouse hover, may differ significantly between backends. If you miss information or do not like the look and feel of
hvplot, you can try one of the other frameworks. In any case, chances are high that you’ll have to switch the backend in your notebooks depending on which plot type you want to use.
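Switching the backend per plot type can be wrapped in a small helper. The sketch below encodes only the two statements made above (density/KDE via hvplot, pie via pandas_bokeh) in a hypothetical lookup table. The names PREFERRED_BACKEND, DEFAULT_BACKEND, backend_for and plot_with are made up for illustration, and the backend keyword of plot() assumes pandas 1.0 or newer.

```python
# Hypothetical preference table, derived from the compatibility notes above.
PREFERRED_BACKEND = {
    "density": "holoviews",   # hvplot registers under the backend name "holoviews"
    "kde": "holoviews",
    "pie": "pandas_bokeh",
}
DEFAULT_BACKEND = "holoviews"  # assumption: hvplot as the general starting point


def backend_for(kind):
    """Return the backend name to use for a given plot kind."""
    return PREFERRED_BACKEND.get(kind, DEFAULT_BACKEND)


def plot_with(obj, kind, **kwargs):
    """Plot a DataFrame or Series with the backend preferred for this plot kind.

    Requires the chosen backend package to be installed.
    """
    return obj.plot(kind=kind, backend=backend_for(kind), **kwargs)


print(backend_for("pie"), backend_for("kde"), backend_for("line"))
```

With the backends installed, a call such as plot_with(df, "kde") would then pick the backend without touching the global plotting.backend option.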
Feel free to visit the source code repository, press the “Binder” button to open the repository in a Binder environment and explore the
DataFrame plot interactivity in the notebook backend_pandas_plotting_api_compatibility_dataframe.ipynb.