Interactive Exploratory Data Analysis (EDA) of Sensor Data With Pandas: Univariate Time Series Data
Visualizing univariate time series data with the pandas plotting API
This post shows the basic look and feel of the pandas plotting API applied to typical univariate sensor data represented as time series. Feel free to visit the source code repository, press the “Binder” button to open the repository in a Binder environment, and explore the interactivity of each plot type in the notebook time_series_univariate.ipynb. Of course, not every plot type makes sense for univariate time series data. However, for the sake of completeness, and to make it clear which plot types make no sense, I’ve added GIFs for all of them. If a plotting backend does not support a plot type, I skipped the GIF in the corresponding section. In the Binder environment I’ve tried every plot type with every backend to force the exceptions to be printed. These exceptions do not result from incorrect use of the pandas plotting API; they indicate that a plot type is simply not supported (yet). The example uses linear fake data from a temperature sensor, constructed as follows:
import pandas as pd
# ten linearly increasing temperature values: 20, 21, ..., 29
d = [i for i in range(20, 20+10, 1)]
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(data=d, index=dti, name="Temperature")
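The Binder notebook calls every plot type on every backend to surface the resulting exceptions. A minimal sketch of such a loop could look like this. The helper name `collect_plot_errors` and the list of kinds are my assumptions, not the notebook's code. The sketch relies on the `backend` keyword of `Series.plot`, which selects a plotting backend for a single call.

```python
import pandas as pd

# Plot kinds pandas accepts for a Series ("scatter" and "hexbin" are DataFrame-only).
SERIES_KINDS = ("area", "bar", "barh", "box", "density", "hist", "kde", "line", "pie")


def collect_plot_errors(series, backend, kinds=SERIES_KINDS):
    """Hypothetical helper: try every plot kind with one backend, record the exception or None."""
    errors = {}
    for kind in kinds:
        try:
            # `backend` selects the plotting backend for this call only.
            series.plot(kind=kind, backend=backend)
            errors[kind] = None
        except Exception as exc:  # keep going so that every kind gets reported
            errors[kind] = exc
    return errors


d = [i for i in range(20, 20 + 10, 1)]
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(data=d, index=dti, name="Temperature")

if __name__ == "__main__":
    for backend in ["pandas_bokeh", "hvplot", "altair"]:
        for kind, exc in collect_plot_errors(temperature_series, backend).items():
            if exc is not None:
                print(f"{backend}: {kind} -> {type(exc).__name__}: {exc}")
```

If a backend package is not installed at all, pandas raises a `ValueError` for every kind, so the output mixes "backend missing" with "plot type not supported" errors; check the message text to tell them apart.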
This data is rather boring. I recommend adjusting the corresponding Jupyter Notebook cell and replacing
d = [i for i in range(20, 20+10, 1)]
with
import random
random.seed(42)
d = random.sample(range(20, 20+10), 10)
to generate random, almost always non-linear fake data. The following sections show what the plot types look like and how they behave with the different plotting backends in their default configuration.
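pandas selects the backend through the `plotting.backend` option, for example `pd.set_option("plotting.backend", "pandas_bokeh")`, or per call through the `backend` keyword of `Series.plot`. The following sketch checks which of the three backends used here can actually be loaded. It assumes the installed packages register themselves under the names `pandas_bokeh`, `hvplot` and `altair` (the latter via the altair_pandas package); `available_backends` is a hypothetical helper.

```python
import pandas as pd

# Assumed backend names as registered by pandas_bokeh, hvplot and altair_pandas.
CANDIDATE_BACKENDS = ["pandas_bokeh", "hvplot", "altair"]


def available_backends(candidates=CANDIDATE_BACKENDS):
    """Hypothetical helper: return the candidate backend names pandas can load here."""
    available = []
    for name in candidates:
        try:
            # Setting "plotting.backend" validates the name by loading the backend;
            # option_context restores the previous value afterwards.
            with pd.option_context("plotting.backend", name):
                pass
        except (ValueError, ImportError):
            continue
        available.append(name)
    return available


print(available_backends())
```

Validating inside `option_context` keeps the global default untouched, so the check can run at the top of a notebook without changing later plots.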
Pandas Series simplify working with time series data
Besides pandas’ built-in capabilities for visualizing time series data, another huge advantage for visualization lies in the data structure itself. It’s comparatively easy to get raw data into a pandas Series and to preprocess, combine, and split data using Series. This topic is huge and beyond the scope of this post. However, it’s worth noting because it’s one of the reasons why interactive exploratory data analysis with pandas Series is so powerful.
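To give a flavor of this preprocessing, here is a short sketch using standard Series methods on the fake temperature data: downsampling, smoothing, and combining raw and derived data aligned on the timestamp index. The bin size, window length and column name are arbitrary example choices.

```python
import pandas as pd

d = list(range(20, 30))
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(data=d, index=dti, name="Temperature")

# Downsample to 2-second bins (mean per bin).
resampled = temperature_series.resample("2s").mean()

# Smooth with a rolling window of 3 samples (the first two results are NaN).
smoothed = temperature_series.rolling(window=3).mean()

# Combine raw and derived series into one DataFrame, aligned on the timestamp index.
combined = pd.concat(
    [temperature_series, smoothed.rename("Temperature (rolling mean)")], axis=1
)

print(resampled)
print(combined.head())
```

Each of the results can be passed straight to the plotting API.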
Area plot

In the default configuration, the altair backend creates non-optimal axis labels and formatting (y-axis: datetime formatting).

In the default configuration, the pandas_bokeh backend has the best axis labeling and datetime formatting.

In the default configuration, the hvplot backend creates non-optimal axis labeling and relative time axis information.
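Both the altair and the hvplot area plots show awkward datetime formatting. One thing worth trying is to hand the backends a timezone-naive index. This is untested here and purely an assumption about what the backends format better. `tz_localize(None)` drops the time zone while keeping the local wall-clock time.

```python
import pandas as pd

d = list(range(20, 30))
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(data=d, index=dti, name="Temperature")

# Drop the time zone but keep the local wall-clock time (12:00:00 stays 12:00:00).
naive_series = temperature_series.tz_localize(None)

# Optionally round away the microsecond offset used in the fake data.
naive_series.index = naive_series.index.round("s")

# The plotting call itself is unchanged, e.g. (requires the backend to be installed):
# naive_series.plot.area(backend="hvplot")
print(naive_series.index[:3])
```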
Bar plot
Bar plots are not suitable for visualizing time series data, so the comments for this plot type have been skipped.



Horizontal bar plot
Horizontal bar plots are not suitable for visualizing time series data, so the comments for this plot type have been skipped.



Box plot

The altair backend is the only backend that shows detailed statistical metrics.

The hvplot backend does not show statistical metrics and is not really usable.
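A box plot summarizes the value distribution through its quartiles and whiskers, and the time index plays no role. The numbers behind it can be computed directly with pandas, which helps to judge how much of this each backend actually shows. The sketch assumes the common Tukey rule (whiskers within 1.5 × IQR of the box); a given backend may use a different rule.

```python
import pandas as pd

temperature_series = pd.Series(list(range(20, 30)), name="Temperature")

q1, median, q3 = temperature_series.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Tukey's rule: whiskers reach the most extreme values within 1.5 * IQR of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = temperature_series[temperature_series.between(lower_fence, upper_fence)]
whiskers = (inside.min(), inside.max())

# Everything outside the fences would be drawn as individual outlier points.
outliers = temperature_series[~temperature_series.between(lower_fence, upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}, whiskers={whiskers}, outliers={len(outliers)}")
```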
Density plot

The hvplot backend is the only backend capable of visualizing density plots and creates them with labeled axes out of the box.
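In pandas, the density plot is an alias of the KDE plot. The plot accessor defines `density = kde`, and pandas maps `kind="density"` to `"kde"` internally. That is consistent with the density and KDE sections showing identical backend support. A quick check:

```python
import pandas as pd

temperature_series = pd.Series(list(range(20, 30)), name="Temperature")

# `density` is defined as an alias of `kde` on the pandas plot accessor ...
same_method = pd.plotting.PlotAccessor.density is pd.plotting.PlotAccessor.kde
print(same_method)

# ... so these two calls are equivalent (hvplot must be installed to run them):
# temperature_series.plot.density(backend="hvplot")
# temperature_series.plot.kde(backend="hvplot")
```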
Hist plot
Hist plots are not suitable for visualizing time series data, so the comments for this plot type have been skipped.
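One way to see why a histogram tells little about a time series is that it only counts how many values fall into each bin, so the time order is discarded entirely. The sketch uses `numpy.histogram` with 10 bins, the same default bin count as pandas’ `plot.hist`.

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s")
temperature_series = pd.Series(list(range(20, 30)), index=dti, name="Temperature")

# A histogram counts values per bin; the timestamps are ignored.
counts, edges = np.histogram(temperature_series.to_numpy(), bins=10)

# Reversing the time order yields exactly the same histogram.
reversed_counts, _ = np.histogram(temperature_series.iloc[::-1].to_numpy(), bins=10)

print(counts)
print(edges)
```

A rising and a falling temperature curve therefore produce identical hist plots.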



KDE plot

The hvplot backend is the only backend capable of visualizing KDE plots and creates them with labeled axes out of the box.
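To clarify what a KDE plot estimates: pandas’ built-in matplotlib implementation delegates to `scipy.stats.gaussian_kde`, whose default bandwidth follows Scott’s rule. The NumPy-only sketch below re-implements that one-dimensional estimate for illustration. It is not the code hvplot runs, and the helper name `gaussian_kde_scott` is hypothetical.

```python
import numpy as np


def gaussian_kde_scott(samples, grid):
    """Hypothetical helper: 1-D Gaussian kernel density estimate with Scott's rule bandwidth."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule in one dimension: n ** (-1/5) times the sample standard deviation.
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Place one Gaussian bump on every sample and average them over all samples.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


temperatures = np.arange(20, 30)
grid = np.linspace(10, 40, 601)
density = gaussian_kde_scott(temperatures, grid)
print(grid[density.argmax()])
```

The estimate is a smooth curve whose area is 1; like the histogram, it ignores the time order of the samples.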
Line plot

In the default configuration, the altair backend creates non-optimal axis labels and formatting (y-axis: datetime formatting). Line plots are typically used to visualize data spanning long time ranges, so being able to zoom into the data is essential. This backend does not support zooming at all.

In the default configuration, the pandas_bokeh backend has the best axis labeling and datetime formatting.

In the default configuration, the hvplot backend creates non-optimal axis labeling and relative time axis information.
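Because the altair backend offers no zooming, one workaround is to select the interesting time range in pandas before plotting. Here is a minimal sketch; the window bounds are arbitrary example values.

```python
import pandas as pd

d = list(range(20, 30))
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(data=d, index=dti, name="Temperature")

# Time-zone-aware bounds; label slicing on a sorted index includes every sample between
# the bounds, even if the bounds themselves are not in the index.
start = pd.Timestamp("2020-01-01 12:00:02", tz="Europe/Berlin")
end = pd.Timestamp("2020-01-01 12:00:05", tz="Europe/Berlin")
window = temperature_series.loc[start:end]

# Plot only the window, e.g. with the altair backend (requires the backend to be installed):
# window.plot.line(backend="altair")
print(window)
```

The sample at 12:00:05.000001 lies just after `end` and is therefore excluded.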
Pie plot
Pie plots are not suitable for visualizing time series data, so the comments for this plot type have been skipped.
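A pie plot draws each sample as its share of the total. For temperatures, those shares depend on the arbitrary zero point of the unit, which is one way to see why the plot type is meaningless here. In addition, pandas’ built-in matplotlib backend rejects negative values for pie plots, so readings below 0 °C would not plot at all.

```python
import pandas as pd

celsius = pd.Series(list(range(20, 30)), name="Temperature")
kelvin = celsius + 273.15

# Each slice of a pie plot is a sample's share of the total.
share_celsius = celsius / celsius.sum()
share_kelvin = kelvin / kelvin.sum()

# Same physical data, different zero point: in Celsius the largest slice is about
# 1.45 times the smallest one, in Kelvin the slices are almost identical.
print(share_celsius.max() / share_celsius.min())
print(share_kelvin.max() / share_kelvin.min())
```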

Scatter plot
Scatter plots require two related data sets, one for each axis. For univariate time series data, a scatter plot therefore does not make sense.
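pandas itself only accepts `kind="scatter"` for DataFrames; with the built-in matplotlib backend, a Series raises a `ValueError`. The minimal sketch below shows what the second variable would have to be for a univariate series: the timestamp itself, which the line plot already shows. The column name `Time` is an arbitrary choice.

```python
import pandas as pd

d = list(range(20, 30))
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(data=d, index=dti, name="Temperature")

# Turn the index into a regular column so that the frame has the two columns a scatter plot needs.
frame = temperature_series.rename_axis("Time").reset_index()
print(frame.columns.tolist())

# A scatter of Temperature over Time encodes the same points a line plot draws, minus the lines:
# frame.plot.scatter(x="Time", y="Temperature")
```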
Conclusion
The look and interactive feel vary significantly depending on the plotting backend in use.
The altair backend is the least interactive one. It is not possible to zoom via the mouse wheel or to select a plot area via drag and drop. Compared to the other plotting backends, it shows little information on mouse hover. For plots with the datetime index on the x-axis, the formatting differs from the other backends and even between different plot types plotted with altair. Exporting the visualizations to image formats is supported. The only plot type the altair backend is recommended for is the box plot.
The pandas_bokeh and hvplot/holoviews backends both use bokeh under the hood, which results in a quite similar look. In my opinion, the pandas_bokeh backend uses slightly better defaults for datetime index formatting and axis labeling. In addition, it’s much easier to hit data points to display their values via mouse hover. I’d recommend using pandas_bokeh for all other plot types relevant for time series data, except for the ones recommended for altair or hvplot (box, density, and KDE plot). The most important plot types are the hist plot and the line plot.
The hvplot backend is the only plotting backend that supports the density plot and the KDE plot.
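The recommendations above can be summarized as a lookup table. This is a minimal sketch: the helper names `recommended_backend` and `plot_recommended` are hypothetical, and the backend names assume the packages register themselves as `altair`, `hvplot` and `pandas_bokeh`.

```python
# The recommendations from this post, expressed as a lookup table.
RECOMMENDED_BACKEND = {
    "box": "altair",
    "density": "hvplot",
    "kde": "hvplot",
}
DEFAULT_BACKEND = "pandas_bokeh"  # recommended for all other plot types


def recommended_backend(kind):
    """Hypothetical helper: return the backend recommended in this post for a plot kind."""
    return RECOMMENDED_BACKEND.get(kind, DEFAULT_BACKEND)


def plot_recommended(series, kind, **kwargs):
    """Hypothetical helper: plot `series` with the recommended backend (must be installed)."""
    return series.plot(kind=kind, backend=recommended_backend(kind), **kwargs)


# Usage, e.g. in the notebook: plot_recommended(temperature_series, "line")
```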