Photo by Luke Chesser on Unsplash

Time Series Data Analysis using Datalab, Pandas & Prophet

Vinu Kumar
Mar 18, 2019 · 5 min read

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

- Wikipedia

There are a number of tools available to analyse time-series data, plot it and generate insights. This post outlines my experience with one such data analysis tool, Pandas. Pandas is a software library for the Python programming language that offers data structures and operations for analysing time series.

Setup

My first tool of choice for working with Pandas was the Jupyter notebook (the name refers to Julia, Python and R, the three core languages supported). Jupyter can be installed with the Anaconda distribution or with pip. Google Cloud provides a hosted version of Jupyter called Datalab, which is what I used for my prototype. Simply start a Cloud Shell and run the command:

A new VM is created and launched, and port forwarding to it is set up.

Reading, parsing and merging CSV files using Pandas

The source files are in CSV format, one file per month. The files were uploaded to Google Cloud Storage (GCS) for easy analysis. Use the library below to access GCS. The commands below retrieve the CSV file ‘data-export-site-2018–09-Sep18–5m.csv’ from the bucket ‘demo-bucket-horizonx’ and return the path.

When running Python in Jupyter, IPython is used. IPython is a rich toolkit for running Python interactively. It provides ‘magic commands’, which work much like command-line tools and can be run within the shell. Datalab provides magic commands to easily access Google Cloud resources such as BigQuery, Google Cloud Storage and Bigtable.

Pandas DataFrame

To access GCS, Datalab provides a magic command called “gcs”. It reads the CSV from the GCS URI into a variable, data. This is, in turn, converted to a Pandas DataFrame object using the function read_csv. In the snippet below, df_sept is a Pandas DataFrame. The procedure is repeated for every available month.
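As a rough sketch of the conversion step, assume the gcs magic has already stored the raw CSV bytes in data. Here the bytes are a small made-up sample, and the temperature and humidity columns are hypothetical (only Date_Time is named in this post).

```python
import io

import pandas as pd

# Stand-in for the bytes that the gcs magic stores in `data`.
# Column names other than Date_Time are hypothetical.
data = (
    b"Date_Time,temperature,humidity\n"
    b"2018-09-01 00:00:00,21.5,40\n"
    b"2018-09-01 00:05:00,21.7,41\n"
    b"2018-09-01 00:10:00,21.6,39\n"
)

# read_csv accepts a file-like object, so wrap the bytes in BytesIO.
# parse_dates turns the Date_Time strings into real timestamps.
df_sept = pd.read_csv(io.BytesIO(data), parse_dates=["Date_Time"])
print(df_sept.dtypes)
```

Parsing the timestamp column at read time saves a separate conversion step before the column is used as a time index.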

If you have a compressed CSV file, Pandas can read that into a DataFrame as well.
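A minimal sketch of reading a gzip-compressed CSV. The file name and the temperature column are made up for illustration, and the file is written to a temporary directory so the example is self-contained.

```python
import gzip
import os
import tempfile

import pandas as pd

csv_text = (
    "Date_Time,temperature\n"
    "2018-10-01 00:00:00,19.2\n"
    "2018-10-01 00:05:00,19.4\n"
)

# Write a small gzip-compressed CSV (hypothetical file name).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample-2018-10.csv.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    fh.write(csv_text)

# With the default compression="infer", read_csv picks gzip
# from the .gz extension and decompresses on the fly.
df_oct = pd.read_csv(path, parse_dates=["Date_Time"])
print(df_oct)
```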

The DataFrame info function prints a concise summary of the frame: the index range, column names, non-null counts, data types and memory usage.
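For illustration, here is info on a tiny made-up frame (the temperature column is hypothetical). The summary is captured in a buffer here only so it can be reused as a string; in a notebook, calling df_sept.info() directly is enough.

```python
import io

import numpy as np
import pandas as pd

df_sept = pd.DataFrame(
    {
        "Date_Time": pd.to_datetime(
            ["2018-09-01 00:00", "2018-09-01 00:05", "2018-09-01 00:10"]
        ),
        "temperature": [21.5, np.nan, 21.6],  # one missing reading
    }
)

# info() writes to stdout by default; buf redirects it to a string buffer.
buffer = io.StringIO()
df_sept.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```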

Once we have individual frames, the next step is to merge all of them together.

The variable frames is a list of DataFrames, and the Pandas function concat merges them into a single DataFrame. The next line sets the column Date_Time as the index. The head function returns the first 5 records by default.
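The merge step might look roughly like this. The two monthly frames are tiny made-up stand-ins, and the temperature column is hypothetical.

```python
import pandas as pd

# Stand-ins for two monthly frames read from CSV.
df_sept = pd.DataFrame(
    {
        "Date_Time": pd.to_datetime(["2018-09-30 23:50", "2018-09-30 23:55"]),
        "temperature": [20.1, 20.0],
    }
)
df_oct = pd.DataFrame(
    {
        "Date_Time": pd.to_datetime(["2018-10-01 00:00", "2018-10-01 00:05"]),
        "temperature": [19.9, 19.8],
    }
)

frames = [df_sept, df_oct]

# Stack the monthly frames row-wise into one DataFrame.
df = pd.concat(frames)

# Use the timestamp column as the index so time-based operations work.
df = df.set_index("Date_Time")

print(df.head())  # first 5 rows by default
```

Having a DatetimeIndex is what later makes time-based resampling possible.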

If you are running Jupyter and want to access files from Google Cloud Storage, the library gcsfs can be used: https://github.com/dask/gcsfs

The DataFrame object provides an easy way to calculate summary statistics.

The dropna function drops rows that contain missing values, and the describe function computes standard summary statistics such as count, mean, standard deviation, minimum, quartiles and maximum.
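A small sketch of the two calls on made-up data; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2018-09-01", periods=4, freq="5min", name="Date_Time")
df = pd.DataFrame(
    {
        "temperature": [20.0, 22.0, np.nan, 24.0],
        "humidity": [40.0, 42.0, 44.0, np.nan],
    },
    index=index,
)

# By default dropna removes every row with at least one missing value,
# so only the first two rows survive here.
clean = df.dropna()

# describe returns count, mean, std, min, quartiles and max per column.
stats = clean.describe()
print(stats)
```

Note that dropna with default arguments discards a whole row even when only one column is missing; passing subset or how="all" is an option when that is too aggressive.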

Plotting

Plotting is an important capability of Jupyter notebooks. There are a number of plotting libraries, such as matplotlib, Seaborn, mpld3, Bokeh and Altair. matplotlib is the de facto standard. Seaborn is built on matplotlib and makes its plots richer. Below is a plot using Seaborn, which shows a summary of three columns averaged by week.
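One way to produce a weekly summary like this is to resample to weekly means and hand the result to Seaborn. This is a sketch under assumptions: the three column names and the data are made up, and a line plot is only one possible chart type, not necessarily the one used in the original notebook.

```python
import numpy as np
import pandas as pd

# Two weeks of made-up daily data, starting on a Monday.
index = pd.date_range("2018-09-03", periods=14, freq="D", name="Date_Time")
base = np.arange(14, dtype=float)
df = pd.DataFrame(
    {"temperature": base, "humidity": base + 40, "pressure": base + 1000},
    index=index,
)

# "W" groups rows into weeks ending on Sunday; mean averages each week.
weekly = df.resample("W").mean()
print(weekly)


def plot_weekly(weekly_frame):
    """Draw one line per column of the weekly averages."""
    # Imported here so the resampling above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.lineplot(data=weekly_frame)
    ax.set_xlabel("Week")
    ax.set_ylabel("Weekly mean")
    plt.show()
```

Calling plot_weekly(weekly) in a notebook cell draws the chart inline.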

The resulting plot looks like this:

Another example uses matplotlib to overlay two plots, which gives an indication of anomalies.
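The post does not say which two series were overlaid. As one possible illustration, the sketch below overlays a raw series on its rolling median and flags points that sit far from it. The data, the window size and the threshold of 20 are all assumptions chosen for the example.

```python
import pandas as pd

# Made-up series: flat at 10 with a single spike.
values = [10.0] * 20
values[10] = 50.0
series = pd.Series(
    values,
    index=pd.date_range("2018-09-01", periods=20, freq="D"),
    name="temperature",
)

# A centred rolling median is barely affected by a single outlier.
baseline = series.rolling(window=5, center=True, min_periods=1).median()

# Flag points that deviate from the baseline by more than the threshold.
anomalies = (series - baseline).abs() > 20.0
print(series[anomalies])


def plot_overlay(raw, smooth, flags):
    """Overlay the raw series and its baseline, marking flagged points."""
    # Imported here so the calculation above runs without matplotlib.
    import matplotlib.pyplot as plt

    flagged = raw[flags]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(raw.index, raw.values, label="raw")
    ax.plot(smooth.index, smooth.values, label="rolling median")
    ax.scatter(flagged.index, flagged.values, color="red", label="possible anomaly")
    ax.legend()
    plt.show()
```

Calling plot_overlay(series, baseline, anomalies) draws both lines on one set of axes, so the spike stands out against the smooth baseline.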

The resulting plot:

Facebook Prophet for prediction

Prophet is a forecasting tool for Python and R. It always takes a DataFrame with two columns, ‘ds’ (the timestamp) and ‘y’ (the value to forecast), and provides two methods, fit and predict.

The code snippet below demonstrates how to resample the Pandas DataFrame for use with Prophet.
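A sketch of the reshaping, assuming a daily mean is the desired granularity and that a single column (here a hypothetical temperature) is the value to forecast.

```python
import numpy as np
import pandas as pd

# Two days of made-up 30-minute readings indexed by Date_Time.
index = pd.date_range("2018-09-01", periods=96, freq="30min", name="Date_Time")
df = pd.DataFrame({"temperature": np.arange(96, dtype=float)}, index=index)

# Downsample to one value per day.
daily = df["temperature"].resample("D").mean()

# Prophet expects the timestamp in 'ds' and the value in 'y',
# so move the index back into a column and rename both.
prophet_df = daily.reset_index().rename(
    columns={"Date_Time": "ds", "temperature": "y"}
)
print(prophet_df)
```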

Figure 11: Resampled DataFrame

Resample DataFrame for input into Prophet

The next step is to create a Prophet object and fit it to the DataFrame.

The next step creates a DataFrame with future dates (6 months).

Run a prediction using the fitted model.
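The fit, future-dates and predict steps can be sketched together as follows. The helper name forecast_six_months, the sample history and the choice of 180 daily periods for "6 months" are assumptions for illustration. The Prophet import is done inside the function because the package has been published under two names.

```python
import numpy as np
import pandas as pd


def forecast_six_months(history, periods=180):
    """Fit Prophet on a ds/y frame and predict `periods` extra days."""
    # Recent releases are published as `prophet`; older ones as `fbprophet`.
    try:
        from prophet import Prophet
    except ImportError:
        from fbprophet import Prophet

    model = Prophet()
    model.fit(history)

    # Extends the history's dates by `periods` rows (daily by default).
    future = model.make_future_dataframe(periods=periods)

    # Returns yhat plus yhat_lower / yhat_upper uncertainty bounds.
    forecast = model.predict(future)
    return model, forecast


# Made-up daily history with a weekly cycle, already in ds/y form.
days = pd.date_range("2018-01-01", periods=120, freq="D")
history = pd.DataFrame(
    {"ds": days, "y": 20 + 5 * np.sin(np.arange(120) * 2 * np.pi / 7)}
)

# Usage (requires Prophet to be installed):
# model, forecast = forecast_six_months(history)
# print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```

The fitted model also offers plot(forecast) and plot_components(forecast) for charting the prediction and its trend and seasonality parts.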

This gives the output below:

Using this, we can simply plot the prediction or the seasonality components.

This gives the following plot:

Conclusion

Pandas is a rich library that fills the gap Python has in data analysis. Easy to use without much programming, it allows easy filtering, slicing and plotting of data as series or data frames.

Jupyter is a great interactive tool to explore, transform, visualise and share the analysis. It has a very rich ecosystem of modules to explore data across various sources and optimise machine learning models for deployments.

Vinu Kumar

Written by

Chief Technologist at HorizonX, Google Cloud Certified Data Engineer, Google Cloud Certified Architect, Consultant

HorizonX

We’re a team of passionate, expert and customer-obsessed practitioners, focusing on innovation and invention on our customer’s behalf.
