TensorFlow tf.data Pipeline for Multiple Time Series
My goal for this year is to become a certified TensorFlow developer. As part of my preparation, I went through the various tutorials on the TensorFlow website. One of them was on time series forecasting. If you haven’t gone through it yet, I highly recommend it: Time series forecasting | TensorFlow Core.
Fast forward a few months, and I came across a similar problem at work. I had to work with 10k+ different time series at once and develop models for such a dataset. I had written a preprocessing class from scratch, which came to almost 500 lines of code for the preprocessing alone. This class could do everything required for this particular task.
Being the person I am, I wanted to optimize this code further and also learn to use the tf.data pipeline for preprocessing. I remembered the WindowGenerator from the tutorial. The catch: WindowGenerator was written for a single time series. I searched online for an existing post on extending the WindowGenerator class to multiple time series, but with no luck.
Here, I am sharing a WindowGenerator that works with multiple time series.
WindowGenerator: Brief Introduction
WindowGenerator is the preprocessing class from TensorFlow’s time series tutorial. It implements the following methods:
- constructor: takes a single time series as train, val and test DataFrames and stores the relevant slicing information for inputs and labels.
- split_window: takes an array of length total_window_size along the time axis and splits it into inputs and labels.
- plot: plots min(max_plots, batch_size) examples, showing all inputs, labels and predictions in each example plot.
- make_dataset: converts the input array to a tf.data.Dataset and applies the split_window method to the dataset. The result is the input to the different models.
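To make the splitting step concrete, here is a minimal sketch of the slicing that split_window performs. It uses NumPy instead of TensorFlow so it stays short, and the function signature and the label_indices argument are my own simplification for illustration, not the tutorial's exact code.

```python
import numpy as np


def split_window(windows, input_width, label_width, shift, label_indices=None):
    """Split windows of shape (batch, total_window_size, features) into inputs and labels."""
    total_window_size = input_width + shift
    assert windows.shape[1] == total_window_size

    # Inputs are the first input_width steps, labels the last label_width steps.
    input_slice = slice(0, input_width)
    labels_slice = slice(total_window_size - label_width, None)

    inputs = windows[:, input_slice, :]
    labels = windows[:, labels_slice, :]

    # Optionally keep only the label features (hypothetical argument).
    if label_indices is not None:
        labels = np.stack([labels[:, :, i] for i in label_indices], axis=-1)
    return inputs, labels


# Two windows of 7 time steps with 3 features each.
batch = np.arange(2 * 7 * 3, dtype=np.float32).reshape(2, 7, 3)
inputs, labels = split_window(batch, input_width=6, label_width=1, shift=1,
                              label_indices=[0])
print(inputs.shape, labels.shape)
```

With input_width=6 and shift=1, each window is 7 steps long: the first 6 steps become the inputs and the last step of feature 0 becomes the label.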
Details of the Dataset used for the tutorial
I have used the same weather dataset as the original tutorial. I use ‘T (degC)’ as the label column and ‘p (mbar)’ and ‘rh (%)’ as additional regressors for the multivariate analysis.
Example of the data used for the updated WindowGenerator. The data has 3 series, with Series ID = 1, 2 or 3. Each series is a copy of the original weather dataset and was created only for this tutorial.
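A dataset like this can be built by copying one DataFrame and tagging each copy with an ID. The sketch below assumes the ID column is literally named "Series ID" and uses a tiny made-up frame as a stand-in for the weather data (the values are placeholders, not real measurements).

```python
import pandas as pd

# Placeholder stand-in for the weather DataFrame; values are made up.
weather = pd.DataFrame({
    "T (degC)": [1.0, 2.0, 3.0, 4.0],
    "p (mbar)": [10.0, 20.0, 30.0, 40.0],
    "rh (%)": [50.0, 60.0, 70.0, 80.0],
})

# One copy per series, each tagged with its own ID, stacked row-wise.
frames = [weather.assign(**{"Series ID": sid}) for sid in (1, 2, 3)]
multi_df = pd.concat(frames, ignore_index=True)

print(multi_df.groupby("Series ID").size())
```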
Updated Window Generator for Multiple Time Series
The differences between the original WindowGenerator and the MultiSeriesWindowGenerator constructor:
- Added batch_size as a parameter.
- Removed train_df, val_df and test_df as parameters of the init function.
- Added regressor_columns and static_columns for better management of the input features to the model.
- Added GROUPBY as a parameter to identify the different series in the input data.
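A constructor with these changes might look roughly like the sketch below. The parameter names come from the list above, but the class name, the defaults, the argument order and the idea of passing GROUPBY as a column name are my assumptions for illustration; the notebook has the actual implementation.

```python
class MultiSeriesWindowSketch:
    """Hypothetical constructor-only sketch, not the actual class."""

    def __init__(self, input_width, label_width, shift, batch_size,
                 label_columns=None, regressor_columns=None,
                 static_columns=None, GROUPBY=None):
        # New parameters compared with the original WindowGenerator.
        self.batch_size = batch_size
        self.regressor_columns = list(regressor_columns or [])
        self.static_columns = list(static_columns or [])
        self.GROUPBY = GROUPBY  # assumed: name of the column identifying a series

        # Window parameters, kept as in the original tutorial.
        self.label_columns = list(label_columns or [])
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift
        self.total_window_size = input_width + shift


window = MultiSeriesWindowSketch(
    input_width=24, label_width=1, shift=1, batch_size=32,
    label_columns=["T (degC)"],
    regressor_columns=["p (mbar)", "rh (%)"],
    GROUPBY="Series ID",
)
print(window.total_window_size)
```

Note that no DataFrames are passed in here: they are attached later, which is what lets the same window definition be reused with different data.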
Next, we add a method, update_datasets, that attaches the train, val and test DataFrames to the window generator class. It does 3 things:
- Takes the input series (train, val, test) and converts them individually into tensors of shape (n_series x n_batch x n_timesteps x n_features).
- Applies the normalization step inside update_datasets. This is completely optional and is controlled by the norm flag.
- Moves the column_indices initialization from the constructor into update_datasets, so that it is refreshed from train_df every time.
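The grouping and normalization can be sketched as follows. This is a simplified illustration with pandas and NumPy: the helper name frames_to_array is hypothetical, I assume every series has the same number of time steps, I normalize with the train mean and standard deviation as the original tutorial does, and the sketch stops at a 3-D array (n_series, n_timesteps, n_features) before any windowing or batching.

```python
import numpy as np
import pandas as pd


def frames_to_array(df, group_col, feature_cols, mean=None, std=None):
    """Stack every series in df into one array (n_series, n_timesteps, n_features)."""
    series = []
    for _, group in df.groupby(group_col, sort=True):
        values = group[feature_cols].to_numpy(dtype=np.float64)
        # Optional normalization, the role played by the norm flag.
        if mean is not None and std is not None:
            values = (values - mean) / std
        series.append(values.astype(np.float32))
    # np.stack requires all series to have the same length.
    return np.stack(series, axis=0)


feature_cols = ["T (degC)", "p (mbar)"]
train_df = pd.DataFrame({
    "Series ID": np.repeat([1, 2, 3], 5),
    "T (degC)": np.tile([1.0, 2.0, 3.0, 4.0, 5.0], 3),
    "p (mbar)": np.tile([990.0, 991.0, 992.0, 993.0, 994.0], 3),
})

# Recomputed from train_df on every update, like column_indices above.
column_indices = {name: i for i, name in enumerate(feature_cols)}
train_mean = train_df[feature_cols].mean().to_numpy()
train_std = train_df[feature_cols].std().to_numpy()

train = frames_to_array(train_df, "Series ID", feature_cols, train_mean, train_std)
print(train.shape)
```

The same mean and std would then be applied to the val and test frames so that no statistics leak from them into training.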
The split_window and plot methods do not need to be updated and can be reused from the original WindowGenerator class.
To keep the interface as close to the original as possible, I have renamed WindowGenerator.make_dataset to MultiSeriesWindowGenerator.make_cohort and added a new make_dataset method, which calls make_cohort internally.
The new make_dataset method does 4 things:
- Calls make_cohort for each of the series in the data. Note that make_cohort returns a tf.data.Dataset, as in the original tutorial.
- Zips the resulting tf.data.Dataset objects together and stacks all the inputs and labels, using the stack_windows function, to create a single Dataset object.
- Unbatches the data, shuffles it, and then batches it again with our defined batch_size.
- Finally, prefetches the Dataset and returns it as the output.
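The steps above can be sketched with standalone functions. This is a simplified illustration rather than the notebook's code: the signatures, the shuffle_buffer argument and the hard-coded label feature are assumptions, and I join the per-series batches with tf.concat along the batch axis, which is one reasonable reading of "stack".

```python
import numpy as np
import tensorflow as tf


def make_cohort(series, total_window_size, input_width, batch_size):
    """Window one series of shape (n_timesteps, n_features) into (inputs, labels)."""
    windows = tf.keras.utils.timeseries_dataset_from_array(
        data=series,
        targets=None,
        sequence_length=total_window_size,
        sequence_stride=1,
        shuffle=False,
        batch_size=batch_size,
    )
    # Simplified split: the first input_width steps are inputs, the
    # remaining steps of feature 0 are labels.
    return windows.map(lambda w: (w[:, :input_width, :], w[:, input_width:, :1]))


def stack_windows(*cohort_batches):
    """Join the (inputs, labels) batches of all series along the batch axis."""
    inputs = tf.concat([b[0] for b in cohort_batches], axis=0)
    labels = tf.concat([b[1] for b in cohort_batches], axis=0)
    return inputs, labels


def make_dataset(data, total_window_size, input_width, batch_size,
                 shuffle_buffer=1000):
    """data has shape (n_series, n_timesteps, n_features)."""
    cohorts = tuple(
        make_cohort(s, total_window_size, input_width, batch_size) for s in data
    )
    ds = tf.data.Dataset.zip(cohorts).map(stack_windows)
    # Mix windows from all series, then rebuild batches of the requested size.
    ds = ds.unbatch().shuffle(shuffle_buffer).batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)


# Three random series of 20 time steps with 3 features each.
data = np.random.rand(3, 20, 3).astype("float32")
dataset = make_dataset(data, total_window_size=7, input_width=6, batch_size=8)

for inputs, labels in dataset.take(1):
    print(inputs.shape, labels.shape)
```

Unbatching before the shuffle matters: without it, whole batches would be shuffled and each batch would always contain the same neighbouring windows from every series.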
The rest of the code remains the same. At this point, I plotted an example to check that everything works as expected.
I’ve tested the MultiSeriesWindowGenerator with the Baseline model provided in the tutorial. You can see the results here.
Final Thoughts
Thank you for following the tutorial this far. I hope it helps at least one person in their quest for a WindowGenerator that works with multiple time series. You can find the full code at the following git link: WindowGenerator_with_Multiple_Time_Series.ipynb
Are there any issues with this code? Are there any other features you would like to see implemented? Do let me know in the comments. Thanks in advance.
Happy reading, and have a great day everyone :)