Time Series Forecasting using Deep Learning

Manohar Varma
Published in AI Skunks
Apr 11, 2023 · 10 min read

A detailed intuition for time series data using a recurrent neural network (RNN/LSTM) model

Abstract

In this analysis we're going to work on a time series of Canadian weather data collected daily since 1960. We'll explore patterns, distributions, imputation, model building, and evaluation.

Table of Contents

  • What is time series data
  • About dataset
  • Imputation Techniques in Time Series data
  • RNN
  • LSTM
  • Model Fitting
  • Evaluation

What is time series data?

Time series data refers to a type of data where observations are recorded at regular intervals over time. In other words, time series data is a collection of data points collected at successive points in time, such as hourly, daily, weekly, monthly, or yearly intervals. Examples of time series data include stock prices, temperature readings, sales data, and website traffic data.

Time series data can be used to identify patterns or trends over time, and can be used for forecasting future values or making predictions based on past observations. Time series analysis is a branch of statistics that focuses on modeling and analyzing time series data to understand underlying patterns, relationships, and trends. It is widely used in various fields, including finance, economics, engineering, and science.

How is time series data different?

Time series data is different from normal data in that it has a temporal ordering or sequence to the observations, where each observation is recorded at a specific time. This means that time series data has a unique structure and characteristics that must be considered when analyzing or modeling the data.

Normal data, on the other hand, is typically composed of independent observations that do not have a time ordering. Normal data can include various types of data such as categorical, numerical, or textual data, and does not necessarily have any inherent structure or temporal ordering.

The structure of time series data makes it useful for analyzing patterns, trends, and relationships over time, such as identifying seasonal patterns, trends, or cyclic behavior. Normal data, on the other hand, is often analyzed using standard statistical techniques such as hypothesis testing, regression analysis, or clustering, without considering any temporal ordering of the observations.

In summary, the main difference between time series data and normal data is that time series data has a temporal ordering or sequence to the observations, while normal data does not. This difference requires different approaches to analyze and model these types of data.

Now let's get into our example dataset and explore its nuances.

About Dataset

This dataset has been compiled from public sources. It consists of daily temperatures and precipitation from 13 Canadian centres. Precipitation is either rain or snow (likely snow in winter months). In 1940 there is daily data for seven out of the 13 centres, but by 1960 there is daily data from all 13 centres, with the occasional missing value. So, let's consider the data from 1961 onward and impute the missing values that occur after that.

import pandas as pd
import numpy as np

# Load the daily Canadian climate history data
df = pd.read_csv('https://raw.githubusercontent.com/Varmai/Data-sets/main/Canadian_climate_history.csv')

Let's check the contents of the data.
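A quick way to do that is to print the first few rows, the column types, and the number of missing values per column, and then restrict the frame to 1961 onward as discussed above. Here is a small sketch (the date column name 'LOCAL_DATE' is an assumption; use whatever name df.info() actually shows):

# Inspect the raw data: first rows, dtypes, and missing values per column
print(df.head())
print(df.info())
print(df.isnull().sum())

# Parse the date column and keep only the data from 1961 onward
# ('LOCAL_DATE' is assumed here; adjust to the real column name)
df['LOCAL_DATE'] = pd.to_datetime(df['LOCAL_DATE'])
df = df[df['LOCAL_DATE'] >= '1961-01-01'].set_index('LOCAL_DATE')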

It seems the readings are taken at 12 midnight every day, so the frequency of the data is daily.

The data is inconsistent, as it has missing values in between. In time series we can't simply drop the rows with null values, because we would lose the sequence of time.

As the data is about weather, mean, median, or mode imputation doesn't work very well. So, we need to consider some more advanced techniques and check which one fits best.

There are many imputation methods for time series, but after thorough research and reading through several articles I focused on three important ones.

1. Time Interpolation

Time interpolation is specifically used for time series data. It involves estimating the values of missing data points in a time series by using the values of adjacent data points and the time intervals between them.

In time interpolation, the missing values are estimated by assuming that the underlying process that generated the time series is continuous over time. The method of time interpolation used depends on the characteristics of the time series data and the intended use of the imputed data.

y(t) = y(t1) + [(y(t2) - y(t1)) / (t2 - t1)] * (t - t1)

  • y(t) is the estimated value of the missing data point at time t
  • y(t1) and y(t2) are the known values of the adjacent data points at times t1 and t2 respectively
  • t1 and t2 are the times corresponding to the known data points
  • t is the time at which the missing data point needs to be estimated.
# pandas takes the 'method' keyword; 'time' interpolation requires a DatetimeIndex
time_impute = df.interpolate(method='time')
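To see that this matches the formula above, here is a tiny sanity check on a toy series with a DatetimeIndex:

# Toy check: 'time' interpolation weights by the gap between timestamps
s = pd.Series([10.0, np.nan, np.nan, 16.0],
              index=pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04']))
print(s.interpolate(method='time'))
# 2021-01-02 -> 12.0 and 2021-01-03 -> 14.0, as the formula predicts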

2. Imputation using deep learning

Deep learning imputation is a method of imputing missing data using deep neural networks. It involves training a neural network to learn the relationship between the available data points and the missing data points in the dataset.

Deep learning imputation has shown promising results in imputing missing data in various domains, including medical imaging, genomics, and financial data. However, it is important to note that deep learning imputation methods can be computationally expensive and require large amounts of training data. Additionally, the quality of the imputed data can be affected by factors such as the size of the missing data, the complexity of the underlying patterns in the data, and the architecture and hyperparameters of the neural network used.

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split

# split the dataframe into training and test sets
# (the date column should be dropped or set as the index so only numeric columns remain)
train_df, test_df = train_test_split(df, test_size=0.3)

# create the neural network model
inputs = Input(shape=(train_df.shape[1],))
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(train_df.shape[1], activation=None)(x)
model = Model(inputs=inputs, outputs=x)
model.compile(optimizer='adam', loss='mse')

# train the model to reconstruct its own input (missing values temporarily filled with 0)
model.fit(train_df.fillna(0), train_df.fillna(0), epochs=100)

# evaluate the model on the test set
test_loss = model.evaluate(test_df.fillna(0), test_df.fillna(0))
print('Test loss:', test_loss)

# use the model to reconstruct every row, then keep the original values
# where they exist and fill in the predictions only where values are missing
df_imputed = pd.DataFrame(model.predict(df.fillna(0)), columns=df.columns, index=df.index)
deep_learning_impute = df.fillna(df_imputed)
print(deep_learning_impute)

3. Linear Interpolation

Linear interpolation is a method of imputing missing data points by estimating their values based on the values of adjacent data points. It involves fitting a straight line between two adjacent data points and using this line to estimate the value of the missing data point.

Linear interpolation assumes that the relationship between the known data points is linear, meaning that the data points follow a straight line. It is a simple and commonly used method for imputing missing values in various fields, such as engineering, finance, and economics.

To estimate the missing value, we can use the formula:

y3 = y1 + ((y2 - y1) / (x2 - x1)) * (x3 - x1)

This formula estimates the value of a missing data point (y3) at time x3, lying between two known data points (y1 and y2) recorded at times x1 and x2 respectively.
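As a quick worked example with made-up numbers:

# Known points (x1, y1) = (1, 10) and (x2, y2) = (3, 20); missing point at x3 = 2
x1, y1 = 1, 10.0
x2, y2 = 3, 20.0
x3 = 2
y3 = y1 + ((y2 - y1) / (x2 - x1)) * (x3 - x1)
print(y3)  # 15.0 -- halfway between y1 and y2, as expected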

linear_impute = df.interpolate(method='linear')

Now the imputations are done, so let’s dive into the model

What is a neural network?

A neural network is a type of machine learning algorithm that is designed to recognize patterns and relationships in data. It is inspired by the structure and function of the human brain, and consists of a large number of interconnected processing nodes (also known as neurons) that work together to learn and process information.

Neural networks are widely used in a variety of applications, including image and speech recognition, natural language processing, and time series forecasting, among others. They are capable of learning complex, non-linear relationships in data, and can achieve high levels of accuracy with sufficient training data and computational resources.

What is an RNN (LSTM) network?

Recurrent Neural Networks (RNNs) are a type of neural network that are designed to handle sequential data, such as time series, speech signals, and natural language. They work by processing each input in a sequence and using the output from the previous step as input for the current step.

Long Short-Term Memory (LSTM) is a specific type of RNN that is designed to address the problem of the vanishing gradient, which can occur when training deep neural networks. The vanishing gradient problem refers to the fact that gradients can become very small as they propagate backwards through many layers, making it difficult to learn long-term dependencies in the data.

LSTMs solve this problem by introducing a memory cell, which allows the network to selectively remember or forget information over time. Each LSTM cell contains three gates (input, forget, and output) that control the flow of information into and out of the cell. The input gate determines which information to remember, the forget gate determines which information to forget, and the output gate determines which information to output.
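To make the gating mechanism concrete, here is a minimal NumPy sketch of a single LSTM time step. The stacked weight matrices W and U and the bias b are hypothetical stand-ins for the learned parameters, not Keras internals:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Pre-activations for the input (i), forget (f) and output (o) gates
    # and the candidate cell update (g), stacked into one vector
    z = W @ x_t + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gate values between 0 and 1
    g = np.tanh(g)                                # candidate new information
    c_t = f * c_prev + i * g                      # forget old info, add new info
    h_t = o * np.tanh(c_t)                        # expose part of the cell as the output
    return h_t, c_t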

During training, the LSTM network learns to adjust the weights and biases of the gates and the memory cell in order to minimize the difference between its predicted outputs and the actual outputs in the training data. They are particularly effective for modeling long-term dependencies in sequential data, and can achieve high levels of accuracy with sufficient training data and computational resources.

from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM

# Normalize the data
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(deep_learning_impute)

# Split the data into train and test sets
train_size = int(len(scaled_data) * 0.7)
train_data = scaled_data[:train_size]
test_data = scaled_data[train_size:]

# Define the window size
window_size = 2

# Create the input and output data for the model
def create_dataset(data, window_size):
    X, Y = [], []
    for i in range(len(data) - window_size - 1):
        X.append(data[i:(i + window_size)])
        Y.append(data[i + window_size])
    return np.array(X), np.array(Y)

train_X, train_Y = create_dataset(train_data, window_size)
test_X, test_Y = create_dataset(test_data, window_size)

# Reshape the input data for LSTM: (samples, window_size, 26 features = 13 centres × 2 variables)
train_X = np.reshape(train_X, (train_X.shape[0], train_X.shape[1], 26))
test_X = np.reshape(test_X, (test_X.shape[0], test_X.shape[1], 26))

# Define the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(window_size, 26)))
model.add(LSTM(45, return_sequences=True))
model.add(LSTM(40, return_sequences=True))
model.add(LSTM(35, return_sequences=True))
model.add(LSTM(30, return_sequences=True))
model.add(LSTM(30))
model.add(Dense(26))
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model
model.fit(train_X, train_Y, epochs=30, batch_size=64)

# Make predictions on the test set
predictions = model.predict(test_X)

# Calculate the root mean squared error of the predictions
rmse = np.sqrt(np.mean((predictions - test_Y)**2))
print("RMSE:", rmse)

The above model takes the deep-learning-imputed data as input, with an input LSTM layer, five further hidden LSTM layers, and a Dense output layer.

The window size is defined as 2, which means the neural network will use the previous two time steps to predict the next step.

A function is defined to create input-output pairs for training the LSTM model. The input data consists of a sequence of two time steps, and the output data is the next time step value. The function returns two NumPy arrays containing input and output data.
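For example, applying the function to a small dummy array makes the shapes explicit:

# Shape check on dummy data: 10 time steps, 3 features, window of 2
dummy = np.arange(30).reshape(10, 3)
dummy_X, dummy_Y = create_dataset(dummy, window_size=2)
print(dummy_X.shape, dummy_Y.shape)  # (7, 2, 3) (7, 3)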

The input data is then reshaped into the 3D tensor required by LSTM. The LSTM model is defined using Sequential, and multiple LSTM layers with different numbers of neurons are stacked on top of each other. A Dense layer is added at the end to produce the output. The model is compiled with mean squared error as the loss function and Adam as the optimizer.

The model is trained on the training data using the model's fit method with a batch size of 64 and 30 epochs. After training, the model is used to make predictions on the test data. The root mean squared error (RMSE) is calculated as a measure of the accuracy of the predictions.

The model gave an RMSE of 0.13 on the scaled data, which is quite good.
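To interpret that error in the original units (degrees and millimetres of precipitation), the MinMax scaling can be inverted first; a small sketch:

# Map the scaled predictions and targets back to the original units
predictions_orig = scaler.inverse_transform(predictions)
test_Y_orig = scaler.inverse_transform(test_Y)
rmse_orig = np.sqrt(np.mean((predictions_orig - test_Y_orig) ** 2))
print("RMSE in original units:", rmse_orig)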

Then I evaluated the model with another metric, R-squared, which for regression plays a role similar to an accuracy score.

ssr = np.sum((predictions - test_Y)**2)
sst = np.sum((test_Y - np.mean(test_Y, axis=0))**2)
r2 = 1 - (ssr / sst)
print("R-squared:", r2)

# Compare with a naive persistence baseline (predict the next step as the current step)
baseline_rmse = np.sqrt(np.mean((test_Y[1:] - test_Y[:-1])**2))
baseline_r2 = 1 - ((baseline_rmse / np.std(test_Y[:-1]))**2)
print("Baseline R-squared:", baseline_r2)

But if I switch the input to the data produced by the interpolation methods, the R-squared value drops by about 5 percentage points (to 69%) with linear interpolation and about 2 points (to 72%) with time interpolation. So time interpolation is almost as effective as the dense-network imputation, while being far cheaper in computation and time.

Conclusion:

Overall the model performed quite well. To predict temperature even more accurately, we would need additional parameters such as wind, cloud cover, and humidity. Given all of those parameters, we could achieve even better results.

Ufff! That's a lot to learn, I know. If this article made sense to you, please give it a clap. Don't hesitate to let me know in the responses if I'm missing something.

You can check the detailed code on my GitHub.
