Deep Learning Time Series Modeling

The Importance of Creating a Model with Stationary Data

Published in Analytics Vidhya, Aug 21, 2020

Throughout my data science journey I have learned many different modeling techniques, but I had not found my niche yet, and I had been patiently waiting for the right machine learning model to come along and sweep me off my feet. I have been working in the business, finance, and fintech space for the past few years, and my love for business forecasting and predictions that help maximize shareholder wealth had to stay center stage.

Then one day, I discovered time series modeling, and I was hooked from day one. The fact that I can train and test a model on past performance to help forecast the future fascinates me. The way one can combine machine learning, deep learning, neural networks, and autoregressive modeling to solve business problems is absolutely genius!

So the big question is: what exactly is time series modeling? In simple terms, it is the use of a model to predict future values based on previously observed values. Time series modeling is used in industries such as:

1. Retail Industry

2. Energy Industry

3. Government

4. Financial Organization

5. Agriculture

The Time Series Model ARIMA

One of the most popular models to implement is ARIMA, which stands for AutoRegressive Integrated Moving Average. ARIMA models aim to describe the autocorrelations in the data and combine the following concepts:

AR = Autoregression

I = Integration

MA = Moving Average
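For reference, the standard non-seasonal ARIMA(p, d, q) model can be written as below. Here y'_t is the series after it has been differenced d times (the "I" part), the φ coefficients weight the p previous values (the AR part), the θ coefficients weight the q previous forecast errors ε (the MA part), and c is a constant. This is the usual textbook form, not a formula taken from the original analysis.

```latex
y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p}
         + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t
```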

Stationarity

Before you can conduct ARIMA-style time series modeling, you have to ensure that your data is stationary, which means it does not have trend or seasonality. Your data is considered stationary if its statistical properties, such as mean, variance, and autocovariance, remain constant over time.
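To make that definition concrete, here is a minimal sketch on made-up data; the series and the half-split check are illustrative assumptions, not part of the original analysis. It compares the mean and variance of the first and second halves of a white-noise series (stationary by construction) and of a series with a linear trend, whose mean drifts over time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 400

# Hypothetical data: white noise (stationary) vs. noise plus a linear trend (not stationary)
white_noise = pd.Series(rng.normal(0, 1, n))
trending = pd.Series(0.05 * np.arange(n) + rng.normal(0, 1, n))


def half_stats(series):
    """Return (mean, variance) of the first half and of the second half of a series."""
    mid = len(series) // 2
    first, second = series.iloc[:mid], series.iloc[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())


for name, series in [("white noise", white_noise), ("trending", trending)]:
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name:12s} mean: {m1:6.2f} -> {m2:6.2f}   variance: {v1:5.2f} -> {v2:5.2f}")
```

With a slope of 0.05 over 200 steps, the trending series' mean should rise by roughly 10 between halves, while the white-noise mean barely moves.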

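If you do not know whether your data is stationary, you can run the following two checks:

Rolling statistics: plot the moving average or moving variance and see whether it varies with time.
The Dickey-Fuller test: a statistical test for stationarity. The adfuller function in statsmodels runs the augmented Dickey-Fuller test, whose null hypothesis is that the series has a unit root, i.e. is not stationary.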

Below I built a function that plots the rolling mean and rolling standard deviation and runs the Dickey-Fuller test in a single call.

import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.stattools import adfuller


def stationarity_testing(timeseries):
    # Rolling statistics over a 12-period window
    movingAverage = timeseries.rolling(window=12, center=False).mean()
    movingSTD = timeseries.rolling(window=12, center=False).std()

    # Plot the original series with its rolling mean and rolling standard deviation
    plt.figure(figsize=(12, 7))
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(movingAverage, color='red', label='Rolling Mean')
    plt.plot(movingSTD, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Augmented Dickey-Fuller test; accepts a DataFrame with a 'value' column or a Series
    print('Results of Dickey-Fuller Test: \n')
    values = timeseries['value'] if isinstance(timeseries, pd.DataFrame) else timeseries
    dftest = adfuller(values.dropna(), autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)


# h_mean: the article's DataFrame of observations with a 'value' column
stationarity_testing(h_mean)

There are two ways to read the Dickey-Fuller results. The test's null hypothesis is that the series is not stationary. If your “Test Statistic” is greater than (less negative than) your “Critical Values”, or your p-value is greater than 0.05, you cannot reject that hypothesis, which indicates that your data is not stationary. As you can see from the test above, this data is far from stationary, and we need to make it stationary before we start building our forecast model.

How to Make Your Data Stationary

To remove trend and make the series stationary, you can try the following transformations:

1. Taking the log transformation

2. Subtracting a moving average (simple moving average and exponential moving average)

3. Differencing: subtracting the previous value with .shift()

After each transformation, I rerun the rolling statistics and Dickey-Fuller test function to see whether the result improves. I will pick the one that performs best and use it in my final ARIMA model.

Log

A log transformation stabilizes the variance, so fluctuations that grow with the level of the series become more uniform over time. The rolling statistics for the log-transformed series look a lot better than the original, but the Dickey-Fuller test shows that the series is still not stationary.

import numpy as np

h_mean_logScale = np.log(h_mean)
stationarity_testing(h_mean_logScale)
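The variance-stabilizing effect can be checked numerically. This sketch uses a hypothetical series with multiplicative noise, so its swings grow with its level, and compares the spread of period-to-period changes in the second half against the first half, before and after taking the log. The spread_ratio helper is an illustrative assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
t = np.arange(400)
# Hypothetical series whose fluctuations grow with its level (multiplicative noise)
y = pd.Series(np.exp(0.02 * t + 0.1 * rng.normal(size=t.size)))


def spread_ratio(s):
    """Std of period-to-period changes in the second half divided by the first half."""
    d = s.diff().dropna()
    half = len(d) // 2
    return d.iloc[half:].std() / d.iloc[:half].std()


raw_ratio = spread_ratio(y)
log_ratio = spread_ratio(np.log(y))
print(f"raw: {raw_ratio:.1f}x   log: {log_ratio:.2f}x")
```

On the raw scale the second half's changes are many times larger; after the log they are about the same size. Note that the log does not remove the trend itself, which is why the Dickey-Fuller test can still fail.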

Subtracting the Simple Moving Average

The simple moving average, also called the rolling mean, lets you visually check whether the mean changes over time, and subtracting it from the log-transformed series removes the local trend. The ‘Test Statistic’ is now lower than the ‘Critical Values’, but the p-value is still a little high. I want to try a few more methods to see if I can lower the p-value further.

movingAverage = h_mean_logScale.rolling(window=12).mean()
datasetLogScale_movingaverage = h_mean_logScale - movingAverage
# The first 11 rows are NaN: a 12-period window needs 12 observations
datasetLogScale_movingaverage.head(12)
datasetLogScale_movingaverage.dropna(inplace=True)
datasetLogScale_movingaverage.head(12)
stationarity_testing(datasetLogScale_movingaverage)
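A tiny worked example shows what subtracting a trailing rolling mean does. On a hypothetical pure linear trend 0, 1, 2, ..., the 12-period rolling mean at step t is t - 5.5, so the detrended series is a constant 5.5 and the trend is gone. It also shows why the first 11 rows are dropped.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(50, dtype=float))  # hypothetical pure linear trend
moving_average = s.rolling(window=12).mean()
detrended = s - moving_average

print(detrended.isna().sum())                  # 11: no full window for the first 11 rows
print(np.allclose(detrended.dropna(), 5.5))    # True: the trend is reduced to a constant
```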

Subtracting the Exponential Moving Average

The exponential moving average, also called an exponentially weighted rolling mean, assigns weights to all previous values, and those weights decay exponentially so that recent observations count the most. With halflife=12, a value's weight halves every 12 periods. The p-value came down a lot more with this method. I am going to try one more method and then decide based on all of my results.

exponentialmovingaverage = h_mean_logScale.ewm(halflife=12, min_periods=0, adjust=True).mean()
plt.plot(h_mean_logScale)
plt.plot(exponentialmovingaverage, color='red')
plt.show()
datasetLogScale_ex_movingaverage = h_mean_logScale - exponentialmovingaverage
stationarity_testing(datasetLogScale_ex_movingaverage)
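To see what halflife=12 means, this sketch recomputes the last value of a pandas ewm mean by hand on a hypothetical series. pandas converts a half-life h into a smoothing factor alpha = 1 - exp(-ln(2) / h), and with adjust=True each value i steps back gets weight (1 - alpha)**i, normalized by the sum of the weights.

```python
import numpy as np
import pandas as pd

halflife = 12
alpha = 1 - np.exp(-np.log(2) / halflife)  # smoothing factor implied by the half-life

x = pd.Series(np.arange(30, dtype=float))  # hypothetical input series
ewma = x.ewm(halflife=halflife, min_periods=0, adjust=True).mean()

# Manual adjust=True calculation for the last point: weight (1 - alpha)**i for lag i
weights = (1 - alpha) ** np.arange(len(x))
manual_last = np.sum(weights * x.values[::-1]) / np.sum(weights)

print(weights[12] / weights[0])    # ~0.5: the weight halves after 12 periods
print(ewma.iloc[-1], manual_last)  # the two values agree
```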

Subtracting the Previous Value with .shift()

The .shift() method moves the series down by one period, so subtracting the shifted series from the original replaces each value with its change from the previous value. This is first-order differencing, the “I” (Integrated) part of ARIMA, and it is the most popular way to make data stationary when conducting ARIMA time series modeling. My results turned out pretty good with this one. The p-value, 0.156593, is still above 0.05, but it is the closest to that threshold of all the methods I tried. Strictly speaking, a p-value above 0.05 means the Dickey-Fuller test cannot reject non-stationarity at the 5% level, so the differenced series is much closer to stationary but not conclusively so.

datasetLogDiffShifting = h_mean_logScale - h_mean_logScale.shift()
plt.plot(datasetLogDiffShifting)
plt.show()
datasetLogDiffShifting.dropna(inplace=True)
stationarity_testing(datasetLogDiffShifting)

ARIMA Results

I decided to use the last stationarity method above, differencing with .shift(). Once I fit the ARIMA model on this data, I got very accurate results: my original values sit almost directly on top of my forecasted values. RSS stands for Residual Sum of Squares. It measures the discrepancy between the data and the model's predictions, and my model had an RSS of 0.0001. Keep in mind that RSS depends on the scale of the data, so an RSS computed on a log-differenced series is not in the original units.

I wanted to dedicate this blog to explaining stationarity, but there are some other preliminary steps you need to complete before you start your ARIMA model, such as: