Time-Series Forecasting: Predicting Microsoft (MSFT) Stock Prices Using ARIMA Model

Published in

Analytics Vidhya

3 min read · Feb 19, 2021

After the largely successful LSTM model, let’s try to recreate that success with an ARIMA model. First, a little background on time series; then we’ll walk through an ARIMA implementation on a Microsoft stock price dataset covering more than 20 years. Let’s do it!

Time-series & forecasting models

Traditionally, most machine learning (ML) models take a set of observations (samples / examples) as input features, but the data has no time dimension.

Time-series forecasting models predict future values from previously observed values. They are widely used for non-stationary data, that is, data whose statistical properties (e.g. the mean and standard deviation) are not constant but vary over time.

Such time-ordered input data are usually called time-series. Examples include temperature readings, stock prices and house prices over time. So, the input is a signal (time-series) defined by observations taken sequentially in time.
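To make the idea of non-stationarity concrete, here is a minimal sketch that computes a rolling mean and standard deviation. It runs on a synthetic random walk, not on the MSFT data, and the helper name rolling_stats and the window size of 100 are just illustrative choices.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a price series: a random walk (NOT the MSFT data).
rng = np.random.default_rng(seed=0)
random_walk = pd.Series(100 + rng.normal(0, 1, 1000).cumsum())


def rolling_stats(series: pd.Series, window: int = 100) -> pd.DataFrame:
    """Return the rolling mean and rolling standard deviation of a series."""
    return pd.DataFrame({
        "rolling_mean": series.rolling(window).mean(),
        "rolling_std": series.rolling(window).std(),
    }).dropna()  # the first window-1 rows have no full window yet


stats = rolling_stats(random_walk)
# For a stationary series the rolling mean stays roughly flat;
# for a random walk it wanders, which is what these two numbers show.
print(stats["rolling_mean"].min(), stats["rolling_mean"].max())
```

If the rolling mean or standard deviation drifts noticeably from window to window, the series is probably not stationary.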

The AutoRegressive Integrated Moving Average (ARIMA) model

A famous and widely used forecasting method for time-series prediction is the AutoRegressive Integrated Moving Average (ARIMA) model. ARIMA models are capable of capturing a suite of different standard temporal structures in time-series data.

Terminology

Let’s break down these terms:

AR: < Auto Regressive > means that the model uses the dependent relationship between an observation and some predefined number of lagged observations (also known as “time lag” or “lag”).
I: < Integrated > means that the model differences the raw observations (e.g. it subtracts the observation at the previous time step from the current observation) in order to make the time-series stationary.
MA: < Moving Average > means that the model exploits the relationship between an observation and the residual errors of a moving average model applied to lagged observations.
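The “Integrated” step is easy to try by hand. Below is a small sketch of differencing with made-up numbers; the helper name difference is hypothetical and only there for illustration.

```python
import numpy as np


def difference(values, d=1):
    """Apply first-order differencing d times: y'[t] = y[t] - y[t-1]."""
    out = np.asarray(values, dtype=float)
    for _ in range(d):
        out = out[1:] - out[:-1]  # each pass shortens the series by one
    return out


prices = [10.0, 12.0, 15.0, 19.0]   # made-up numbers with an upward trend
print(difference(prices, d=1))      # [2. 3. 4.]
print(difference(prices, d=2))      # [1. 1.]
```

Differencing once removes the level of the series, and differencing a second time removes the steadily growing step size in this toy example.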

Model parameters

The standard ARIMA model expects three parameters: p, d and q.

p is the number of lag observations (the lag order).
d is the degree of differencing (the number of times the raw observations are differenced).
q is the size/width of the moving average window (the order of the moving average).
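For readers who like to see the formula, the three parameters fit together roughly as follows. This is the usual textbook form written for d = 1; the symbols c (constant), φ (AR coefficients), θ (MA coefficients) and ε (residual error) are standard notation rather than anything defined in this article.

```latex
% y_t: observed series, y'_t: series after differencing once (d = 1)
y'_t = y_t - y_{t-1}
\qquad
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i}
         + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
         + \varepsilon_t
```

With q = 0 the second sum disappears and the model reduces to a pure autoregression on the differenced series.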

Let’s start coding!

Modules needed: Keras, TensorFlow, Pandas, Scikit-Learn & NumPy

Loading all the required modules here.

Now let’s take a look at the dataset.

df = pd.read_csv("MSFT.csv")
df.head(5)

The target value to be predicted is going to be the “Close” stock price value.
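Here is one way the “Close” series could be pulled out of the file. This is only a sketch: it assumes the CSV has a “Date” column next to “Close” (the usual layout of a Yahoo Finance export, which may not match your file), and the helper name load_close_prices is hypothetical. The demo reads a tiny in-memory CSV with made-up rows instead of MSFT.csv.

```python
import io

import pandas as pd


def load_close_prices(csv_source) -> pd.Series:
    """Read a price CSV and return the 'Close' column indexed by date.

    Assumes the file has 'Date' and 'Close' columns.
    """
    df = pd.read_csv(csv_source, parse_dates=["Date"], index_col="Date")
    # Forecasting needs chronological order, so sort by date explicitly.
    return df["Close"].sort_index().dropna()


# Made-up rows, deliberately out of order, standing in for MSFT.csv.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close\n"
    "2021-01-05,10.0,11.0,9.5,10.5\n"
    "2021-01-04,9.0,10.5,8.5,10.0\n"
)
close = load_close_prices(sample_csv)
print(close.tolist())  # [10.0, 10.5]
```

With the real data you would call load_close_prices("MSFT.csv") instead.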

Let’s get the input variables ready!

MAKING OUR ARIMA MODEL

Next, let’s divide the data into a training set (70%) and a test set (30%). For this tutorial we select the following ARIMA parameters: p=6, d=1 and q=0.
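A 70/30 split for time-series data should keep the chronological order, so the test set is simply the last 30% of the observations. A minimal sketch, using a hypothetical helper name and a synthetic stand-in for the “Close” column:

```python
import numpy as np
import pandas as pd


def train_test_split_chronological(series: pd.Series, train_fraction: float = 0.7):
    """Split a series into a leading train part and a trailing test part (no shuffling)."""
    split_index = int(round(len(series) * train_fraction))
    return series.iloc[:split_index], series.iloc[split_index:]


# Stand-in for df["Close"]; swap in the real column once MSFT.csv is loaded.
close = pd.Series(np.linspace(20.0, 250.0, 1000), name="Close")
train, test = train_test_split_chronological(close)
print(len(train), len(test))  # 700 300
```

Shuffling before the split, as is common for other ML tasks, would leak future prices into the training set.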

Summary of the code

A rolling forecasting procedure is required because both the differencing and the AR model depend on observations from prior time steps. To this end, we re-create the ARIMA model after each new observation is received.
Finally, we manually keep track of all observations in a list called history, which is seeded with the training data and to which each new observation is appended at every iteration.

Mean Squared Error is 2.3522
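For reference, the mean squared error is just the average of the squared differences between the actual and predicted values. A small NumPy sketch with made-up numbers (scikit-learn’s mean_squared_error computes the same quantity):

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Made-up values: the errors are -1, 0 and 2, so the MSE is (1 + 0 + 4) / 3.
mse = mean_squared_error([100.0, 102.0, 101.0], [101.0, 102.0, 99.0])
print(round(mse, 4))  # 1.6667
```

Note that the MSE is in squared price units, so its square root (RMSE) is often easier to interpret.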

Getting the test data ready and making predictions

Visualising the results
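A plot of actual versus predicted prices could be produced along these lines. This is a sketch using matplotlib, which is not in the module list above, so treat it as an assumption; the function name plot_forecast and the sample numbers are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_forecast(actual, predicted, title="Close price: actual vs. ARIMA forecast"):
    """Draw the actual and predicted test-set prices on one set of axes."""
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(actual, label="Actual price")
    ax.plot(predicted, label="Predicted price", linestyle="--")
    ax.set_xlabel("Test-set time step")
    ax.set_ylabel("Close price")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# Made-up values standing in for the test set and the model's predictions.
actual = np.array([100.0, 101.5, 103.0, 102.0, 104.5])
predicted = np.array([99.5, 101.0, 102.0, 103.0, 103.5])
fig, ax = plot_forecast(actual, predicted)
# In a script, call plt.show() here; in a notebook the figure renders on its own.
```

Pass the test series and the array of rolling predictions to get the real chart.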

I’ll let you experiment with the ARIMA parameter values. Let me know if you can get a better MSE!

Disclaimer: There have been many attempts to predict stock prices using time-series analysis algorithms, but they still cannot be relied on to place bets in the real market. This is just a tutorial article and does not intend in any way to “direct” people into buying stocks.

Give me a follow if you liked this for more tech blogs!

Sayonara!