ARIMA Price Forecasting by Using Parseltongue…uhmm, I Mean Python

Himang Sharatun
Published in SkyshiDigital
Feb 27, 2018 · 9 min read

If you have watched or read Harry Potter, you must be familiar with Parseltongue: basically it’s the snake language, and someone who can speak it is called a Parselmouth. As a programmer and a big fan of J.K. Rowling’s work, I declare myself a Parselmouth since I can “speak” Python and Anaconda. LOL. Okay, enough with my cringey joke; let’s move on to something more important, which is price forecasting. In this article I will give an example of price forecasting using ARIMA and explain how it works in the most human language possible.

Why is the price forecasting case so special?

Most tutorials and basic guides on machine learning, especially the ones that talk about neural networks (NN), explain ML as a black box that takes certain inputs and produces a certain output. I agree with that, but there is a problem if we want to use an NN to forecast prices. It is simple to use an NN for something like how far a car can go on a given amount of fuel, since the amount of fuel and the distance are directly and intuitively correlated. The problem is that in price forecasting there is no real input that is directly correlated with the price; the only thing that might be correlated with the current price is the previous price.

Well, you could argue that the political situation, market sentiment, supply and demand, etc. could affect the price of a certain commodity, but let’s be honest: that’s not really the case for every commodity, and no one really knows what will affect the price the next day. I’m not even sure that the previous price affects the current price, but at least it’s more feasible to calculate the current price from previous prices than from an abstract variable such as market sentiment, which no one knows how to measure. Even though we assume that the current price is somehow correlated with previous prices, we still don’t know how they are actually correlated and, more importantly, how to predict the future price from that correlation. That is what ARIMA is for.

Let’s talk with the magic snake first

Before I explain how ARIMA actually works, let me show you with Python that ARIMA does work for predicting prices. In this example we will use the price of cooking oil in Indonesia between 2010 and 2015. Here is the code to train an ARIMA model and plot the predictions on a graph:

After you download the data and run the Python code, you will see the following graph:

As you can see, our prediction graph almost coincides with the real prices. In mathematical language, our model has an RMSE of 29.45 and an MAE of 21.25, as depicted in the table here. I hope this example convinces you of how accurate ARIMA can be for price forecasting.
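For reference, RMSE (root mean squared error) and MAE (mean absolute error) can be computed as in the sketch below. The function names and the three example prices are made up for illustration; they are not the article's data.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

actual = [100.0, 102.0, 101.0]
predicted = [98.0, 103.0, 104.0]   # misses of 2, -1 and -3
print(round(rmse(actual, predicted), 2))  # 2.16
print(mae(actual, predicted))             # 2.0
```

Both numbers are in the same unit as the price, so they can be read as "how far off the prediction typically is".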

What did you do and why does it work?

ARIMA is actually a combination of 3 methods: autoregression (AR), integrated (I) and moving average (MA). So to understand ARIMA we first need to understand how these 3 methods work. But before that, let me explain lag, since it’s an important term that you need to understand to learn about ARIMA.

If you look at our code, you can see that we give 3 arguments in the configuration, and as you might have guessed, those 3 arguments correspond to the 3 methods that ARIMA consists of. For now let’s ignore the argument for the integrated process, since it’s easier if I explain it later; for AR and MA, the arguments represent the lag of our model. As I explained before, in this model we assume that the future price is correlated with previous ones. But that doesn’t mean we need to calculate the future price from all previous prices. We only need the most recent prices, since they are more relevant than older ones, not to mention that using all previous prices to make a prediction would be computationally costly. In ARIMA, the number of previous prices that we use in the calculation is called the lag. For example, if the lag is 3, we only use the prices of the last 3 days to predict the price for the next day.
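To make the idea of lag concrete, here is a small sketch that pairs each price with the 3 prices before it. The helper name `lagged_windows` and the toy prices are hypothetical.

```python
def lagged_windows(prices, lag):
    """Pair each price with the `lag` prices that came right before it."""
    pairs = []
    for t in range(lag, len(prices)):
        pairs.append((prices[t - lag:t], prices[t]))
    return pairs

prices = [10, 11, 12, 13, 14]
for previous, target in lagged_windows(prices, lag=3):
    print(previous, "->", target)
# [10, 11, 12] -> 13
# [11, 12, 13] -> 14
```

With a lag of 3, the first 3 prices have no complete history, so a series of 5 prices yields only 2 usable examples.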

Autoregression (AR)

Now that you understand what lag is, let’s talk about autoregression. Don’t be confused: autoregression is not automatic regression, as I thought before I learned more about it; it means regression against itself. In linear regression, we regress the output y on a certain input x, or as a formula:

y = m*x + b

where m and b are the coefficients that we adjust at each iteration to fit the training data. The problem in price forecasting is that there is no x as an input. We only have previous prices, which we can’t call x; it’s more mathematically correct to write them as y(t-1), y(t-2), …, y(t-n), because they are the same y variable that we are trying to predict. Therefore, we need to modify the linear regression formula so that we can regress y against previous values of y. For example, in autoregression with 3 lags, we use

y = b0 + b1*y(t-1) + b2*y(t-2) + b3*y(t-3)

to predict the future price, where b0 is a constant, b1 represents how much y(t-1) affects y, b2 represents how much y(t-2) affects y, and so on and so forth. In simple terms, autoregression is about finding how correlated previous prices are with the current price, which in our formula is represented by b, and predicting the future price using that correlation.
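The 3-lag formula can be fitted with ordinary least squares. The sketch below assumes a simulated series with known coefficients (0.5, 0.2, 0.1) so we can check that the fit recovers them; `fit_ar` and `predict_next` are hypothetical helper names, and a library such as statsmodels may estimate the coefficients differently.

```python
import numpy as np

def fit_ar(prices, lag):
    """Least-squares fit of y(t) = b0 + b1*y(t-1) + ... + b_lag*y(t-lag)."""
    y = np.asarray(prices, dtype=float)
    # One row per predictable day: [1, y(t-1), y(t-2), ..., y(t-lag)]
    rows = [np.concatenate(([1.0], y[t - lag:t][::-1]))
            for t in range(lag, len(y))]
    coefs, *_ = np.linalg.lstsq(np.array(rows), y[lag:], rcond=None)
    return coefs  # [b0, b1, ..., b_lag]

def predict_next(prices, coefs):
    """Apply the fitted formula to the most recent prices."""
    lag = len(coefs) - 1
    recent = np.asarray(prices[-lag:], dtype=float)[::-1]  # y(t-1) first
    return float(coefs[0] + coefs[1:] @ recent)

# Simulated series that really follows the 3-lag formula, plus noise.
rng = np.random.default_rng(0)
true_b = [5.0, 0.5, 0.2, 0.1]
series = [25.0, 25.0, 25.0]
for _ in range(2000):
    series.append(true_b[0] + true_b[1] * series[-1] + true_b[2] * series[-2]
                  + true_b[3] * series[-3] + rng.normal())

coefs = fit_ar(series, lag=3)
print(coefs)                        # estimates of b0, b1, b2, b3
print(predict_next(series, coefs))  # prediction for the next day
```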

Moving Average (MA)

If in AR we predict on the assumption that previous prices are correlated with the future price, in Moving Average (MA) we assume that the future price is just the average price with some adjustment. The intuition behind MA is that the price will not differ much from its mean, and when it does, we can estimate the difference from how wrong our most recent predictions were. (Despite the name, this is not the rolling average of the last few prices.) As a formula, we write MA with 3 lags as

y = m + c1*e1 + c2*e2 + c3*e3

e1 = y(t-1) - yhat(t-1)

e2 = y(t-2) - yhat(t-2)

e3 = y(t-3) - yhat(t-3)

where m is the mean of the prices, yhat(t-n) is the price the model predicted for day t-n, e1, e2, e3 are the errors of those predictions (real price minus predicted price), and c1, c2, c3 are coefficients representing how much the error at y(t-n) influences y.

The main advantage of MA over AR is that MA explicitly accounts for the way an anomaly on a certain day might affect the prices that follow. For example, suppose that in our cooking oil data the demand suddenly drops because a big restaurant chain with many branches goes bankrupt. Obviously, in this scenario the price of cooking oil will not only drop tomorrow; it might keep dropping for the next weeks or even a month. With AR alone we don’t model such a shock explicitly; its effect only shows up indirectly through the prices that come after it. If you want to use AR and MA separately, you need to study your product and its price fluctuations first. But by combining AR and MA we can build a generalized model to predict the future price without having to work out whether our price is more suitable for AR or MA.

If we combine AR and MA, the formula to predict the future price becomes:

y = b1*y(t-1) + b2*y(t-2) + b3*y(t-3)+ c1*e1 + c2*e2 + c3*e3
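As a worked example, the sketch below plugs numbers into the combined formula. The coefficients, prices and errors are all made up, and the errors are simply taken as given; in practice a library fits the coefficients and tracks the errors for you.

```python
def arma_predict(recent_prices, recent_errors, b, c):
    """Combined AR + MA prediction with 3 lags.

    All lists are ordered most recent first: index 0 is t-1, index 2 is t-3.
    """
    ar_part = sum(bi * yi for bi, yi in zip(b, recent_prices))
    ma_part = sum(ci * ei for ci, ei in zip(c, recent_errors))
    return ar_part + ma_part

recent_prices = [100.0, 102.0, 101.0]  # y(t-1), y(t-2), y(t-3)
recent_errors = [1.0, -2.0, 0.5]       # e1, e2, e3
b = [0.5, 0.3, 0.2]                    # hypothetical AR coefficients
c = [0.4, 0.2, 0.1]                    # hypothetical MA coefficients

prediction = arma_predict(recent_prices, recent_errors, b, c)
print(round(prediction, 2))  # 100.85
```

The AR part contributes 50 + 30.6 + 20.2 = 100.8 and the MA part contributes 0.4 - 0.4 + 0.05 = 0.05.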

Integrated

To predict the future price, it is more accurate if our data is stationary, meaning that it has a constant mean and a constant variance. But in reality, prices are not always stationary, which makes prediction difficult. Therefore, to make our data stationary we need to difference it, which means replacing each price with its change from the previous price. The integrated argument in our ARIMA configuration (the one we skipped earlier) represents the order of differencing, that is, how many times we apply this step to our data.
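A tiny numeric sketch of differencing, using `numpy.diff` on made-up prices. An order of 1 takes the day-to-day changes; an order of 2 takes the changes of those changes.

```python
import numpy as np

prices = np.array([100, 103, 107, 112, 118])  # toy prices that keep rising faster

first_diff = np.diff(prices, n=1)   # day-to-day changes
second_diff = np.diff(prices, n=2)  # changes of the changes

print(first_diff)   # [3 4 5 6]
print(second_diff)  # [1 1 1]
```

Here the prices trend upward, the first difference still trends upward, and only the second difference is flat. Note that each round of differencing shortens the series by one point.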

Visually, the differencing process looks like this on a graph:

Credit to: archive.is

In the graph above you can observe that the stationary data (blue line) moves up and down around a certain horizontal line, while the non-stationary data (green line), even though it also moves up and down, doesn’t stay around any horizontal line.

Now, let’s see whether our data is stationary or not by visualizing it with the code below:
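A minimal plotting sketch with matplotlib, assuming a synthetic random-walk series in place of the cooking oil prices (the real data file is not reproduced here). The dashed line marks the overall mean so you can judge by eye whether the series stays around it.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the price data (NOT the article's data).
rng = np.random.default_rng(1)
prices = 10_000 + np.cumsum(rng.normal(loc=2.0, scale=30.0, size=300))

fig, ax = plt.subplots()
ax.plot(prices, label="price")
ax.axhline(prices.mean(), linestyle="--", color="gray", label="overall mean")
ax.set_xlabel("day")
ax.set_ylabel("price")
ax.set_title("Price plot")
ax.legend()
# Call plt.show() to display the graph in an interactive session.
```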

And after you run the code you will see the following graph:

Cooking Oil Price Plot

As you can see, our data isn’t stationary yet: even though it moves up and down, we can’t draw a single horizontal line that represents the mean of our data. Therefore, we need to difference our data so that the graph looks like this:

Cooking Oil Price Plot (Differenced)

Now we can draw a horizontal line at 0 to represent the mean of our data. If you are curious about how to plot the differenced price, you can see the code below:
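A sketch of plotting the differenced series, again on a synthetic stand-in for the prices and assuming first-order differencing with `numpy.diff`:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the price data (NOT the article's data).
rng = np.random.default_rng(1)
prices = 10_000 + np.cumsum(rng.normal(loc=2.0, scale=30.0, size=300))

# First-order differencing: today's price minus yesterday's price.
differenced = np.diff(prices)

fig, ax = plt.subplots()
ax.plot(differenced, label="daily change")
ax.axhline(0, linestyle="--", color="gray", label="zero line")
ax.set_xlabel("day")
ax.set_ylabel("price change")
ax.set_title("Price plot (differenced)")
ax.legend()
# Call plt.show() to display the graph in an interactive session.
```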

How to determine ARIMA configuration?

As you have learned previously, to implement ARIMA we need to determine the lags for AR and MA and how many times we want to difference our data. You could do it the ugly way, by trial and error, observing which configuration produces the best result. The more elegant way is to use the autocorrelation function (ACF) for MA and the partial autocorrelation function (PACF) for AR.

ACF is basically a method to measure how correlated the current price y is with a previous price y(t-n). PACF does the same thing, but removes the influence of the prices in between y and y(t-n). For example, when ACF calculates the correlation between y and y(t-3), it also includes the indirect correlation that passes through y(t-1) and y(t-2), whereas PACF filters out y(t-1) and y(t-2) and only measures the direct correlation between y and y(t-3). To plot ACF and PACF in Python we will use the code below:

In the code above, I limit the lags displayed because by default the plot_acf and plot_pacf functions plot the ACF and PACF for every possible lag. For example, if we have 1000 data points, the functions will plot 999 lags each, which consumes a lot of resources. Therefore I limit the lags to 100, since the bigger the lag, the less correlation can usually be found.

The graph below illustrates the ACF for our data:

In the graph above, you can see that the highest correlation is found at lag 1. To decide the MA parameter we should choose the smallest number that works. Indeed, a bigger parameter might increase the accuracy of the prediction, but it also increases the computational burden, so we choose based on where the correlation starts to decay. In our graph, the correlation starts to decay after lag 1, which means we can use 1 as the MA parameter.
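To see what "the correlation decays after lag 1" looks like in numbers, here is a sketch that computes the autocorrelation by hand with NumPy. The series is simulated so that each value shares a shock with the previous one only; the `autocorrelation` helper is hypothetical and is a simplified version of what plotting libraries compute.

```python
import numpy as np

def autocorrelation(series, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    if lag == 0:
        return 1.0  # a series is always perfectly correlated with itself
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x * x))

# Simulated daily changes: today's shock plus 0.6 times yesterday's shock.
rng = np.random.default_rng(5)
noise = rng.normal(size=5001)
changes = noise[1:] + 0.6 * noise[:-1]

for lag in range(4):
    print(lag, round(autocorrelation(changes, lag), 3))
```

In a series like this, lag 1 stands out clearly while lags 2 and 3 sit close to zero, which is the pattern that points to an MA parameter of 1.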

To decide the AR parameter we need to look at the PACF, since each AR coefficient measures the direct relationship between a lag and the current price, without the prices in between. The PACF graph is shown below:

Based on the PACF plot above, we could use 0 or 1 as the AR parameter; I prefer to use 0, the smaller of the two. Using 0 as the AR parameter and 1 as the MA parameter means that our prediction model is purely MA and doesn’t use AR.

That’s the end of this article. I hope you learned something from this simple implementation of ARIMA for price forecasting. To be honest, there are a lot of methods to forecast prices, depending on what data you have available. In some cases you could even use an NN, if there are intuitive parameters that affect the price, for example in the case of electricity prices. Before finding out about ARIMA, I myself implemented an NN on the same data used in this article, using the previous 3 prices as input, but the result was disappointing.

If you have further questions you can contact me at himang@skyshi.io, or preferably come to my office at PT Skyshi Digital Indonesia. See you in the next article.
