Time Series Forecasting in R

Aashiq Reza
Data For Tomorrow
Published in
5 min readJun 24, 2020

--

Time series forecasting with ARIMA model in R with code

Photo by Murray Campbell on Unsplash

Follow all the steps carefully on the data you have to forecast the price of commodities in R.

Step 1: Load all the required libraries

library(tidyverse)
library(lubridate)
library(forecast)
library(tseries)

Step 2: Take input of the data and use glimpse() to have a look on it

pricedata <- read_csv("foodprice.csv")glipmse(pricedata)

Output:

Rows: 5,105
Columns: 17
$ date <chr> "15-12-2006", "15-01-2007", "15-02-2007", "15-03-2007", "...
$ cmname <chr> "Wheat flour - Retail", "Wheat flour - Retail", "Wheat fl...
$ unit <chr> "KG", "KG", "KG", "KG", "KG", "KG", "KG", "KG", "KG", "KG...
$ category <chr> "cereals and tubers", "cereals and tubers", "cereals and ...
$ price <dbl> 23.0, 25.5, 25.5, 26.0, 26.0, 26.5, 26.5, 27.5, 29.5, 30....
$ currency <chr> "BDT", "BDT", "BDT", "BDT", "BDT", "BDT", "BDT", "BDT", "...
$ country <chr> "Bangladesh", "Bangladesh", "Bangladesh", "Bangladesh", "...
$ station <chr> "Barisal", "Barisal", "Barisal", "Barisal", "Barisal", "B...
$ stationID <dbl> 575, 575, 575, 575, 575, 575, 575, 575, 575, 575, 575, 57...
$ mktname <chr> "Barisal Division", "Barisal Division", "Barisal Division...
$ mktid <dbl> 112, 112, 112, 112, 112, 112, 112, 112, 112, 112, 112, 11...
$ cmid <dbl> 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 5...
$ ptid <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 1...
$ umid <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...
$ catid <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ sn <chr> "112_58_15_5", "112_58_15_5", "112_58_15_5", "112_58_15_5...
$ default <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

Step 3: Take a look on the “date” variable. We need to convert this into date and time. And then use class() to check the data type of date.

pricedata <- pricedata %>%
mutate(Date = dmy(date)) %>%
select(-date)
class(pricedata$Date)

Output:

> class(pricedata$Date)
[1] "Date"

Step 4: For this code, We will forecast on price of rice(coarse) only. Filter on “cmname” variable to get the information about rice only and store it in rice_ts. ggplot() is used to plot the result.

rice_price <- filter(pricedata, cmname == "Rice (coarse) - Retail")ggplot(rice_price, aes(x = Date, y = price, group = station)) + geom_line(alpha = 0.5)

Output:

Price of rice in many different stations

Step 5: Because the price differ from station to station, we took the median price of each months and summarize the data, then converted the summarized data into time series using ts().

summary_rice_price <- rice_price %>%
group_by(Date)%>%
summarize(median_price = median(price))
max_Date = max(summary_rice_price$Date)
min_Date = min(summary_rice_price$Date)
rice_ts <- ts(summary_rice_price$median_price, end=c(year(max_Date), month(max_Date)),start=c(year(min_Date), month(min_Date)),frequency=12)

Step 6: Check the ACF and PACF plots to decide whether or not to go with ARIMA model.

acf <- acf(rice_ts)
pacf <- pacf(rice_ts)

Output: As both of the figures shows a geometric pattern, ARIMA is a suitable model to predict price.

ACF and PACF graphs for rice_ts

Step 8: Check if the model starionary using stl(). And to be more sure, perform Augmented Dickey-Fuller test on rice_ts using adf.test().

plot(stl(rice_ts,s.window = "periodic"))
adf.test(rice_ts)

Output: The bar on the right side of the component “seasonal” confirms the seasonality of the time series. Also the p-value of ADF test rejects the possibility of the time series to be stationary.

Step 9: Fit the model using auto.arima() and set seasonal = TRUE . Or manually by tring out Arima() with different order and season. Note that, for tring out manully, we’ll need to choose the model with lowest AIC AICc and BIC value. Use checkresiduals() to confirm the fitted model does not contain lack of fit.

fit3 <- Arima(rice_ts, order = c(1,1,3), seasonal = c(1, 1, 1))
checkresiduals(fit3)

Output: The fitted model rejects lac of fit that if confirmed by Ljung-Box test having a p-value well above 0.05 and also confirms a binomial distribution on residuals.

Check if the model is good enough for forecasting

Step 10: Use forecast() on the fitted model and use autoplot() to viewthe result. Also use accuracy() to check the accuracy of the forecast.

rice_forecast <- forecast(fit3, h = 36)
autoplot(rice_forecast, main = "Rice Price Forecast with ARIMA(1,1,1)(1,1,1)[12]")
accuracy(rice_forecast)

Output:

--

--

Aashiq Reza
Data For Tomorrow

Data Science, ML, Image processing. Good hands in R, MATLAB, Python, SPSS, C/Cpp. Always free to connect : https://www.linkedin.com/in/aashiq-reza-2030b516a/