Time Series Forecasting in R

Published in

--

Time series forecasting with ARIMA model in R with code

Follow all the steps carefully on the data you have to forecast the price of commodities in R.

Step 1: Load all the required libraries

`library(tidyverse)library(lubridate)library(forecast)library(tseries)`

Step 2: Take input of the data and use glimpse() to have a look on it

`pricedata <- read_csv("foodprice.csv")glipmse(pricedata)`

Output:

`Rows: 5,105Columns: 17\$ date      <chr> "15-12-2006", "15-01-2007", "15-02-2007", "15-03-2007", "...\$ cmname    <chr> "Wheat flour - Retail", "Wheat flour - Retail", "Wheat fl...\$ unit      <chr> "KG", "KG", "KG", "KG", "KG", "KG", "KG", "KG", "KG", "KG...\$ category  <chr> "cereals and tubers", "cereals and tubers", "cereals and ...\$ price     <dbl> 23.0, 25.5, 25.5, 26.0, 26.0, 26.5, 26.5, 27.5, 29.5, 30....\$ currency  <chr> "BDT", "BDT", "BDT", "BDT", "BDT", "BDT", "BDT", "BDT", "...\$ country   <chr> "Bangladesh", "Bangladesh", "Bangladesh", "Bangladesh", "...\$ station   <chr> "Barisal", "Barisal", "Barisal", "Barisal", "Barisal", "B...\$ stationID <dbl> 575, 575, 575, 575, 575, 575, 575, 575, 575, 575, 575, 57...\$ mktname   <chr> "Barisal Division", "Barisal Division", "Barisal Division...\$ mktid     <dbl> 112, 112, 112, 112, 112, 112, 112, 112, 112, 112, 112, 11...\$ cmid      <dbl> 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 5...\$ ptid      <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 1...\$ umid      <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...\$ catid     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...\$ sn        <chr> "112_58_15_5", "112_58_15_5", "112_58_15_5", "112_58_15_5...\$ default   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...`

Step 3: Take a look on the “date” variable. We need to convert this into date and time. And then use class() to check the data type of date.

`pricedata <- pricedata %>%  mutate(Date = dmy(date)) %>%  select(-date)class(pricedata\$Date)`

Output:

`> class(pricedata\$Date)[1] "Date"`

Step 4: For this code, We will forecast on price of rice(coarse) only. Filter on “cmname” variable to get the information about rice only and store it in rice_ts. ggplot() is used to plot the result.

`rice_price <- filter(pricedata, cmname == "Rice (coarse) - Retail")ggplot(rice_price, aes(x = Date, y = price, group = station)) + geom_line(alpha = 0.5)`

Output:

Step 5: Because the price differ from station to station, we took the median price of each months and summarize the data, then converted the summarized data into time series using ts().

`summary_rice_price <- rice_price %>% group_by(Date)%>% summarize(median_price = median(price))max_Date = max(summary_rice_price\$Date)min_Date = min(summary_rice_price\$Date)rice_ts <- ts(summary_rice_price\$median_price, end=c(year(max_Date), month(max_Date)),start=c(year(min_Date), month(min_Date)),frequency=12)`

Step 6: Check the ACF and PACF plots to decide whether or not to go with ARIMA model.

`acf <- acf(rice_ts)pacf <- pacf(rice_ts)`

Output: As both of the figures shows a geometric pattern, ARIMA is a suitable model to predict price.

Step 8: Check if the model starionary using stl(). And to be more sure, perform Augmented Dickey-Fuller test on rice_ts using adf.test().

`plot(stl(rice_ts,s.window = "periodic"))adf.test(rice_ts)`

Output: The bar on the right side of the component “seasonal” confirms the seasonality of the time series. Also the p-value of ADF test rejects the possibility of the time series to be stationary.

Step 9: Fit the model using auto.arima() and set seasonal = TRUE . Or manually by tring out Arima() with different order and season. Note that, for tring out manully, we’ll need to choose the model with lowest AIC AICc and BIC value. Use checkresiduals() to confirm the fitted model does not contain lack of fit.

`fit3 <- Arima(rice_ts, order = c(1,1,3), seasonal = c(1, 1, 1))checkresiduals(fit3)`

Output: The fitted model rejects lac of fit that if confirmed by Ljung-Box test having a p-value well above 0.05 and also confirms a binomial distribution on residuals.

Step 10: Use forecast() on the fitted model and use autoplot() to viewthe result. Also use accuracy() to check the accuracy of the forecast.

`rice_forecast <- forecast(fit3, h = 36)autoplot(rice_forecast, main = "Rice Price Forecast with ARIMA(1,1,1)(1,1,1)[12]")accuracy(rice_forecast)`

Output:

This is a simple way to forecast a time series data in R.

Other Stories:

--

--

Data Science, ML, Image processing. Good hands in R, MATLAB, Python, SPSS, C/Cpp. Always free to connect : https://www.linkedin.com/in/aashiq-reza-2030b516a/