# Forecasting Daily New Confirmed COVID-19 Cases in Maldives — Part 2

Predicting Daily New Cases using ARIMA Models.

This second article explains how to forecast daily new confirmed COVID-19 cases in Maldives. Check out my first article, about forecasting using Simple Exponential Smoothing, the Linear Trend Model, and Holt-Winters Smoothing, here.

# Introduction

The Box-Jenkins method is a set of procedures for identifying, estimating, and checking time series models of the AutoRegressive Integrated Moving Average (**ARIMA**) family. ARIMA models rely strongly on the autocorrelation pattern of the data. According to Jie (2021), **three items** are required to determine an appropriate ARIMA model: **the time series, the ACF plot, and the PACF plot**. After analyzing the ACF and PACF plots and running a unit root test, the next step is to decide whether the time series needs **differencing**. Then, **diagnostic checking** is carried out to determine whether the fitted model is adequate. The criteria for diagnostic checking are the z-test for coefficient significance, residual analysis, and model selection criteria based on forecast error. Residual independence is checked with the Ljung-Box test: if its p-value is greater than 0.05, the model is considered adequate and can be used for forecasting.

# Autocorrelation Analysis

## Before Differencing

As mentioned in the previous article, the ACF plot **dies down extremely slowly**, and the autocorrelation **remains significant for several lags**, which indicates that the series is **not stationary**. In addition, the **PACF plot dies down exponentially with oscillation**.

A KPSS test was also carried out on the undifferenced series. From its result, it can be concluded that **differencing is needed** to turn the non-stationary series into a stationary one.

## First Differencing

After first differencing, the ACF plot dies down in a sine-wave pattern and cuts off at lag one, while the PACF plot cuts off at lag two (it dies down with oscillation).
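To make the differencing and ACF steps concrete, here is a minimal sketch in plain Python. It is not the article's code: the daily counts are made up for illustration, the helper names `difference` and `sample_acf` are hypothetical, and the significance bound ±1.96/√n is the usual large-sample approximation drawn on ACF plots.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (standard biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical daily counts (NOT the Maldives data), for illustration only.
cases = [12, 15, 14, 20, 26, 25, 31, 38, 36, 45, 52, 50, 61, 70, 68]
diffed = difference(cases)

# Approximate 95% bound used to judge which spikes are significant.
bound = 1.96 / len(diffed) ** 0.5

for k, r in enumerate(sample_acf(diffed, 5)):
    flag = "significant" if k > 0 and abs(r) > bound else ""
    print(f"lag {k}: r = {r:+.3f} {flag}")
```

A cut-off at lag *q* in this kind of output is what suggests an MA(*q*) term; the same reading applied to the partial autocorrelations suggests the AR order.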

**> Kwiatkowski–Phillips–Schmidt–Shin test on First Differencing**

*Hypothesis:*

• H0: The series is stationary.

• H1: The series is not stationary.

*Criteria:*

• If the p-value is < 0.05, reject H0.

• If the value of the test statistic is greater than the critical value, reject H0.

The **p-value is greater than 0.1**, which is above 0.05. In addition, the test statistic is **smaller** than the critical value (0.068 < 0.463). From these results, it can be concluded that **the series is now stationary** (H0 is not rejected).
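For readers who want to see where a KPSS statistic comes from, the following is a rough sketch of the level-stationarity version in plain Python. The function names are hypothetical, and the default lag truncation `int(4 * (n / 100) ** 0.25)` is an assumed rule of thumb; statistical packages may choose it differently, so their numbers can differ slightly. The 5% critical value 0.463 is the one quoted above.

```python
def kpss_level_statistic(series, lags=None):
    """KPSS statistic for stationarity around a constant level."""
    n = len(series)
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)  # assumed rule of thumb

    mean = sum(series) / n
    resid = [x - mean for x in series]

    # Partial sums of the demeaned series.
    partial, running = [], 0.0
    for e in resid:
        running += e
        partial.append(running)
    eta = sum(s * s for s in partial) / n ** 2

    # Long-run variance estimate with Bartlett (Newey-West) weights.
    long_run_var = sum(e * e for e in resid) / n
    for s in range(1, lags + 1):
        weight = 1 - s / (lags + 1)
        autocov = sum(resid[t] * resid[t - s] for t in range(s, n)) / n
        long_run_var += 2 * weight * autocov

    return eta / long_run_var


CRITICAL_5PCT = 0.463  # 5% critical value for the level-stationarity test


def is_stationary(statistic, critical=CRITICAL_5PCT):
    """True when H0 (the series is stationary) is NOT rejected."""
    return statistic <= critical


# 0.068 is the test statistic reported for the first-differenced series.
print(is_stationary(0.068))
```

Note the direction of the test: a small statistic (or a large p-value) means stationarity is not rejected, which is the opposite of how unit root tests such as Dickey-Fuller are read.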

# Building ARIMA Model

Based on the results of the autocorrelation analysis in the previous section, two ARIMA models are proposed in this section. In the ACF and PACF plots after first differencing, there is a cut-off at lag one in the ACF plot (**MA(1)**) and a cut-off at lag two in the PACF plot (**AR(2)**). Therefore, the first proposed model is **ARIMA (*2, 1, 1*)**, and the second uses the **‘auto.arima’** function to generate optimal *p*, *d*, and *q* values from the time-series data.

Using the **‘auto.arima’** function, the selected model is **ARIMA (*2, 1, 2*)**, that is, *p* = 2, *d* = 1, and *q* = 2.

## Summary of ARIMA Models

From these two figures, it can be concluded that **the model generated by the ‘auto.arima’ function has better (lower) AIC, AICc, and BIC values** than ARIMA (*2, 1, 1*).

Note: The lower the AIC, AICc, and BIC values, the better the ARIMA model.
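As a reminder of how these three criteria relate to each other, here is a small sketch using the standard textbook formulas. The log-likelihoods and the sample size below are invented for illustration and are not the values from the article's figures; the parameter counts assume no constant term, so each model has *p* + *q* coefficients plus the error variance.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, AICc and BIC from a fitted model's log-likelihood."""
    aic = -2 * log_likelihood + 2 * n_params
    # AICc adds a small-sample correction to AIC.
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = -2 * log_likelihood + n_params * math.log(n_obs)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical fits (illustrative numbers only).
candidates = {
    "ARIMA(2,1,1)": information_criteria(-812.4, n_params=4, n_obs=200),
    "ARIMA(2,1,2)": information_criteria(-805.1, n_params=5, n_obs=200),
}

for name, scores in candidates.items():
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

All three criteria reward a higher likelihood and penalize extra parameters, with BIC penalizing them most heavily for larger samples.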

## Adequacy of Each ARIMA Model

To test the adequacy of the ARIMA models, the **Ljung-Box test** is used together with a visual check of the residuals. The hypotheses and criteria are as follows.

**> Ljung-Box test**

*Hypothesis:*

• H0: Errors are independent (model is adequate).

• H1: Errors are not independent (model is not adequate).

*Criteria:*

• If the p-value is < 0.05, reject H0.

• The histogram of residuals follows a normal distribution.

• No trend and seasonality in the residuals plot.

• No correlation between residuals in the ACF plot.
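The Ljung-Box statistic itself is simple to compute from the residual autocorrelations. The sketch below is an illustration in plain Python, not the article's code: the residuals are invented, the function names are hypothetical, and the degrees of freedom are taken as the number of lags minus the number of fitted AR and MA coefficients, which is a common convention that may differ from the settings used for the reported p-values.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution (series expansion)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 1000):
        term *= half_x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def ljung_box(residuals, lags, fitted_params=0):
    """Return (Q statistic, p-value) for H0: residuals are independent."""
    df = lags - fitted_params
    if df <= 0:
        raise ValueError("lags must exceed the number of fitted parameters")
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, df)


# Hypothetical residuals (illustrative only).
residuals = [0.8, -1.1, 0.3, 0.5, -0.7, 1.2, -0.4, -0.9, 0.6, 0.1,
             -0.3, 0.9, -1.0, 0.2, 0.7, -0.6, 0.4, -0.2, -0.5, 0.3]
q_stat, p_value = ljung_box(residuals, lags=5, fitted_params=3)
print(f"Q = {q_stat:.3f}, p-value = {p_value:.3f}")
print("adequate" if p_value > 0.05 else "not adequate")
```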

Based on the residual plot for ARIMA (*2, 1, 1*), there is **no trend and seasonality** (the residuals fluctuate around a mean of 0). However, there are **four significant spikes** in the ACF plot of the residuals, so there is still a **slight correlation between residuals**. The histogram of the residuals follows a **normal distribution**. In the Ljung-Box test, the p-value (0.025) is < 0.05. This implies that **the residuals do not follow a white noise process** (the model is **not adequate**). In conclusion, **H0 is rejected**.

Based on the residual plot for the auto ARIMA model, there is **no trend and seasonality** (the residuals fluctuate around a mean of 0). There are only **two significant spikes** in the ACF plot of the residuals, so there is **very little to almost no correlation** between residuals. The histogram of the residuals follows a **normal distribution**. In the Ljung-Box test, the p-value (0.5209) is > 0.05. This implies that the **residuals follow a white noise process** (the model is **adequate**). In conclusion, **H0 is not rejected**.

## Significance of Parameter Coefficients

In ARIMA (*2,1,1*), the **coefficients of ar1 and ar2 are significant** because the p-values of their z-tests are less than 0.05. Meanwhile, the **coefficient of ma1 is not significant** because its p-value is more than 0.05. It is therefore suggested to **revise the model or propose a new ARIMA model**.

In the auto ARIMA model, the **coefficients of ar1, ar2, and ma2 are significant** because the p-values of their z-tests are less than 0.05, while the **coefficient of ma1 is not significant** because its p-value is more than 0.05. Since ma1 is not significant but ma2 is, it is advised to **keep both ma1 and ma2 in the model**.
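The z-test behind these statements divides each estimated coefficient by its standard error and compares the result with a standard normal distribution. A minimal sketch follows; the estimates and standard errors are invented for illustration and are not the fitted values from the article, and `z_test` is a hypothetical helper name.

```python
import math


def z_test(estimate, std_error):
    """Two-sided z-test for H0: coefficient = 0. Returns (z, p-value)."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical (estimate, standard error) pairs, for illustration only.
coefficients = {
    "ar1": (0.62, 0.15),
    "ar2": (-0.41, 0.09),
    "ma1": (-0.12, 0.17),
    "ma2": (0.55, 0.11),
}

for name, (est, se) in coefficients.items():
    z, p = z_test(est, se)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: z = {z:+.2f}, p = {p:.4f} ({verdict})")
```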

## Forecast Errors

The RMSE and mean error (ME) of the **auto ARIMA model are lower** than those of ARIMA (*2,1,1*). However, the MAE of the auto ARIMA model is slightly higher than that of ARIMA (*2,1,1*). From these results, it can be concluded that **auto ARIMA with ARIMA (*2,1,2*) performs better** than ARIMA (*2,1,1*).
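The three error measures compared here can be computed as in the sketch below. The actual and predicted values are made up for illustration, and errors are defined as actual minus predicted, so a positive ME means the model under-forecasts on average.

```python
import math


def forecast_errors(actual, predicted):
    """Return mean error, mean absolute error and root mean squared error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }


# Hypothetical values (illustrative only).
actual = [120, 135, 128, 150, 161]
predicted = [118, 140, 125, 147, 170]
print(forecast_errors(actual, predicted))
```

ME can be close to zero even when individual errors are large, because positive and negative errors cancel; RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently.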

## Conclusions

Based on the summary, adequacy, and coefficient significance of the two ARIMA models, **it can be concluded that the auto ARIMA model, ARIMA (*2,1,2*), is the better of the two** compared to ARIMA (*2,1,1*). In the next section, forecasting will be carried out using ARIMA (*2,1,2*).

# Forecast using Best ARIMA Model

It can be seen that the ARIMA (*2,1,2*) model gives good results on both the training data and the forecast.
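To show what a forecast from an ARIMA (*2,1,2*) model involves, here is a rough sketch of the point-forecast recursion in plain Python. It assumes a model without drift or constant, written as w[t] = φ1·w[t-1] + φ2·w[t-2] + e[t] + θ1·e[t-1] + θ2·e[t-2] on the first differences w. The coefficients, series, and residuals below are invented for illustration and are not the fitted values from the article.

```python
def forecast_arima_212(series, residuals, ar, ma, steps):
    """Point forecasts for ARIMA(2,1,2) without drift.

    series    -- observed levels (at least three values)
    residuals -- in-sample one-step errors (at least the last two)
    ar, ma    -- (phi1, phi2) and (theta1, theta2)
    """
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    w_hist = list(diffs[-2:])      # last two observed differences
    e_hist = list(residuals[-2:])  # last two observed errors
    level = series[-1]
    forecasts = []
    for _ in range(steps):
        w_next = (ar[0] * w_hist[-1] + ar[1] * w_hist[-2]
                  + ma[0] * e_hist[-1] + ma[1] * e_hist[-2])
        level += w_next            # undo the differencing
        forecasts.append(level)
        w_hist.append(w_next)
        e_hist.append(0.0)         # future errors have expected value zero
    return forecasts


# Hypothetical inputs (illustrative only).
history = [140, 152, 149, 160, 171, 168]
last_errors = [2.5, -4.0]
print(forecast_arima_212(history, last_errors,
                         ar=(0.5, -0.3), ma=(-0.2, 0.4), steps=7))
```

After two steps the MA terms drop out, because unknown future errors are replaced by zero, and the forecast is driven by the AR part alone.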

All code is available here.

# References

- Akhilendra. (2019). *Evaluation Metrics for Regression models- MAE Vs MSE Vs RMSE vs RMSLE*. https://akhilendra.com/evaluation-metrics-regression-mae-mse-rmse-rmsle/
- Ariton, L. (2021). *A Thorough Introduction to Holt-Winters Forecasting*. Medium. https://medium.com/analytics-vidhya/a-thorough-introduction-to-holt-winters-forecasting-c21810b8c0e6
- Choubey, V. (2020). *How to evaluate the performance of a machine learning model*. Medium. https://vijay-choubey.medium.com/how-to-evaluate-the-performance-of-a-machine-learning-model-d12ce920c365
- Date, S. (2021). *Holt-Winters Exponential Smoothing*. https://timeseriesreasoning.com/contents/holt-winters-exponential-smoothing/
- Glen, S. (2016). *Correlogram / Auto Correlation Function ACF Plot: Definition in Plain English*. StatisticsHowTo.Com. https://www.statisticshowto.com/correlogram/
- Glen, S. (2021). *Mean Absolute Percentage Error (MAPE)*. StatisticsHowTo.Com. https://www.statisticshowto.com/mean-absolute-percentage-error-mape/
- Jie, T. (2021). *An Overview of Time Series Forecasting with ARIMA Models*. Towards Data Science. https://towardsdatascience.com/time-series-analysis-arima-based-models-541de9c7b4db
- Lee, M. (2021). *What’s The Difference Between Autocorrelation & Partial Autocorrelation For Time Series Analysis?* Medium. https://mxplus3.medium.com/interpreting-autocorrelation-partial-autocorrelation-plots-for-time-series-analysis-23f87b102c64
- Marksei. (2020). *Machine Learning 101: Evaluating regression models, MAE, MSE, RMSE, R-squared explained*. https://www.marksei.com/machine-learning-101-evaluating-regression-models-error-metrics/
- Moody, J. (2019). *What does RMSE really mean?* Towards Data Science. https://towardsdatascience.com/what-does-rmse-really-mean-806b65f2e48e
- Our World in Data. (2021). *Daily new confirmed COVID-19 cases per million people*. https://ourworldindata.org/explorers/coronavirus-data-explorer
- Smigel, L. (2021). *What Is Stationarity in Time Series Analysis? A Visual Guide*. Analyzing Alpha. https://analyzingalpha.com/stationarity
- SolarWinds. (2019). *Holt-Winters Forecasting and Exponential Smoothing Simplified*. Orange Matter. https://orangematter.solarwinds.com/2019/12/15/holt-winters-forecasting-simplified/
- Tyagi, N. (2021). *A Tutorial on Exponential Smoothing and its Types*. Analytics Steps. https://www.analyticssteps.com/blogs/tutorial-exponential-smoothing-and-its-types
- Ullah, M. I. (2020). *Components of Time Series*. Itfeature.Com. https://itfeature.com/time-series-analysis-and-forecasting/components-of-time-series
- Verma, Y. (2021). *Complete Guide To Dickey-Fuller Test In Time-Series Analysis*. Analytics India Magazine. https://analyticsindiamag.com/complete-guide-to-dickey-fuller-test-in-time-series-analysis/
- Zaiontz, C. (2021). *Mann-Kendall Test*. Real Statistics. https://www.real-statistics.com/time-series-analysis/time-series-miscellaneous/mann-kendall-test/