COVID Time Series Forecast

gyiernahfufieland
Published in Analytics Vidhya
5 min read, Oct 20, 2021

Hi friends. It’s been a while, yeah?

I have been really occupied preparing case studies for a couple of companies I was interviewing with. One interesting case study asked me to recommend when a company based in the US and India should reopen its physical stores.

I’ll spare you the details of how that turned out, but for that particular case study I built time series models to forecast COVID cases in the US and India. Truth is, before this case study I had no experience with time series. Well, we did touch on the topic in deep learning using LSTMs, but I wouldn’t count that as ‘actual’ experience with time series.

Anyhoo, I was given a strict timeline to complete several tasks, and honestly I don’t think I did a great job. Nevertheless, I am writing this down to record the experience, so that I can improve on it in the future :).

As usual, I started off the project by importing the necessary libraries:

And some raw data exploration:

There were missing values in the Province/State column. I disregarded them since we are forecasting at the country level instead.
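A minimal sketch of this step, assuming a JHU/Kaggle-style layout with `ObservationDate`, `Province/State`, `Country/Region` and cumulative `Confirmed`, `Deaths`, `Recovered` columns. The column names and toy numbers are assumptions, not the actual dataset. It counts missing values per column, then sums the province rows so that each country ends up with one row per day:

```python
import pandas as pd

# Toy rows in an assumed JHU-style layout (not real data).
raw = pd.DataFrame({
    "ObservationDate": ["2021-05-01"] * 3 + ["2021-05-02"] * 3,
    "Province/State": ["California", "Texas", None, "California", "Texas", None],
    "Country/Region": ["US", "US", "India", "US", "US", "India"],
    "Confirmed": [100, 50, 300, 110, 55, 330],
    "Deaths": [2, 1, 5, 2, 1, 6],
    "Recovered": [0, 0, 200, 0, 0, 220],
})
raw["ObservationDate"] = pd.to_datetime(raw["ObservationDate"])

# Missing values per column: only Province/State has gaps here.
missing = raw.isnull().sum()
print(missing)

# Forecasting at country level, so the province column can be ignored:
# sum all province rows of a country for each date.
country_daily = (
    raw.groupby(["Country/Region", "ObservationDate"], as_index=False)[
        ["Confirmed", "Deaths", "Recovered"]
    ].sum()
)
print(country_daily)
```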

Here are the cases in India over time. As you can see in the figure below, the latest data we have here is from around May 2021. The number of confirmed cases in India rose steeply in April and early May but shows signs of slowing down toward the end of May.

Next, we have COVID cases in the US, and this is where the number of recovered cases looks odd. For some reason, the number of recovered cases from Jan 2021 onwards is given as 0. Well, that’s pretty much impossible, right? I tried searching elsewhere for the data but wasn’t able to find anything. Since we are going to predict the number of confirmed cases instead, I left it as it is. The number of confirmed cases, however, shows a rather sigmoidal shape for the US.
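Cumulative counts can only stay flat or go up, so one quick way to spot this kind of reporting glitch is to flag any day where a running total decreases. A small sketch on made-up numbers (the helper `flag_cumulative_drops` is hypothetical, not from the original project):

```python
import pandas as pd

def flag_cumulative_drops(series: pd.Series) -> pd.Series:
    """Return a boolean mask marking days where a cumulative count decreases.

    A true running total can only stay flat or grow, so any drop (for example,
    Recovered suddenly reported as 0) points to a reporting or data issue.
    """
    return series.diff() < 0

# Toy cumulative "Recovered" values (not real data) that reset to 0.
recovered = pd.Series(
    [5000, 5200, 5400, 0, 0],
    index=pd.date_range("2020-12-30", periods=5, freq="D"),
)
drops = flag_cumulative_drops(recovered)

# Only the day the total fell to 0 is flagged; the flat zeros after it are not.
print(recovered[drops])
```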

In the given dataset, the number of confirmed cases is cumulative. Rather than only predicting the cumulative number, I wanted to predict daily new cases as well. To do so, I de-aggregated the confirmed cases using shift differencing, i.e. subtracting the previous day’s cumulative total from each day’s total.
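Shift differencing is just today’s running total minus yesterday’s. A minimal pandas sketch on toy numbers (the column names are assumptions); `df["Confirmed"].diff()` gives the same result as the explicit `shift(1)` subtraction:

```python
import pandas as pd

# Toy cumulative confirmed counts for one country (not real data).
df = pd.DataFrame({
    "date": pd.date_range("2021-05-01", periods=5, freq="D"),
    "Confirmed": [1000, 1150, 1400, 1600, 1700],
})

# Shift differencing: today's running total minus yesterday's.
df["New Cases"] = df["Confirmed"] - df["Confirmed"].shift(1)

# The first day has no "yesterday", so its difference is NaN; drop that row.
df = df.dropna(subset=["New Cases"]).reset_index(drop=True)
df["New Cases"] = df["New Cases"].astype(int)
print(df)
```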

From there, I created two new columns: New Cases Growth Rate and Infection Fatality Rate.

New Cases Growth Rate = Number of New Cases Today / Number of New Cases Yesterday

Infection Fatality Rate = Number of Deaths / Number of Confirmed Cases
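Both formulas translate directly into pandas. A sketch on toy numbers, assuming columns named `New Cases`, `Confirmed` and `Deaths` (the last two cumulative), and assuming that a growth rate after a zero-case day should be treated as undefined rather than infinite:

```python
import numpy as np
import pandas as pd

# Toy daily figures for one country (not real data).
df = pd.DataFrame({
    "New Cases": [200, 250, 0, 100],
    "Confirmed": [10_000, 10_250, 10_250, 10_350],  # cumulative
    "Deaths": [100, 103, 103, 104],                 # cumulative
})

# New Cases Growth Rate = new cases today / new cases yesterday.
growth = df["New Cases"] / df["New Cases"].shift(1)
# Dividing by a zero-case day gives inf; treat it as undefined (NaN) instead.
df["New Cases Growth Rate"] = growth.replace([np.inf, -np.inf], np.nan)

# Infection Fatality Rate (as defined above) = deaths / confirmed cases.
df["Infection Fatality Rate"] = df["Deaths"] / df["Confirmed"]
print(df.round(4))
```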

Both columns were used in the data visualization part, for creating a dashboard.

Since I am using the FBProphet model, the columns have to be renamed to ds (representing the date) and y (representing the observations). From there, I split the data into training and testing sets. In my project, I created models to predict the number of confirmed cases and daily new cases over the next 14 days.
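A minimal sketch of the renaming and the split, assuming the test set is simply the last 14 days of the series (the post doesn’t say exactly how the split was made) and using toy data with assumed column names:

```python
import pandas as pd

HORIZON = 14  # forecast horizon in days, as in the post

# Toy country-level series (not real data); column names are assumptions.
df = pd.DataFrame({
    "ObservationDate": pd.date_range("2021-04-01", periods=60, freq="D"),
    "Confirmed": range(1_000, 61_000, 1_000),
})

# Prophet expects a date column named ds and a value column named y.
prophet_df = df.rename(columns={"ObservationDate": "ds", "Confirmed": "y"})[["ds", "y"]]

# Time-ordered split: never shuffle a time series. The last HORIZON days
# are held out as the test set; everything before them is training data.
prophet_df = prophet_df.sort_values("ds").reset_index(drop=True)
train = prophet_df.iloc[:-HORIZON]
test = prophet_df.iloc[-HORIZON:]
print(len(train), len(test))  # 46 14
```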

I created separate models to predict the number of cumulative cases and the number of daily new cases, for both the US and India. I will only show one example here, but please feel free to visit my GitHub page for the full source code.

Here’s a model for predicting the number of confirmed cases in the US over the next 14 days.

You should be able to find what each parameter represents online. Truth is, I wasn’t sure which values I should set for each parameter. The snippet given above is the result of trying many combinations by trial and error.
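Instead of pure trial and error, the search can be made systematic with a small grid search. This sketch only builds and scores the combinations: `changepoint_prior_scale` and `seasonality_prior_scale` are real Prophet constructor arguments, with candidate values roughly following the tuning example in the Prophet documentation, while `grid_search` and the scoring callback are hypothetical helpers. In practice the callback would fit `Prophet(**params)` on the training data and return a validation error such as MAE:

```python
import itertools
from typing import Callable, Dict, List, Tuple

# Candidate values for two Prophet arguments (not the values used in the post).
PARAM_GRID = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}

def grid_search(
    param_grid: Dict[str, list],
    score_fn: Callable[[dict], float],
) -> Tuple[dict, List[Tuple[dict, float]]]:
    """Score every parameter combination and return the best (lowest score)."""
    keys = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        results.append((params, score_fn(params)))
    best_params, _ = min(results, key=lambda item: item[1])
    return best_params, results

def dummy_score(params: dict) -> float:
    # Hypothetical stand-in scorer for illustration only: it pretends the
    # best setting is changepoint_prior_scale=0.1, seasonality_prior_scale=1.0.
    return (abs(params["changepoint_prior_scale"] - 0.1)
            + abs(params["seasonality_prior_scale"] - 1.0))

best, all_results = grid_search(PARAM_GRID, dummy_score)
print(best)              # {'changepoint_prior_scale': 0.1, 'seasonality_prior_scale': 1.0}
print(len(all_results))  # 16
```

With a real scorer, each combination costs one model fit, so a 4 × 4 grid means 16 fits per country and target.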

Fitting the model to the training dataset.
Creating a 14-day-ahead timeframe for prediction.
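In Prophet this is usually done with `make_future_dataframe(periods=14)`, which by default also keeps the historical dates. The sketch below builds only the 14 future dates with plain pandas, to show what that timeframe contains; the helper name and the last training date are hypothetical:

```python
import pandas as pd

HORIZON = 14

def future_dates(last_date: str, periods: int = HORIZON) -> pd.DataFrame:
    """Build a ds column with the `periods` days that follow last_date."""
    start = pd.Timestamp(last_date) + pd.Timedelta(days=1)
    return pd.DataFrame({"ds": pd.date_range(start, periods=periods, freq="D")})

future = future_dates("2021-05-29")
print(future["ds"].iloc[0].date(), future["ds"].iloc[-1].date())  # 2021-05-30 2021-06-12
```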

So here we have the outcome of the model. Based on the result below, we can tell that over the next 14 days, cases in the US continue to increase, but at a slower pace.

Based on several evaluation metrics, this model seems to be doing fine.
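The post doesn’t list which metrics were used, so as an assumption, here is a sketch of three common ones for forecasts (MAE, RMSE and MAPE), computed with NumPy on toy values:

```python
import numpy as np

def evaluate(actual, predicted) -> dict:
    """MAE, RMSE and MAPE (in %) between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        # MAPE is undefined when an actual value is 0 (e.g. a zero-case day).
        "MAPE": float(np.mean(np.abs(errors / actual)) * 100),
    }

metrics = evaluate([100, 200, 400], [110, 190, 380])
print(metrics)
```

RMSE punishes large misses more than MAE, while MAPE is scale-free, which helps when comparing the US and India models.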

By default, the FBProphet model provides an 80% uncertainty interval for the predicted values (set by its interval_width parameter). Here’s how it looks in a plot:

As you can see, the actual values lie within the dotted lines, which mark the upper and lower bounds of the uncertainty interval.
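One way to put a number on this is empirical coverage: the share of actual test values that fall between `yhat_lower` and `yhat_upper` (the bound columns in Prophet’s forecast output). If an 80% interval is well calibrated, roughly 80% of actual values would be expected inside it. A sketch with toy numbers, assuming the actuals have already been merged with the forecast on `ds`:

```python
import pandas as pd

def interval_coverage(actual: pd.Series, lower: pd.Series, upper: pd.Series) -> float:
    """Share of actual values that fall inside [lower, upper]."""
    inside = (actual >= lower) & (actual <= upper)
    return float(inside.mean())

# Toy actuals merged with forecast bounds (not real model output).
results = pd.DataFrame({
    "y": [100, 120, 150, 130, 170],
    "yhat_lower": [90, 110, 120, 135, 150],
    "yhat_upper": [115, 140, 160, 160, 180],
})
coverage = interval_coverage(results["y"], results["yhat_lower"], results["yhat_upper"])
print(f"{coverage:.0%} of actual values inside the interval")  # 80% of actual values inside the interval
```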

And that’s it! I have covered more on GitHub than what I have written here (more models plus cross-validation), so just head over there, and please do let me know how I can improve this!


Sharing everything I love, from my point of view. Hey, how are you today?