Predicting Number of COVID-19 Cases in India

Abinash Chakraborty
5 min read · Apr 15, 2020

The first confirmed case of COVID-19 in India was reported on 30-Jan-2020 in the state of Kerala. The number of confirmed cases in India remained low until 15-Mar-2020, presumably because patients with COVID-19-like symptoms were not being tested.

Predicting the Total Number of Confirmed Cases in India

As can be seen in the graph above, since 15-Mar-2020 there has been a seemingly exponential rise in the number of COVID-19 cases in India.

To predict the Total Number of Confirmed Cases, I’ve used a linear regression model to fit the data.

It is assumed that the Total Number of Confirmed COVID-19 Cases in India (y) is a function of the Number of Days (x) since SARS-CoV-2 first entered India, modelled as:

where a and lambda are constants to be determined.
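
For reference, a growth model with two constants a and lambda is commonly written in the exponential form below. This is an assumed form, chosen because it is consistent with the exponential growth and the straight-line fit on a logarithmic scale described in this article; it is not a confirmed reproduction of the original equation.

```latex
y = a\,e^{\lambda x}
\quad\Longrightarrow\quad
\ln y = \ln a + \lambda x
```

Under this assumed form, ln y is a straight line in x with slope lambda and intercept ln a, which is what makes a linear regression applicable. If both y and x were log-transformed instead, the straight line would correspond to a power law, y = a x^lambda.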

The natural way to fit this is to use linear regression on the logarithms of y and x. However, only the data after 30-Mar-2020 has been used, for two reasons:

  1. India started testing more aggressively after 30-Mar-2020.
  2. The Tablighi Jamaat cluster (discovered around 30-Mar-2020) led to a huge spike in the number of confirmed cases.
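
The fitting step can be sketched as follows. This is a minimal illustration, not the original notebook: it assumes the exponential form y = a * exp(lambda * x), regresses ln(y) on x, and uses synthetic data generated from made-up constants (`true_a`, `true_lambda`) rather than the Ministry's figures. `fit_exponential` and `predict_cases` are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_exponential(days, cases):
    """Fit ln(cases) = ln(a) + lambda * days and return (a, lambda)."""
    x = np.asarray(days, dtype=float).reshape(-1, 1)  # sklearn expects a 2-D feature array
    log_y = np.log(np.asarray(cases, dtype=float))
    model = LinearRegression().fit(x, log_y)
    return float(np.exp(model.intercept_)), float(model.coef_[0])


def predict_cases(a, lam, day):
    """Evaluate the fitted curve a * exp(lam * day)."""
    return a * np.exp(lam * day)


# Synthetic data only: made-up constants, NOT real case counts.
true_a, true_lambda = 50.0, 0.1
days = np.arange(60, 75)                     # hypothetical day indices used for training
cases = true_a * np.exp(true_lambda * days)  # noise-free exponential series

a_hat, lambda_hat = fit_exponential(days, cases)
next_day_estimate = predict_cases(a_hat, lambda_hat, days[-1] + 1)
```

Restricting `days` and `cases` to a recent window before calling `fit_exponential` mirrors the choice made here of training only on the data after 30-Mar-2020.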

Based on this fit, the following prediction was made for the Total Number of Confirmed COVID-19 Cases in India on 14-Apr-2020:

As of 15-Apr-2020 5 PM, the Total Confirmed Cases in India as per the Ministry of Health stood at 11933. That is an error of less than 0.3% against my prediction.
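
The error quoted here is a relative error. The sketch below shows that arithmetic, assuming the error is measured relative to the actual value; `percent_error` is a hypothetical helper. Since the predicted value is not restated in the text, the example only derives the band of predictions that would be consistent with an error under 0.3% of the reported 11933.

```python
def percent_error(predicted, actual):
    """Absolute relative error of a prediction, in percent of the actual value."""
    return abs(predicted - actual) / actual * 100.0


actual_india = 11933   # reported total quoted in the article
tolerance_pct = 0.3    # error bound quoted in the article

# Any prediction strictly inside this band has an error below the quoted bound.
half_width = actual_india * tolerance_pct / 100.0
lower, upper = actual_india - half_width, actual_india + half_width
print(f"predictions between {lower:.1f} and {upper:.1f} are within {tolerance_pct}%")
```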

Predicting Total Number of Confirmed Cases in the state of Maharashtra

Among the States/Union Territories of India, the State of Maharashtra is the worst affected.

I’ve fit the number of COVID-19 cases in Maharashtra using the same technique as in the section above. The last 17 days of data have been used to train the model.

Based on the above model, the predictions for the total number of COVID-19 cases in Maharashtra are the following —

As of 15-Apr-2020 5 PM, the Total Confirmed Cases in Maharashtra as per the Ministry of Health stood at 2687. That is an error of less than 0.1% against my prediction.

Conclusions

Before publishing this, I made similar predictions of the Total Confirmed Cases in India for the period 11-Apr-2020 to 14-Apr-2020.

A comparison table of predicted values and actual values (including the 15-Apr data available at the time of publishing)

  1. It can be said that the growth in India is following an exponential curve.
  2. Things are not getting worse, i.e. the slope of the fitted line is NOT increasing. Nor are they getting better.
  3. India is in Lockdown (legally enforced stay-at-home orders) from 23-Mar-2020 till 03-May-2020. The curve is expected to flatten, i.e. the actual confirmed cases should be lower than my predicted values. That would be an indication that the Lockdown is having a flattening effect.
  4. This exponential growth, despite India being in Lockdown since 23-Mar-2020, has been because of the progressive increase in India’s testing capacity. As of 15-Apr-2020, the testing capacity has somewhat stabilised.
  5. It is surprising that, despite the many variables involved, a linear model has been able to predict the number of confirmed cases in India with appreciable accuracy for the last 5 days.

The data used for this analysis is from the Ministry of Health and Family Welfare, Government of India.

The Ministry updates the data twice in a 24-hour period: once at around 8 AM and again at around 5 PM. For a calendar day, the last figure published by the Ministry on that day is taken as the Total Confirmed Cases for the day. A compiled CSV file is available on this Kaggle page.
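
The end-of-day rule described above can be sketched as below. The record layout (a timestamp paired with a cumulative count) and the numbers are illustrative assumptions, not the layout or contents of the Kaggle file; `last_update_per_day` is a hypothetical helper.

```python
from datetime import datetime


def last_update_per_day(records):
    """Keep, for each calendar day, the count from the latest update on that day.

    `records` is an iterable of (timestamp, total_confirmed) pairs in any order.
    """
    latest = {}
    for stamp, total in records:
        day = stamp.date()
        # Replace the stored entry only if this update is later on the same day.
        if day not in latest or stamp > latest[day][0]:
            latest[day] = (stamp, total)
    return {day: total for day, (_, total) in sorted(latest.items())}


# Made-up counts for illustration: two updates per day, around 8 AM and 5 PM.
updates = [
    (datetime(2020, 4, 1, 8, 0), 100),
    (datetime(2020, 4, 1, 17, 0), 120),
    (datetime(2020, 4, 2, 17, 0), 160),
    (datetime(2020, 4, 2, 8, 0), 140),
]
daily_totals = last_update_per_day(updates)
```

The resulting mapping holds one value per calendar day, which is the series a regression on daily totals would be trained on.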

The linear modelling has been done using LinearRegression from sklearn.linear_model in Python.

[Updated on 18-Apr-2020] Accuracy of Predictions

Comparison between Estimates and Actual Cases in India
Comparison between Estimates and Actual Cases in Maharashtra

The number of cases in India and Maharashtra (the worst affected State of India) has been rising but at a slower rate than the trend on 14-Apr.

As expected in Conclusion #3, the actual number of cases was lower than the estimated number of cases. The curve has been flattening, i.e. the slope of the fitted line on a logarithmic scale has been decreasing.
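
One way to check for this kind of flattening is to fit the log-scale slope over two consecutive windows and compare them. The sketch below does this with a hand-written least-squares slope, which for a single feature matches the slope a linear regression would return. The window lengths, the helper names and the synthetic series are assumptions for illustration, not the data or the exact method behind the figures in this article.

```python
import math


def log_slope(days, cases):
    """Ordinary least-squares slope of ln(cases) against days."""
    logs = [math.log(c) for c in cases]
    mean_x = sum(days) / len(days)
    mean_y = sum(logs) / len(logs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    den = sum((x - mean_x) ** 2 for x in days)
    return num / den


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100.0


# Synthetic series: growth rate 0.10 per day for 10 days, then 0.08 per day for 5 days.
days = list(range(15))
cases = [100 * math.exp(0.10 * d) for d in range(10)]
cases += [cases[-1] * math.exp(0.08 * k) for k in range(1, 6)]

earlier = log_slope(days[:10], cases[:10])   # slope over the first window
later = log_slope(days[9:], cases[9:])       # slope over the second window
slope_change_pct = percent_change(earlier, later)  # negative when the curve flattens
```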

Summary

In the last 4 days, for India, COVID-19 spread has decreased by 11.40% and for Maharashtra by 15.70%.

India and its States are expected to loosen the Lockdown restrictions from 20-Apr. It will be interesting to see how the rate of spread of cases is affected post 20-Apr-2020.
