COVID-19 in India: Trends and Determinants

Vidushi Gupta
5 min read · Jul 29, 2020

Since the COVID-19 outbreak in India in March, this novel virus has been living amongst us. The COVID-19 pandemic in India is part of the worldwide pandemic caused by the novel coronavirus SARS-CoV-2. As we tackle this unprecedented crisis, let us also treat it as an opportunity: analyzing every aspect of the pandemic may yield lessons that strengthen our preparedness for future pandemics.

Before and after of Tourist Places due to COVID-19

The analysis presented here aims to show the spatial distribution and spread of COVID-19 over time in India. In addition, it attempts to elicit linkages between states with high COVID-19 prevalence and their economic contribution to the nation.

Methods and Data Analysis

The analysis was conducted in Python (version 2.7.16) using the Pandas, NumPy, Matplotlib, Seaborn and GeoPandas libraries.

Spatial Trend Analysis

To analyze the trend in the spread of the virus, a state-wise daily cases dataset was imported from [source link missing]. From this dataset, the confirmed COVID-19 cases were taken for each state, and time-series resampling with the ‘M’ (monthly) frequency was applied to convert the dataframe from daily to monthly. The result was then merged with a shapefile of India, and the Matplotlib library was used to plot choropleth maps for the months of March, April and May.
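The resampling and merge steps described above can be sketched roughly as follows. This is a minimal illustration and not the original notebook: it is written for current Python 3 and pandas, the numbers are made up, the column names (`state`, `confirmed`, `month`) are assumptions, and a plain DataFrame stands in for the shapefile's attribute table. It also assumes the dataset holds daily new cases, so months are summed. The `"MS"` alias labels each month by its first day; the article used ‘M’ (month-end labels), which recent pandas releases spell `"ME"`.

```python
import pandas as pd

# Hypothetical daily counts of newly confirmed cases, one column per state.
daily = pd.DataFrame(
    {
        "Maharashtra": [2, 3, 5, 10, 20, 30],
        "Delhi": [1, 1, 2, 4, 8, 16],
    },
    index=pd.to_datetime(
        ["2020-03-30", "2020-03-31", "2020-04-01",
         "2020-04-30", "2020-05-01", "2020-05-31"]
    ),
)

# Sum the daily counts into calendar months ("MS" = month-start labels).
monthly = daily.resample("MS").sum()

# Reshape to long form: one row per (month, state) pair.
monthly_long = (
    monthly.rename_axis("month")
    .reset_index()
    .melt(id_vars="month", var_name="state", value_name="confirmed")
)

# Stand-in for the shapefile's attribute table; a real GeoDataFrame
# would also carry a geometry column for each state.
shapes = pd.DataFrame({"state": ["Maharashtra", "Delhi", "Goa"]})

# Attach one month of counts to the shapes. A left merge keeps every
# polygon, and the indicator column exposes names that found no match.
april = monthly_long[monthly_long["month"] == pd.Timestamp("2020-04-01")]
merged = shapes.merge(april, on="state", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "state"].tolist()
```

With GeoPandas, `shapes` would be the GeoDataFrame read from the shapefile, and calling `merged.plot(column="confirmed", legend=True)` on the merged frame would draw the choropleth. Checking `unmatched` before plotting helps catch spelling differences between the two sources' state names.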

Table 1: Merged dataframe with shapefile and time-series resampled data
Choropleth Maps of confirmed COVID cases (March, April, May)

On the colour scale, shades closer to purple indicate a higher prevalence of COVID-19 (i.e. a higher number of confirmed cases) in a state. As the trends show, Maharashtra, Tamil Nadu and Delhi were the worst affected states cumulatively over the three months. The hotspots on the maps call for a comparison of this distribution with each state's economic contribution to gain deeper insight.

Comparison: Prevalence to Economic contribution

For the comparative analysis of COVID-19 prevalence and economic contribution, I used the RBI dataset that gives the net value added (in Rs. crore) by each state to the national economy for the year 2017–2018.

The net value added is defined as the rupee value of the goods and services produced, minus the cost of all inputs or raw materials directly attributable to that production and the depreciation involved.

Table 2: RBI dataset for Net state value-added in 2017–18

This dataset was combined with the confirmed COVID-19 cases dataset, and a combined bar and line plot of the top 3 entries was drawn using the Seaborn library.

Table 3: Dataframe of the top 3 states' economic contribution and confirmed COVID cases
Plot 1: Comparison Plot for Economic contribution and COVID prevalence

The red colour line shows the confirmed COVID-19 cases in each state with its corresponding scale in red on the right-hand side.

The bar plot is a measure of the economic contribution for each state with its corresponding scale on the left-hand side.
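A combined plot of this kind can be sketched as below. This is only an illustration under stated assumptions: it uses Matplotlib directly rather than Seaborn, the state names and all values are placeholders rather than RBI or case figures, the column names (`nsva_crore`, `confirmed`) are hypothetical, and it assumes the top 3 entries are ranked by net value added.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data: NOT real figures for any state.
nsva = pd.DataFrame(
    {"state": ["State A", "State B", "State C", "State D"],
     "nsva_crore": [400, 300, 200, 100]}
)
cases = pd.DataFrame(
    {"state": ["State A", "State B", "State C", "State D"],
     "confirmed": [900, 500, 700, 50]}
)

# Combine the two sources on the state name, then keep the three
# states with the largest net value added (an assumed ranking).
combined = nsva.merge(cases, on="state", how="inner")
top3 = combined.nlargest(3, "nsva_crore").reset_index(drop=True)

# Bars: economic contribution, read against the left-hand axis.
fig, ax_bar = plt.subplots(figsize=(6, 4))
ax_bar.bar(top3["state"], top3["nsva_crore"], color="steelblue")
ax_bar.set_ylabel("Net state value added (Rs. crore)")

# Line: confirmed cases on a second y-axis (right-hand side), in red.
ax_line = ax_bar.twinx()
ax_line.plot(top3["state"], top3["confirmed"], color="red", marker="o")
ax_line.set_ylabel("Confirmed cases", color="red")
ax_line.tick_params(axis="y", colors="red")

fig.tight_layout()
```

The second axis from `twinx()` is what lets two quantities with very different magnitudes share one chart. The two scales are independent, so the relative heights of the bars and the line should not be compared directly.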

Further Analysis:

Another spatial analysis was conducted for the state of Maharashtra, for which a district-wise dataset of confirmed, active, recovered and deceased cases was acquired from [source link missing].

Table 4: Maharashtra district wise dataframe

Using the Matplotlib library, a choropleth map was plotted on a shapefile of Maharashtra for each of these categories, covering all cases reported until mid-June. This was done to gain insight into the district-wise spread of the virus.

Choropleth Maps for Maharashtra (district-wise)

These maps make it evident that the virus has been most prevalent in Mumbai and its adjoining districts, with the other districts less affected than this hotspot.


This analysis has a few limitations, which leave room for further exploration.

  1. The data for the union territory of Ladakh was merged with the data of Jammu and Kashmir due to the absence of the geometry of Ladakh in the shapefile.
  2. The RBI dataset for the Net State Value Added shows the values for the year 2017–2018. The latest dataset was not used because it has a greater number of missing values.
  3. The data for the district of Palghar in Maharashtra was merged into the Thane district due to the absence of geometry of Palghar in the shapefile.
  4. The greyish extensions in the maps are due to some coordinates missing in the shapefile used in the analysis.
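Folding a region that has no geometry into its neighbour, as in points 1 and 3, could be done along these lines before merging with the shapefile. This is a hedged sketch: the counts are invented, and the mapping name `MERGE_INTO` and the column names are assumptions rather than the original code.

```python
import pandas as pd

# Hypothetical confirmed counts; the values are illustrative only.
cases = pd.DataFrame(
    {
        "state": ["Jammu and Kashmir", "Ladakh", "Kerala"],
        "confirmed": [100, 20, 50],
    }
)

# Regions with no polygon in the shapefile, mapped to the neighbouring
# region that absorbs their counts.
MERGE_INTO = {"Ladakh": "Jammu and Kashmir"}

# Relabel the rows, then add up the counts that now share a name.
cases["state"] = cases["state"].replace(MERGE_INTO)
cases = cases.groupby("state", as_index=False)["confirmed"].sum()
```

The same pattern would apply at district level with a mapping such as `{"Palghar": "Thane"}`. Summing is appropriate for counts like confirmed or recovered cases, but it would not be for rates or averages.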

Conclusions and Findings

Through this exploratory data analysis, the following inferences have been derived. The prevalence of COVID-19 has increased in clusters, with the states of Maharashtra, Tamil Nadu and Delhi being highly affected.

The greater the economic activity in a state, the greater the number of confirmed COVID-19 cases, at least among the top three states compared here. Higher economic activity also tends to go with a larger population and a more developed area, which would mean more testing and therefore a higher confirmed case count.

The qualitative reasoning behind this inference is that greater economic activity means a greater inflow and outflow of migrants, both domestic and international. Since travel and contact with infected carriers (asymptomatic or symptomatic) are major causes of the spread of the virus, this is consistent with the findings above.