Data Analysis of State Wise GDP in India and Recommendations Using Python

Vidhu
Jul 24, 2020 · 9 min read

The overall goal of this project is to help state ministers focus on areas that will foster economic development in their respective states. Since the most common measure of economic development is the GDP, we will analyse the GDP of the various states of India and suggest ways to improve it.

Understanding GDP

Gross domestic product (GDP) at current prices is the GDP at the market value of goods and services produced in a country during a year. In other words, GDP measures the ‘monetary value of final goods and services produced by a country/state in a given period of time’.

GDP can be broadly divided into goods and services produced by three sectors: the primary sector (agriculture), the secondary sector (industry), and the tertiary sector (services).

GDP at current prices is also known as nominal GDP. Real GDP, by contrast, takes into account the price changes that may have occurred due to inflation; in other words, real GDP is nominal GDP adjusted for inflation. We will use the nominal GDP for this exercise. Also, we will consider the financial year 2015–16 as the base year, as most of the data required for this exercise is available for that period.

Per Capita GDP and Income

Total GDP divided by the population gives the per capita GDP, which roughly measures the average value of goods and services produced per person. The per capita income is closely related to the per capita GDP (though they are not the same). In general, the per capita income increases when the per capita GDP increases, and vice-versa. For instance, in the financial year 2015–16, the per capita income of India was ₹93,293, whereas the per capita GDP of India was $1717, which roughly amounts to ₹1,11,605.

We will divide the analysis into two parts:

Part-I: GDP Analysis of the Indian States

We will use the data available for the years 2012–2017.

Part-II: GDP and Education Dropout Rates

We will use the data for the year 2014–2015.

Dataset

Data I download:

  1. Go to the URL: https://data.gov.in/ and search for the keyword ‘State-wise Gross Domestic Product (GDP) at current price on yearly basis’. Select the ‘State-wise Gross Domestic Product (GDP) at current price on yearly basis’ from the search result and download the data.
  2. Go to the URL: https://data.gov.in/ and search for the keyword ‘GSVA by Economic Activity at Current Prices’. Click on “More Similar Results”. Download the data for all the states, but not for the union territories.

Data II download:

Download the dropout rates data from the link below:

https://data.gov.in/resources/state-ut-wise-average-annual-drop-out-rate-2012-13-2014-15-ministry-human-resource

Let's Begin…

Library Imports

Import all the necessary Python libraries for data analysis

Load Files

Check for Data Type and Column names

Check for missing values
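The loading and checking steps above can be sketched as follows. This is only a minimal illustration: the column names, the row labels and every number in the toy CSV are made up for the example, and the real file downloaded from data.gov.in would be read with `pd.read_csv` on its path instead of an in-memory string.

```python
import io

import pandas as pd

# Toy stand-in for the downloaded GSDP file. All names and values are invented
# for illustration; the real file has one column per state and more rows.
sample_csv = io.StringIO(
    "Items Description,Duration,State_A,State_B,State_C,All_India GDP\n"
    "GSDP - CURRENT PRICES,2014-15,100,200,50,350\n"
    "GSDP - CURRENT PRICES,2015-16,110,220,,380\n"
    "(% Growth over previous year),2015-16,10.0,10.0,,8.57\n"
)

# For the real data this would be pd.read_csv("<path to the downloaded file>").
gdp_df = pd.read_csv(sample_csv)

# Data types and column names
print(gdp_df.dtypes)

# Number of missing values in each column
missing_per_column = gdp_df.isnull().sum()
print(missing_per_column)
```

Numeric columns that contain blanks are read as floats, with the blanks turned into NaN, which is what `isnull()` counts.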

Let's have a look at the data set

Let's check for missing values by duration

Let's check for missing values by state
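A rough sketch of the two checks, assuming a layout with one row per duration and one column per state. The frame below is a toy with invented values, not the real data.

```python
import numpy as np
import pandas as pd

# Toy frame: one row per duration, one column per state (invented values).
gdp_df = pd.DataFrame({
    "Duration": ["2014-15", "2015-16", "2016-17"],
    "State_A": [100.0, 110.0, np.nan],
    "State_B": [200.0, 220.0, 240.0],
    "State_C": [50.0, np.nan, np.nan],
})

state_cols = gdp_df.columns.drop("Duration")

# Missing values per duration: count NaNs across the state columns of each row.
missing_by_duration = gdp_df.set_index("Duration")[state_cols].isnull().sum(axis=1)

# Missing values per state: count NaNs down each state column.
missing_by_state = gdp_df[state_cols].isnull().sum()

print(missing_by_duration)
print(missing_by_state)
```

Durations where most states are missing are candidates for dropping as rows, while states that are missing for most durations are candidates for dropping as columns.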

Let's now create two different data frames:

  1. DataFrame for GSDP for different years for all states
  2. DataFrame for Growth Percentage over different years for all states

Also create a data frame for the All India GDP

This data frame will have two columns filtered from growth_df: ‘Duration’ and ‘All India GDP’
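One way to do this split is sketched below. It assumes that a description column tells the GSDP rows apart from the growth rows; that column name, its labels and all values are assumptions made for the example.

```python
import pandas as pd

# Toy frame mixing GSDP rows and growth rows (invented values).
gdp_df = pd.DataFrame({
    "Items Description": ["GSDP - CURRENT PRICES"] * 3
    + ["(% Growth over previous year)"] * 2,
    "Duration": ["2013-14", "2014-15", "2015-16", "2014-15", "2015-16"],
    "State_A": [100.0, 110.0, 121.0, 10.0, 10.0],
    "State_B": [200.0, 210.0, 231.0, 5.0, 10.0],
    "All_India GDP": [300.0, 320.0, 352.0, 6.67, 10.0],
})

# Rows whose description mentions "Growth" hold the % growth figures.
is_growth = gdp_df["Items Description"].str.contains("Growth")

gsdp_df = gdp_df[~is_growth].drop(columns="Items Description").reset_index(drop=True)
growth_df = gdp_df[is_growth].drop(columns="Items Description").reset_index(drop=True)

# National growth figures only.
nation_df = growth_df[["Duration", "All_India GDP"]]
print(nation_df)
```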

Let's analyse % growth over previous years using a best-fit line

For this purpose we will create a function that takes the data frames growth_df and nation_df, the ‘Duration’ column as the target, and a list of columns. This list holds the names of all columns of growth_df except ‘Duration’ and ‘All_India GDP’.

The function will return a data frame with each state and its slope value, computed using the polyfit function of NumPy. We will use this slope to judge how the growth % of that state has trended over the years. The greater the slope, the more steadily the growth rate has risen over the years. A negative value means the growth rate has been declining.
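A simplified sketch of such a function is shown here. It is not the original implementation: the name `growth_slopes` is hypothetical, it takes only growth_df and the column list, and it uses the row position (0, 1, 2, ...) in place of the ‘Duration’ labels, assuming the rows are already in chronological order.

```python
import numpy as np
import pandas as pd


def growth_slopes(growth_df, columns):
    """Return one row per state with the slope of a straight-line fit to its growth %."""
    x = np.arange(len(growth_df))  # 0, 1, 2, ... stands in for the ordered durations
    rows = []
    for col in columns:
        y = growth_df[col].to_numpy(dtype=float)
        mask = ~np.isnan(y)  # polyfit cannot handle NaN, so skip missing years
        if mask.sum() >= 2:
            slope = np.polyfit(x[mask], y[mask], 1)[0]  # first coefficient is the slope
        else:
            slope = np.nan  # not enough points to fit a line
        rows.append({"State": col, "Slope": slope})
    result = pd.DataFrame(rows)
    return result.sort_values("Slope", ascending=False).reset_index(drop=True)


# Toy growth figures (invented): State_A speeds up, State_B slows down.
growth_df = pd.DataFrame({
    "Duration": ["2013-14", "2014-15", "2015-16"],
    "State_A": [10.0, 12.0, 14.0],
    "State_B": [15.0, 12.0, 9.0],
})
slopes = growth_slopes(growth_df, ["State_A", "State_B"])
print(slopes)
```

The same function can be called on nation_df with the ‘All_India GDP’ column to get a national slope for comparison.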

Let's now use this function to find the growth % trend of all states over previous years

Let's now analyse the GDP per capita for all the states.

  • Identify the top 5 and the bottom 5 states based on the GDP per capita.
  • Find the ratio of the highest per capita GDP to the lowest per capita GDP.

For this we will load the GSVA data of all states. This is a set of around 31 CSV files stored in a folder, which we will read one by one. These files hold the contribution of the various sectors and sub-sectors towards the GDP of each state. For our analysis we will focus on the sector-level contribution (which is the sum of the contributions of its sub-sectors), so we will drop the redundant sub-sector rows.
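Reading the folder can be sketched roughly as below. To keep the example runnable anywhere, it first writes two toy files into a temporary folder. The file names, the column names (‘S.No.’, ‘Item’, ‘2014-15’) and the rule that a nested breakdown row is numbered like ‘1.1’ are all assumptions for illustration, so check them against the actual files.

```python
import glob
import os
import tempfile

import pandas as pd

# Invented rows; an "S.No." such as "1.1" is assumed to mark a nested breakdown.
toy_rows = {
    "State_A": [("1", "Agriculture, forestry and fishing", 40.0),
                ("1.1", "Crops", 30.0),
                ("2", "Manufacturing", 60.0)],
    "State_B": [("1", "Agriculture, forestry and fishing", 10.0),
                ("1.1", "Crops", 5.0),
                ("2", "Manufacturing", 90.0)],
}

with tempfile.TemporaryDirectory() as folder:
    # Write one toy CSV per state so there is something to read back.
    for state, rows in toy_rows.items():
        pd.DataFrame(rows, columns=["S.No.", "Item", "2014-15"]).to_csv(
            os.path.join(folder, f"{state}.csv"), index=False
        )

    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        state = os.path.splitext(os.path.basename(path))[0]
        df = pd.read_csv(path, dtype={"S.No.": str})  # keep "1.1" as text
        df = df[~df["S.No."].str.contains(".", regex=False)]  # drop nested rows
        frames.append(df.set_index("Item")["2014-15"].rename(state))

# One row per item, one column per state.
gsva_df = pd.concat(frames, axis=1)
print(gsva_df)
```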

The data file is like this

The data frame looks like

There are no missing values in this data frame. We will also drop the Union Territory data from this data frame.

Let's plot GDP per capita for all states

Find the ratio of the highest per capita GDP to the lowest per capita GDP
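Ranking the states and computing the ratio takes only a few lines. The sketch below uses invented per capita figures and picks the top and bottom 2 because the toy series has only six entries; with the real data the same calls would use 5.

```python
import pandas as pd

# Invented per capita GDP figures, for illustration only.
per_capita = pd.Series(
    {"State_A": 250000, "State_B": 180000, "State_C": 120000,
     "State_D": 90000, "State_E": 60000, "State_F": 50000},
    name="Per Capita GSDP",
)

top_states = per_capita.nlargest(2)      # use 5 with the full list of states
bottom_states = per_capita.nsmallest(2)  # use 5 with the full list of states

# Ratio of the highest per capita GDP to the lowest.
ratio = per_capita.max() / per_capita.min()

print(top_states)
print(bottom_states)
print(ratio)
```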

Let's plot the percentage contribution of the primary, secondary and tertiary sectors to the GDP of each state
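Before plotting, the sector totals need to be turned into percentages. Here is a minimal sketch with invented numbers. It assumes the total is simply the sum of the three sectors; the real files may define the state total differently, in which case divide by that total instead.

```python
import pandas as pd

# Invented sector totals per state.
sectors = pd.DataFrame(
    {"Primary": [30.0, 10.0], "Secondary": [20.0, 30.0], "Tertiary": [50.0, 60.0]},
    index=["State_A", "State_B"],
)

# Divide each row by its own total to get percentage shares.
sector_pct = sectors.div(sectors.sum(axis=1), axis=0) * 100
print(sector_pct)
```

A stacked bar chart of `sector_pct` (for example with `sector_pct.plot(kind="bar", stacked=True)`, which needs matplotlib) then shows the three shares side by side for every state.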

Let's categorise states into four groups based on the GDP per capita (C1, C2, C3, C4)

Categorise the states into four groups based on the GDP per capita (C1, C2, C3, C4, where C1 has the highest per capita GDP and C4 the lowest). The quantile values are (0.20, 0.5, 0.85, 1), i.e., the states lying between the 85th and the 100th percentiles are in C1, those between the 50th and the 85th percentiles are in C2, those between the 20th and the 50th percentiles are in C3, and those below the 20th percentile are in C4.
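The categorisation can be sketched with `quantile` and `pd.cut`. The per capita values below are invented, and putting a state that sits exactly on a boundary into the lower category is a choice made for this example.

```python
import pandas as pd

# Invented per capita values: 10, 20, ..., 100 for ten toy states.
per_capita = pd.Series(
    [float(v) for v in range(10, 110, 10)],
    index=[f"State_{i}" for i in range(1, 11)],
)

# Cut points at the 20th, 50th and 85th percentiles.
q20, q50, q85 = per_capita.quantile([0.20, 0.50, 0.85])

category = pd.cut(
    per_capita,
    bins=[-float("inf"), q20, q50, q85, float("inf")],
    labels=["C4", "C3", "C2", "C1"],  # lowest bin first, so C4 up to C1
)
print(category)
```

With these toy values the two richest states land in C1 and the two poorest in C4.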

Now, for each category (C1, C2, C3, C4), let's do the following:

  • Find the top 3/4/5 sub-sectors (such as agriculture, forestry and fishing, crops, manufacturing etc., not primary, secondary and tertiary) that contribute to approximately 80% of the GSDP of each category.
  • Note-I: The nomenclature for this project is as follows: primary, secondary and tertiary are named ‘sectors’, while agriculture, manufacturing etc. are named ‘sub-sectors’.
  • Note-II: If the top 3 sub-sectors contribute to, say, 79% of the GDP of some category, you can report “These top 3 sub-sectors contribute to approximately 80% of the GDP”. This is to simplify the analysis and make the results consumable.

For this purpose we will create a function to plot the sub-sectors that contribute to 80% of the GSDP of each category
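The selection part of that function (without the plotting) might look like the sketch below. The function name `top_subsectors` and the sub-sector figures are hypothetical. Note that this version keeps adding sub-sectors until the cumulative share reaches the threshold, so it never stops at 79%; loosen the threshold if you want to follow Note-II.

```python
import pandas as pd


def top_subsectors(contribution, threshold=80.0):
    """Return the largest sub-sectors whose cumulative % share reaches the threshold."""
    share = contribution / contribution.sum() * 100
    share = share.sort_values(ascending=False)
    cumulative = share.cumsum()
    # Keep every sub-sector below the threshold, plus the one that crosses it.
    n_keep = int((cumulative < threshold).sum()) + 1
    return share.iloc[:n_keep]


# Invented sub-sector totals for one category.
contribution = pd.Series({
    "Sub-sector A": 40.0,
    "Sub-sector B": 25.0,
    "Sub-sector C": 20.0,
    "Sub-sector D": 10.0,
    "Sub-sector E": 5.0,
})
selected = top_subsectors(contribution)
print(selected)
```

Here the cumulative shares run 40, 65, 85, so three sub-sectors are kept.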

Let's plot the sectors' contribution

Let's find the correlation of sectors with the GDP of states

Let's plot the sectors' contribution towards the GDP of states for each category

For this we will create a function that takes all sectors and the category as columns. We will use the df_category_state data frame for this purpose. We will then transpose the data frame and plot a bar graph to show each sector's contribution towards the GDP of all states with respect to each category

Let's plot now

Part-II: GDP and Education Dropout Rates

We will load the data set for education dropout rates and check it for data issues

Final data set with dropout rate details for all states

Let's check the correlation of GDP per capita with dropout rates in education (primary, upper primary and secondary) for the year 2014–2015 for each state
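Once the per capita figures and the dropout rates sit in one data frame, the check is a single `corr` call. The numbers below are invented purely so the example runs and say nothing about the real result. The column names are assumptions too.

```python
import pandas as pd

# Invented figures for four toy states.
df = pd.DataFrame(
    {
        "Per Capita GSDP": [50.0, 100.0, 150.0, 200.0],
        "Primary": [5.0, 4.0, 4.0, 2.0],
        "Upper Primary": [6.0, 5.0, 3.0, 3.0],
        "Secondary": [30.0, 25.0, 15.0, 10.0],
    },
    index=["State_A", "State_B", "State_C", "State_D"],
)

# Pearson correlation of each dropout level with per capita GDP.
corr_with_gdp = df.corr()["Per Capita GSDP"].drop("Per Capita GSDP")
print(corr_with_gdp)
```

A value close to -1 would mean that states with a higher per capita GDP tend to have lower dropout rates at that level.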

Let's check the correlation between dropout rate and the % contribution of each sector (primary, secondary and tertiary) to the total GDP

We will create a function for this purpose
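A possible shape for such a function is sketched here. The name `sector_dropout_corr`, the column names and all values are hypothetical, and the toy numbers are not meant to mirror the real data.

```python
import pandas as pd


def sector_dropout_corr(df, sector_cols, dropout_cols):
    """Return a table of correlations: sector shares as rows, dropout levels as columns."""
    full = df[sector_cols + dropout_cols].corr()
    return full.loc[sector_cols, dropout_cols]


# Invented sector shares (%) and dropout rates for four toy states.
df = pd.DataFrame(
    {
        "Primary %": [40.0, 30.0, 20.0, 10.0],
        "Secondary %": [20.0, 30.0, 30.0, 30.0],
        "Tertiary %": [40.0, 40.0, 50.0, 60.0],
        "Upper Primary Dropout": [6.0, 5.0, 3.0, 3.0],
        "Secondary Dropout": [30.0, 25.0, 15.0, 10.0],
    },
    index=["State_A", "State_B", "State_C", "State_D"],
)

corr_table = sector_dropout_corr(
    df,
    ["Primary %", "Secondary %", "Tertiary %"],
    ["Upper Primary Dropout", "Secondary Dropout"],
)
print(corr_table)
```

Slicing the full correlation matrix this way keeps only the sector-versus-dropout pairs and leaves out the sector-versus-sector ones.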

Let's find the correlation between education and dropout rate

Hypotheses from the above data on dropout rate, each sector's % contribution and population

  • An increase in the dropout rate negatively affects the secondary sector, and thereby the GDP of states with a major % contribution from the secondary sector
  • If the population increases, the secondary dropout rate also increases
  • Per capita GDP decreases as the dropout rate increases
  • The government must bring in measures to control the dropout rate, which will help increase the GDP of states

Github Link

Linkedin Profile
