Data Analysis of State Wise GDP in India and Recommendations Using Python
The overall goal of this project is to help ministers focus on areas that will foster economic development in their respective states. Since the most common measure of economic development is GDP, we will analyse the GDP of the various states of India and suggest ways to improve it.
Gross domestic product (GDP) at current prices is the GDP at the market value of goods and services produced in a country during a year. In other words, GDP measures the ‘monetary value of final goods and services produced by a country/state in a given period of time’.
GDP can be broadly divided into goods and services produced by three sectors: the primary sector (agriculture), the secondary sector (industry), and the tertiary sector (services).
GDP at current prices is also known as nominal GDP. Real GDP, by contrast, takes into account the price changes that may have occurred due to inflation; that is, real GDP is nominal GDP adjusted for inflation. We will use nominal GDP for this exercise. We will also consider the financial year 2015–16 as the base year, as most of the data required for this exercise is available for that period.
Per Capita GDP and Income
Total GDP divided by the population gives the per capita GDP, which roughly measures the average value of goods and services produced per person. The per capita income is closely related to the per capita GDP, though they are not the same. In general, the per capita income increases when the per capita GDP increases, and vice versa. For instance, in the financial year 2015–16, the per capita income of India was ₹93,293, whereas the per capita GDP of India was $1,717, which roughly amounts to ₹1,11,605.
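The arithmetic behind these figures can be sketched in a few lines. The exchange rate below is not quoted in the text; it is simply what the two per capita GDP figures imply, and `per_capita` is a hypothetical helper name.

```python
# Figures quoted in the text for the financial year 2015-16.
per_capita_gdp_usd = 1717
per_capita_gdp_inr = 111_605

# Exchange rate implied by the two figures (an inference, not a sourced value).
implied_inr_per_usd = per_capita_gdp_inr / per_capita_gdp_usd
print(implied_inr_per_usd)  # 65.0


def per_capita(total_value, population):
    """Average value per person; both arguments must use consistent units."""
    return total_value / population
```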
We will divide the analysis into two parts:
Part-I: GDP Analysis of the Indian States
We will use data available for the years 2012–2017.
Part-II: GDP and Education Dropout Rates
We will use data for the year 2014–2015.
Data I download:
- Go to the URL: https://data.gov.in/ and search for the keyword ‘State-wise Gross Domestic Product (GDP) at current price on yearly basis’. Select the ‘State-wise Gross Domestic Product (GDP) at current price on yearly basis’ from the search result and download the data.
- Go to the URL: https://data.gov.in/ and search for the keyword ‘GSVA by Economic Activity at Current Prices’. Click on ‘More Similar Results’. Download the data for all the states, not the union territories.
Data II download:
Download the dropout rates data from the link below:
Import all the necessary libraries in Python for data analysis
Check the data types and column names
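As a rough illustration of this check, the sketch below reads a tiny made-up CSV and prints its data types and column names. The column layout ('Items Description', 'Duration', one column per state, 'All_India GDP'), the placeholder states and all values are assumptions for illustration, not the real file.

```python
import io

import pandas as pd

# Made-up sample in an assumed layout; values and state names are placeholders.
SAMPLE_CSV = """Items Description,Duration,State A,State B,All_India GDP
GSDP - CURRENT PRICES,2014-15,100,200,300
GSDP - CURRENT PRICES,2015-16,110,,340
(% Growth over previous year),2014-15,8.0,12.0,10.0
(% Growth over previous year),2015-16,10.0,,13.3
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.dtypes)            # numeric columns should not come back as text
print(df.columns.tolist())  # confirm the state names were parsed as headers
```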
Check for missing values
Let’s have a look at the data set
Let’s check for missing values duration-wise
Let’s check for missing values state-wise
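Both checks amount to counting missing cells along a different axis. A minimal sketch on placeholder data (the states and values are made up):

```python
import numpy as np
import pandas as pd

# Placeholder data: one row per duration, one column per state.
df = pd.DataFrame({
    "Duration": ["2014-15", "2015-16", "2016-17"],
    "State A": [100.0, 110.0, np.nan],
    "State B": [200.0, np.nan, np.nan],
})

state_cols = df.columns.drop("Duration")

# Duration-wise: how many states have no value in each year (count across columns).
missing_by_duration = df.set_index("Duration")[state_cols].isna().sum(axis=1)

# State-wise: how many years are missing for each state (count down the rows).
missing_by_state = df[state_cols].isna().sum()

print(missing_by_duration)
print(missing_by_state)
```

A year with many missing states, such as the last one here, is a candidate for dropping before any trend analysis.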
Let’s now create two different data frames:
- DataFrame of GSDP for different years for all states
- DataFrame of growth percentage over different years for all states
Also create a data frame of All India GDP
This data frame will have two columns filtered from growth_df: ‘Duration’ and ‘All_India GDP’
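One way to perform this split is sketched below. It assumes the raw file labels each row in an 'Items Description' column and that growth rows contain the word "Growth"; that column name, the name gsdp_df and the sample values are assumptions for illustration.

```python
import pandas as pd

# Placeholder data in the assumed layout of the raw file.
df = pd.DataFrame({
    "Items Description": ["GSDP - CURRENT PRICES"] * 2
    + ["(% Growth over previous year)"] * 2,
    "Duration": ["2014-15", "2015-16"] * 2,
    "State A": [100.0, 110.0, 8.0, 10.0],
    "State B": [200.0, 230.0, 12.0, 15.0],
    "All_India GDP": [300.0, 340.0, 10.0, 13.3],
})

# Rows describing growth percentages versus rows holding absolute GSDP.
is_growth = df["Items Description"].str.contains("Growth", regex=False)

gsdp_df = df[~is_growth].drop(columns="Items Description").reset_index(drop=True)
growth_df = df[is_growth].drop(columns="Items Description").reset_index(drop=True)

# National figures: only the duration and the All India column of growth_df.
nation_df = growth_df[["Duration", "All_India GDP"]].copy()
```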
Let’s analyse % growth over previous years using a best-fit line
For this purpose we will create a function that takes the data frames growth_df and nation_df, the ‘Duration’ column as the target, and a list of columns. This list holds the column names of growth_df except ‘Duration’ and ‘All_India GDP’.
The function will return a data frame with each state and the slope of its best-fit line, computed with the polyfit function of numpy. We will use this slope to judge the trend in the growth % of that state over the previous years. The greater the slope, the more steadily the growth rate has risen over the years. A negative value means the growth rate has been declining, not necessarily that the GDP itself has shrunk.
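A minimal sketch of such a function follows. It is simplified relative to the description above: it takes only growth_df and the list of state columns, and it uses the row position (0, 1, 2, ...) as the x-axis because ‘Duration’ is text. The name `growth_slopes` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def growth_slopes(growth_df, state_cols):
    """Return one row per state with the slope of a degree-1 least-squares fit."""
    x_all = np.arange(len(growth_df), dtype=float)  # year index: 0, 1, 2, ...
    rows = []
    for state in state_cols:
        y = growth_df[state].to_numpy(dtype=float)
        mask = ~np.isnan(y)          # ignore years with no data
        if mask.sum() < 2:           # a line needs at least two points
            continue
        slope, _intercept = np.polyfit(x_all[mask], y[mask], deg=1)
        rows.append({"State": state, "Slope": slope})
    result = pd.DataFrame(rows, columns=["State", "Slope"])
    return result.sort_values("Slope", ascending=False).reset_index(drop=True)


# Placeholder growth percentages for three made-up states.
growth_df = pd.DataFrame({
    "Duration": ["2013-14", "2014-15", "2015-16"],
    "State A": [8.0, 10.0, 12.0],       # growth rate rising
    "State B": [15.0, 12.0, 9.0],       # growth rate falling
    "State C": [np.nan, 7.0, np.nan],   # too few points, skipped
})
slopes = growth_slopes(growth_df, ["State A", "State B", "State C"])
print(slopes)
```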
Let’s now use this function to find the growth % trend of all states over the previous years
Let’s now analyse the GDP per capita for all the states.
- Identify the top 5 and the bottom 5 states based on the GDP per capita.
- Find the ratio of the highest per capita GDP to the lowest per capita GDP.
For this we will load the GSVA data of all states. These are around 31 CSV files stored in a folder, and we will read them from that folder. The files hold the contribution of the various sectors and sub-sectors to the GDP of each state. For this part of the analysis we will focus on the sector contributions (each being the sum of its sub-sector contributions), so we will drop the redundant sub-sector rows.
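Reading the folder can be sketched as below. The file layout is an assumption: each CSV is taken to have an 'Item' column naming the row and one column per year, and the state name is taken from the file name. The row labels in `KEEP_ITEMS`, the default year column and the function name `load_gsva_folder` are likewise hypothetical.

```python
from pathlib import Path

import pandas as pd

# Assumed labels of the sector-level and summary rows to keep.
KEEP_ITEMS = [
    "Primary",
    "Secondary",
    "Tertiary",
    "Gross State Domestic Product",
    "Per Capita GSDP (Rs.)",
]


def load_gsva_folder(folder, year_col="2014-15", keep_items=KEEP_ITEMS):
    """Build one row per state from every CSV file found in `folder`."""
    records = []
    for path in sorted(Path(folder).glob("*.csv")):
        raw = pd.read_csv(path)
        # Drop sub-sector rows by keeping only the listed items.
        kept = raw[raw["Item"].isin(keep_items)]
        record = {"State": path.stem}  # state name assumed to be the file name
        record.update(dict(zip(kept["Item"], kept[year_col])))
        records.append(record)
    return pd.DataFrame(records)
```

Calling `load_gsva_folder("path/to/folder")` would return one row per file, with a column for each kept item.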
The data file is like this
The data frame looks like
There are no missing values in this data frame. We will also drop the Union Territory data from it
Let’s plot the GDP per capita for all states
Find the ratio of the highest per capita GDP to the lowest per capita GDP
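The ranking and the ratio can be computed directly from a per capita series. The states and values below are placeholders, not figures from the data set.

```python
import pandas as pd

# Placeholder per capita GDP values (Rs.) for made-up states.
per_capita_gdp = pd.Series({
    "State A": 250_000,
    "State B": 180_000,
    "State C": 150_000,
    "State D": 120_000,
    "State E": 90_000,
    "State F": 70_000,
    "State G": 50_000,
})

top5 = per_capita_gdp.nlargest(5)      # highest first
bottom5 = per_capita_gdp.nsmallest(5)  # lowest first

# Ratio of the highest to the lowest per capita GDP.
ratio = per_capita_gdp.max() / per_capita_gdp.min()
print(ratio)  # 5.0
```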
Let’s plot the percentage contribution of the primary, secondary and tertiary sectors to the GDP of each state
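The percentages to plot can be derived by dividing each sector column by the state's GSDP. The column names and values in this sketch are assumptions; in real data the three shares need not sum to exactly 100 if the GSDP figure includes items outside the three sectors.

```python
import pandas as pd

# Placeholder sector totals and GSDP for two made-up states.
df = pd.DataFrame(
    {
        "Primary": [20.0, 50.0],
        "Secondary": [30.0, 50.0],
        "Tertiary": [50.0, 100.0],
        "Gross State Domestic Product": [100.0, 200.0],
    },
    index=["State A", "State B"],
)

sectors = ["Primary", "Secondary", "Tertiary"]

# Row-wise division: each state's sector value over that state's GSDP.
pct_df = (
    df[sectors]
    .div(df["Gross State Domestic Product"], axis=0)
    .mul(100)
    .round(2)
)
print(pct_df)
```

With matplotlib installed, `pct_df.plot(kind="bar", stacked=True)` draws one stacked bar per state.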
Let’s categorise the states into four groups based on the GDP per capita (C1, C2, C3, C4)
C1 has the highest per capita GDP and C4 the lowest. The quantile values are (0.20, 0.5, 0.85, 1), i.e., the states lying between the 85th and the 100th percentiles are in C1; those between the 50th and the 85th percentiles are in C2; those between the 20th and the 50th percentiles are in C3; and those below the 20th percentile are in C4.
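This binning maps naturally onto `pd.qcut`, as in the hedged sketch below. The 20 placeholder states and their evenly spaced values are made up; only the quantile boundaries come from the text.

```python
import pandas as pd

# Placeholder per capita GDP values: 10,000 to 200,000 in steps of 10,000.
per_capita_gdp = pd.Series(
    range(10_000, 210_000, 10_000),
    index=[f"State {i}" for i in range(1, 21)],
    dtype=float,
)

# Quantile edges from the text, with 0 added as the lower bound.
# Labels run from the lowest bin (C4) to the highest (C1).
category = pd.qcut(
    per_capita_gdp,
    q=[0, 0.20, 0.5, 0.85, 1],
    labels=["C4", "C3", "C2", "C1"],
)
print(category.value_counts())
```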
Now, for each category (C1, C2, C3, C4), let’s do the following:
- Find the top 3/4/5 sub-sectors (such as agriculture, forestry and fishing, crops, manufacturing etc., not primary, secondary and tertiary) that contribute to approximately 80% of the GSDP of each category.
- Note-I: The nomenclature for this project is as follows: primary, secondary and tertiary are named ‘sectors’, while agriculture, manufacturing etc. are named ‘sub-sectors’.
- Note-II: If the top 3 sub-sectors contribute to, say, 79% of the GDP of some category, you can report “These top 3 sub-sectors contribute to approximately 80% of the GDP”. This is to simplify the analysis and make the results consumable.
For this purpose we will create a function to plot the sub-sectors that contribute to approximately 80% of the GSDP of each category
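The selection step of such a function can be sketched with a cumulative sum; the plotting is left out. The function name `top_subsectors`, the `tolerance` parameter (which encodes Note-II) and the sample numbers are assumptions for illustration.

```python
import pandas as pd


def top_subsectors(contrib, gsdp_total, threshold=80.0, tolerance=1.0):
    """Smallest set of top sub-sectors whose share reaches roughly `threshold` %.

    `contrib` maps sub-sector name to its contribution for one category.
    Reaching `threshold - tolerance` counts as "approximately" the threshold.
    """
    pct = contrib.sort_values(ascending=False) / gsdp_total * 100
    cumulative = pct.cumsum()
    # Number of leading sub-sectors still short of the target, plus one more.
    n_needed = int((cumulative < threshold - tolerance).sum()) + 1
    n_needed = min(n_needed, len(pct))
    table = pd.DataFrame({"Percent": pct, "Cumulative": cumulative})
    return table.iloc[:n_needed]


# Placeholder contributions for one category (total of 100 units).
contrib = pd.Series({
    "Manufacturing": 25.0,
    "Agriculture, forestry and fishing": 40.0,
    "Construction": 10.5,
    "Trade": 14.5,
    "Other": 10.0,
})
result = top_subsectors(contrib, gsdp_total=100.0)
print(result)  # three sub-sectors, cumulative share 79.5%
```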
Let’s plot the contribution of each sector
Let’s find the correlation of the sectors with the GDP of the states
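A minimal sketch of this correlation, using `DataFrame.corrwith` on placeholder values (the column names and numbers are assumptions):

```python
import pandas as pd

# Placeholder sector values and GSDP for four made-up states.
df = pd.DataFrame(
    {
        "Primary": [30.0, 40.0, 35.0, 45.0],
        "Secondary": [20.0, 60.0, 115.0, 155.0],
        "Tertiary": [50.0, 100.0, 150.0, 200.0],
        "GSDP": [100.0, 200.0, 300.0, 400.0],
    },
    index=["State A", "State B", "State C", "State D"],
)

sectors = ["Primary", "Secondary", "Tertiary"]

# Pearson correlation of each sector column with GSDP, strongest first.
sector_corr = df[sectors].corrwith(df["GSDP"]).sort_values(ascending=False)
print(sector_corr)
```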
Let’s plot the contribution of the sectors towards the GDP of the states for each category
For this we will create a function that builds a data frame with all the sectors and the category as columns. We will use the df_category_state data frame for this purpose. We will then transpose the data frame and plot a bar graph showing the contribution of each sector towards the GDP of all states with respect to each category
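The data preparation behind that plot might look like the sketch below. The contents of df_category_state, its column names and the helper name `sectors_by_category` are assumptions; only the group-then-transpose idea comes from the text.

```python
import pandas as pd

# Placeholder version of df_category_state: one row per state.
df_category_state = pd.DataFrame({
    "State": ["State A", "State B", "State C", "State D"],
    "Category": ["C1", "C1", "C2", "C2"],
    "Primary": [10.0, 20.0, 30.0, 40.0],
    "Secondary": [30.0, 30.0, 20.0, 20.0],
    "Tertiary": [60.0, 50.0, 50.0, 40.0],
})

sectors = ["Primary", "Secondary", "Tertiary"]


def sectors_by_category(df, sectors, category_col="Category"):
    """Sum each sector within a category, then transpose for plotting."""
    totals = df.groupby(category_col)[sectors].sum()
    return totals.T  # rows: sectors, columns: categories


plot_df = sectors_by_category(df_category_state, sectors)
print(plot_df)
```

With matplotlib installed, `plot_df.plot(kind="bar")` then draws one group of bars per sector, with one bar per category.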
Let’s plot now
Part-II: GDP and Education Dropout Rates
We will load the data set for education dropout rates and check it for data issues
Final data set with dropout rate details for all states
Let’s check the correlation of GDP per capita with dropout rates in education (primary, upper primary and secondary) for the year 2014–2015 for each state
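As an illustration of this check, the sketch below correlates a per capita column with three dropout-rate columns. All column names and values are placeholders, not figures from the data set.

```python
import pandas as pd

# Placeholder per capita GDP and dropout rates (%) for four made-up states.
df = pd.DataFrame(
    {
        "Per Capita GSDP (Rs.)": [50_000, 80_000, 120_000, 200_000],
        "Primary Dropout": [6.0, 4.5, 3.0, 1.0],
        "Upper Primary Dropout": [5.0, 5.5, 3.0, 2.0],
        "Secondary Dropout": [20.0, 18.0, 15.0, 9.0],
    },
    index=["State A", "State B", "State C", "State D"],
)

dropout_cols = ["Primary Dropout", "Upper Primary Dropout", "Secondary Dropout"]

# Pearson correlation of each dropout rate with per capita GDP.
dropout_corr = df[dropout_cols].corrwith(df["Per Capita GSDP (Rs.)"])
print(dropout_corr)
```

In this made-up data every correlation is negative, meaning the richer placeholder states have lower dropout rates.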
Let’s check the correlation between the dropout rate and the % contribution of each sector (primary, secondary and tertiary) to the total GDP
We will create a function for this purpose
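A function of this kind could be as small as the sketch below, which computes the full correlation matrix and keeps only the dropout-versus-sector block. The name `cross_correlation`, the column names and the sample values are hypothetical.

```python
import pandas as pd


def cross_correlation(df, left_cols, right_cols):
    """Correlation of every column in `left_cols` with every one in `right_cols`."""
    corr = df[left_cols + right_cols].corr()
    return corr.loc[left_cols, right_cols]


# Placeholder dropout rates (%) and sector shares (%) for four made-up states.
df = pd.DataFrame({
    "Primary Dropout": [6.0, 4.5, 3.0, 1.0],
    "Secondary Dropout": [20.0, 18.0, 15.0, 9.0],
    "Primary %": [40.0, 35.0, 25.0, 10.0],
    "Secondary %": [20.0, 25.0, 30.0, 35.0],
    "Tertiary %": [40.0, 40.0, 45.0, 55.0],
})

result = cross_correlation(
    df,
    left_cols=["Primary Dropout", "Secondary Dropout"],
    right_cols=["Primary %", "Secondary %", "Tertiary %"],
)
print(result)
```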
Let’s find the correlation between education and dropout rate
Hypotheses from the above data on dropout rate, sector % contribution and population
- An increase in the dropout rate negatively affects the secondary sector, and thereby the GDP of states with a major % contribution from the secondary sector
- If the population increases, the secondary dropout rate also increases
- Per capita GDP decreases with an increase in the dropout rate
- The government must bring in measures to control the dropout rate, which will help increase the GDP of the states
Code: the vidhujain/Projects_Showcase repository on GitHub.
Vidhu Jain - International Institute of Information Technology Bangalore - Singapore | LinkedIn