Data Analysis of Lok Sabha Election 2019 in India

Pinaki Subhra Bhattacharya
Analytics Vidhya


Using Python, NumPy, pandas, Matplotlib, and Seaborn

The Lok Sabha, or House of the People, is the lower house of India’s bicameral Parliament, with the upper house being the Rajya Sabha. Members of the Lok Sabha are elected by universal adult suffrage under a first-past-the-post system to represent their respective constituencies, and they hold their seats for five years or until the body is dissolved by the President on the advice of the council of ministers. The house meets in the Lok Sabha Chambers of the Sansad Bhavan, New Delhi.

The maximum membership of the House allotted by the Constitution of India is 552 (initially, in 1950, it was 500). Currently, the house has 543 seats, all of which are filled by elected members. Between 1952 and 2020, 2 additional members of the Anglo-Indian community were also nominated by the President of India on the advice of the Government of India, a practice that was abolished in January 2020 by the 104th Constitutional Amendment Act, 2019. The Lok Sabha has a seating capacity of 550.

A total of 131 seats (24.03%) are reserved for representatives of Scheduled Castes (84) and Scheduled Tribes (47). The quorum for the House is 10% of the total membership. The Lok Sabha, unless sooner dissolved, continues to operate for five years from the date appointed for its first meeting. However, while a proclamation of emergency is in operation, this period may be extended by Parliament by law or decree.

Voting is one of the most important ways in which individuals can influence governmental decision-making. Everyone in our country has the right to vote, but many people are not well informed about politics. Through this project, we can learn about the different political parties, their background, and their recent successes and failures in the 2019 Lok Sabha election in India. Unfortunately, we also found a few candidates with a criminal history. This data analysis therefore helps us become aware of the candidates' histories as well as the nature of the political parties, and it shows us the winning party and its success in 2019.

We want to stress that this is an unbiased analysis. We have not supported any specific party.

Now let’s look at what this project contains.

Dataset:

This dataset is based on the 2019 Lok Sabha election in India. It has a total of 2263 rows and 19 columns, and the analysis in this project is built on it.

Here we use Google Colab to run the code and analyze the dataset, but you can use other platforms as well.

We have used a total of 3 datasets for this project.


Importing Libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Loading the Files:

import io
df2 = pd.read_csv('LS_2.0.csv')

Displaying the Data:

df2.head()
Displaying the data

The shape of the Dataset:

df2.shape
(2263, 19)

Information about all the columns in the Dataset:

df2.info()
Information about all the columns in the dataset

Description of Dataset:

df2.describe()
Description of dataset

Correlation between the Data:

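# Restrict to numeric columns: recent pandas versions raise an error when corr() meets text columns
df2.select_dtypes(include='number').corr()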
Correlation between the data

Checking the Null Value in the Dataset:

df2.isnull().values.any()
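The check above only reports whether any value is missing at all. A per-column count is often more informative. The following is a minimal sketch on a small made-up frame; the rows are invented for illustration and are not taken from LS_2.0.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the election data; all values are invented.
toy = pd.DataFrame({
    "STATE": ["Kerala", "Bihar", "Goa"],
    "criminal": [2.0, np.nan, 0.0],
    "AGE": [52.0, np.nan, 47.0],
})

has_nulls = toy.isnull().values.any()   # True if any cell is missing
nulls_per_column = toy.isnull().sum()   # number of missing cells in each column

print(has_nulls)
print(nulls_per_column)
```

The same two calls can be applied to df2 directly.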

Crime Cases Count:

Here we count how many candidates have each number of criminal cases recorded against them.

df2['criminal'].value_counts()
df2['criminal'] = df2['criminal'].replace(['Not Available'], '0')
df2['criminal'] = pd.to_numeric(df2['criminal'], errors='coerce')
df2['criminal'].value_counts()
df2['criminal'].isna()
Crime cases count

Here we count the null values that remain in the criminal column of the dataset.

df2['criminal'].isnull().sum()
245
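To make the cleaning step concrete, here is a small sketch of the same replace-then-convert pattern on an invented column. The values below are assumptions chosen to show the three cases: numbers stored as text, the 'Not Available' placeholder, and a genuinely missing entry.

```python
import pandas as pd

# Toy column mimicking the raw 'criminal' field; values are invented.
raw = pd.Series(["0", "3", "Not Available", None, "12"], name="criminal")

# Step 1: turn the text placeholder into "0".
cleaned = raw.replace(["Not Available"], "0")
# Step 2: convert text to numbers; anything unparseable becomes NaN.
cleaned = pd.to_numeric(cleaned, errors="coerce")

# Entries that were missing to begin with are still NaN after conversion.
missing = int(cleaned.isnull().sum())

print(cleaned.tolist())
print(missing)
```

Only the placeholder is turned into zero. Rows that had no value at all stay missing, which is why a null count can remain after the conversion.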

Here we are displaying the data again.

df2.head()
Displaying the data

Bar Graph of crime Count in different states:

Here we create a count plot that shows how many candidates have each number of criminal cases.

# Using Seaborn's countplot with figure size 18 x 6
plt.figure(figsize=(18,6))
sns.countplot(x='criminal', data=df2);
Bargraph of Crime Cases

From the description of the dataset, we can see that the mean number of criminal cases per candidate is 1.45. The minimum, the 25th percentile, and the median (50%) are all 0, so at least half of the candidates have no criminal cases, while the 75th percentile is 1.0. More surprisingly, the maximum number of criminal cases against a single candidate is 240, which is huge.
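A single extreme value can pull the mean well above the median, which is why the quartiles are worth reading alongside it. The sketch below uses ten invented counts (not the real data) to show how describe() reports these statistics.

```python
import pandas as pd

# Invented counts: most candidates have no cases, one has a very large number.
cases = pd.Series([0, 0, 0, 0, 0, 0, 1, 1, 2, 240], name="criminal")

summary = cases.describe()

# Pick out the statistics discussed in the text.
print(summary[["mean", "min", "25%", "50%", "75%", "max"]])
```

In this toy series the median is 0 while the mean is far larger, driven entirely by the one outlier.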

Line Graph of State vs Criminal Case:

This graph represents the criminal cases of the candidates present in different states.

import matplotlib.pyplot as plt
#fig = plt.figure(figsize =(100, 7))
df2.plot(x="STATE", y=["criminal"], figsize=(20, 7), fontsize=10)
plt.xlabel("States")
plt.ylabel("Criminal Case")
plt.title("Distribution of Criminal Cases")
plt.show()
Line graph of State vs Criminal Case

From this graph and the description of the dataset, we can see that the maximum number of criminal cases against a single candidate is 240.

The Educational Qualification of the Candidates:

# Using Seaborn's countplot with figure size 20 x 6
plt.figure(figsize=(20,6))
sns.countplot(x='EDUCATION', data=df2);
Bargraph of Education vs Count

After analyzing the graph, we can see that there are two separate bars for class VIII pass and class V pass. We consider class X pass to be the minimum qualification for being called literate, so we treat all class V pass and class VIII pass candidates as illiterate.
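The regrouping described above can be sketched with a single replace call. The label strings '5th Pass', '8th Pass', and 'Illiterate' are assumptions about how the EDUCATION column is spelled, and merge_low_education is a hypothetical helper name; adjust both to match the actual data.

```python
import pandas as pd

# Assumed spellings of the categories below class X.
BELOW_CLASS_X = ["5th Pass", "8th Pass"]


def merge_low_education(education: pd.Series) -> pd.Series:
    """Relabel entries below class X as 'Illiterate'; leave the rest unchanged."""
    return education.replace(BELOW_CLASS_X, "Illiterate")


# Invented sample values for illustration.
toy = pd.Series(["5th Pass", "Graduate", "8th Pass", "Post Graduate", "Illiterate"])
result = merge_low_education(toy)
print(result.value_counts())
```

Applied to the real column, this would look like df2['EDUCATION'] = merge_low_education(df2['EDUCATION']).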

We can see that postgraduates form the largest group of candidates in India (according to the official declarations). This is a positive sign from the educational point of view.

Education vs Crime Cases Bargraph:

This graph shows the candidates' educational qualifications against the criminal cases they have. It lets us relate their criminal background to their educational qualification.

import seaborn as sns
sns.set_theme(style="whitegrid")
plt.figure(figsize=(20,6))
ax = sns.barplot(x="EDUCATION", y="criminal", data=df2)
Bargraph of Education vs Crime Cases

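From the graph, we can see that the bars for Graduate and 12th Pass candidates are the highest. Note that sns.barplot draws the mean of the criminal column for each category by default, not the total. In particular, we want to mention that a single graduate candidate has 240 criminal cases.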
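Because sns.barplot shows a per-category mean by default, one extreme candidate can lift a whole bar. A quick way to check this is to compare the mean, total, and maximum per group. The sketch below uses invented rows, not the real dataset.

```python
import pandas as pd

# Invented rows: one graduate with a very large number of cases.
toy = pd.DataFrame({
    "EDUCATION": ["Graduate", "Graduate", "Graduate", "12th Pass", "12th Pass"],
    "criminal": [0, 0, 240, 3, 5],
})

# 'mean' is what sns.barplot draws by default; 'sum' and 'max' give two other views.
per_group = toy.groupby("EDUCATION")["criminal"].agg(["mean", "sum", "max"])
print(per_group)
```

Here the Graduate mean is high only because of a single row, which the max column makes visible.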

Pie chart of Male vs Female candidates:

This chart shows the share of male and female candidates who contested the 2019 Lok Sabha election.

# cn1 and cn2 hold the male and female candidate counts (computed earlier, not shown here)
y = np.array([cn1,cn2])
mylabels = ["MALE","FEMALE"]
plt.pie(y, labels = mylabels, startangle = 90)
plt.show()
Pie chart of male vs female candidates

From this pie chart, we can see that the number of male candidates is much greater than the number of female candidates.
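The pie chart code relies on two counts, cn1 and cn2, whose computation is not shown. One plausible way to obtain them is value_counts, sketched here on invented rows. The column name 'GENDER' and the labels 'MALE' and 'FEMALE' are assumptions about the dataset.

```python
import pandas as pd

# Invented rows; 'GENDER' and its labels are assumed names.
toy = pd.DataFrame({"GENDER": ["MALE", "MALE", "FEMALE", "MALE", None]})

# value_counts skips missing entries by default.
gender_counts = toy["GENDER"].value_counts()

# .get with a default avoids a KeyError if a label never occurs.
cn1 = int(gender_counts.get("MALE", 0))
cn2 = int(gender_counts.get("FEMALE", 0))
print(cn1, cn2)
```

The two integers can then be passed to plt.pie as in the article's snippet.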

State-wise Candidates with Crime Cases:

These bar graphs show the state-wise criminal cases of contestants and of winners. The number of criminal cases among candidates is highest in Bihar, Kerala, Maharashtra, West Bengal, and Uttar Pradesh.

state_criminal = df2.groupby('STATE')[['criminal']].sum().sort_values(by=['criminal']).tail(15).sort_values(by=['STATE'])
state_criminal_winner = df2[df2['WINNER']>0].groupby('STATE')[['criminal']].sum().sort_values(by=['criminal']).tail(15).sort_values(by=['STATE'])
state_criminal
# 2 Barplot Side by Side
fig, axes = plt.subplots(1, 2, figsize=(20, 8))
# Passing X axis and Y axis along with subplot position
sns.barplot(x = state_criminal.index , y = state_criminal['criminal'] , ax=axes[0] , palette='YlOrBr');
axes[0].tick_params(axis='x' , rotation=45); # rotate the X axis labels so they are easier to read
axes[0].set_title('STATE WISE CRIMINAL CASE OF CONTESTANTS');
# We can also change the color of the barplots by giving different palettes
sns.barplot(x = state_criminal_winner.index , y = state_criminal_winner['criminal'] , ax=axes[1] , palette='viridis');
axes[1].set_title('STATE WISE CRIMINAL CASE OF WINNERS');
plt.xticks(rotation=45);
State-wise criminal case contestants and winner

Here we can see the criminal cases of candidates and winners across the states. The tallest bar belongs to Kerala, but West Bengal, Uttar Pradesh, and Telangana are not far behind.

Bar Graph of category Growth:

Here we calculate the number of SC, ST, and GENERAL candidates in the 2019 Lok Sabha election.

consumption = ['SC','ST','GENERAL','OTHERS']
growth = [cn1,cn2,cn3,cn4]
# Create a pandas dataframe
df = pd.DataFrame({"consumption": consumption,"growth": growth})
df_sorted_desc= df.sort_values('growth',ascending=False)
plt.figure(figsize=(14,10))
# make bar plot with matplotlib
plt.bar('consumption', 'growth',data=df_sorted_desc,color ='blue',width = 0.4)
plt.xlabel("Category", size=15)
plt.ylabel("growth", size=15)
plt.title("Barplot of Category in the Loksabha Election Candidates", size=18)
plt.savefig("bar_plot_matplotlib_Python.png")
Barplot of Category vs Growth

From the graph, we can see that general-category candidates form the largest group in India. The gap between the general category and the other categories is very large.

Bar Graph of Candidate Allocation in Loksabha Election 2019:

Here we count the total number of candidates fielded by different parties across the constituencies in India.

# Initialize data
consumption = ['BJP','INC','NOTA','IND','BSP']
growth = [cn1,cn2,cn3,cn4,cn5]
# Create a pandas dataframe
df = pd.DataFrame({"consumption": consumption,"growth": growth})
df_sorted_desc= df.sort_values('growth',ascending=False)
plt.figure(figsize=(14,10))
# make bar plot with matplotlib
plt.bar('consumption', 'growth',data=df_sorted_desc,color ='orange',width = 0.4)
plt.xlabel("party Name", size=15)
plt.ylabel("Total Candidates", size=15)
plt.title("Barplot of Candidate Allocation in Loksabha Election 2019", size=18)
plt.savefig("bar_plot_matplotlib_Python.png")
Barplot of candidate count in different parties

Bar Graph of Party vs Candidates with Crime Case:

Here we calculate the criminal cases of candidates in different parties. This makes us aware of how criminal cases are distributed across the parties.

fig, axes = plt.subplots(1, 2, figsize=(20, 8))
# Passing X axis and Y axis along with subplot position
sns.barplot(x = party_criminal_winner.index , y = party_criminal_winner['criminal'] , ax=axes[0] , palette='icefire');
axes[0].tick_params(axis='x' , rotation=45); # rotate the X axis labels so they are easier to read
axes[0].set_title('PARTY WISE CRIMINAL CASE OF CONTESTANTS');
# We can also change the color of the barplots by giving different palettes
sns.barplot(x = party_winner.index , y = party_winner['criminal'] , ax=axes[1] , palette='viridis');
axes[1].set_title('PARTY WISE CRIMINAL CASE OF WINNERS');
plt.xticks(rotation=45);
Barplot of the party-wise criminal case of contestants and winners

From the above diagram, we can see that the BJP and the Congress have the maximum number of criminal cases in India. This is likely because these two parties contest across all of India, whereas most of the other parties are regional.
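The party-wise frames used in the plot are not defined in the snippets shown. A minimal sketch of how such frames could be built, following the same groupby pattern as the state-wise code, is given below. The column name 'PARTY', the party labels, and the rows are assumptions made for illustration.

```python
import pandas as pd

# Invented rows; 'PARTY' is an assumed column name, WINNER is 1 for winners.
toy = pd.DataFrame({
    "PARTY": ["A", "A", "B", "B", "C"],
    "WINNER": [1, 0, 1, 0, 0],
    "criminal": [2, 1, 0, 4, 3],
})

# Total criminal cases over all contestants of each party.
party_criminal = toy.groupby("PARTY")[["criminal"]].sum()

# The same total restricted to winning candidates.
party_winner = toy[toy["WINNER"] > 0].groupby("PARTY")[["criminal"]].sum()

print(party_criminal)
print(party_winner)
```

A party with no winners (here 'C') simply does not appear in the second frame, so the two bar charts may have different x-axis entries.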

The Scatter Plot of States vs Total votes in Loksabha 2019 in India:

# Pass figsize to plot() directly; a separate plt.figure() call would only create an unused empty figure
df2.plot(x="STATE", y="TOTAL\nVOTES", kind="scatter", figsize=(20, 7))
Scatter plot of state vs total votes

Bar Graph of Age vs Crime Cases:

From this bar graph, we can learn about the criminal cases of candidates in different age groups.

plt.figure(figsize=(14,10))
# Passing X axis and Y axis
sns.barplot(x = age_criminal.index , y = age_criminal['criminal'] , palette='icefire');
Barplot of Age vs Crime Cases

From the graph, we can see that the number of criminal cases is highest among candidates aged 49, 37, and 51.

Bar Graph of State vs Total Votes:

From this bar graph, we can see the number of votes in different states.

plt.figure(figsize=(25,10))
# Passing X axis and Y axis
sns.barplot(x = total_voter1.index , y = total_voter1['TOTAL\nVOTES'] , palette='icefire');
Bargraph of state vs total votes

From the bar plot, it is clear that the total number of votes in Maharashtra, Uttar Pradesh, and West Bengal is much higher than in the remaining states of India, and Uttar Pradesh holds first place in the total number of votes.

Bar Graph of Gender vs Crime:

This is the bar plot of gender vs. crime, from which we can see whether male or female candidates account for more criminal cases in India.

#fig, axes = plt.subplots(1, 2, figsize=(20, 8))
plt.figure(figsize=(9,4))
# Passing X axis and Y axis
sns.barplot(x = party_winner1.index , y = party_winner1['criminal'] , palette='icefire');
Barplot of Gender vs Crime

Line Graph of State vs Total EVM Vote & Total Actual Vote:

#EVM Vote Postal Vote Migrant Vote
import matplotlib.pyplot as plt
#fig = plt.figure(figsize =(100, 7))
hp.plot(x="State Name", y=["EVM Vote","Total Actual Votes"],figsize =(20, 10), fontsize=10)
plt.xlabel("STATE")
plt.ylabel("Growth of Postal, EVM, Migrant Vote")
plt.title("Distribution of Vote Type in Different States")
plt.show()
Line graph of state vs total EVM vote & total actual vote

Barplot of Vote Percentage in India:

This is the bar plot of vote percentage in different states in Lok Sabha 2019.

plt.figure(figsize=(25,10))
# Passing X axis and Y axis
plt.xticks(rotation=90)
sns.barplot(x = hp['State Name'] , y = hp['ratio'] , palette='icefire');
Barplot of state name vs vote percentage

This is the bar plot of state vs. vote percentage. We can see from the plot that the vote percentage is very good in Kerala, West Bengal, Manipur, Maharashtra, etc. This suggests that people in India are well aware of politics.
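The article does not show how the 'ratio' column was derived. One common definition of vote percentage is votes cast divided by registered electors, sketched below. The column name 'Total Electors' is hypothetical and the figures are invented; the author's actual formula may differ.

```python
import pandas as pd

# Invented figures; 'Total Electors' is a hypothetical column name.
hp_toy = pd.DataFrame({
    "State Name": ["X", "Y"],
    "Total Actual Votes": [800, 450],
    "Total Electors": [1000, 900],
})

# Vote percentage under the assumed definition: votes cast / registered electors * 100.
hp_toy["ratio"] = 100 * hp_toy["Total Actual Votes"] / hp_toy["Total Electors"]
print(hp_toy[["State Name", "ratio"]])
```

Whatever definition is used, stating it next to the plot makes the percentages easier to interpret.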

Top Winning Candidates List:

From the list given below, we can see the top winning candidates with the highest vote percentage, in the Lok Sabha election 2019 in India.

total_voter2 = hf[hf['Percentage']>0].groupby('Candidate')[['Percentage']].sum().sort_values(by=['Percentage']).tail(15).sort_values(by=['Candidate'])
total_voter2
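Because of the final sort, the list above is ordered alphabetically by candidate name rather than by vote share. If a ranked list is wanted instead, nlargest returns the top rows in descending order directly. The sketch below uses invented candidates and percentages.

```python
import pandas as pd

# Invented candidates and vote percentages.
toy = pd.DataFrame({
    "Candidate": ["P", "Q", "R", "S"],
    "Percentage": [61.5, 70.2, 0.0, 55.0],
})

# Keep positive percentages, total per candidate, then take the top 2 by value.
top2 = (
    toy[toy["Percentage"] > 0]
    .groupby("Candidate")[["Percentage"]]
    .sum()
    .nlargest(2, "Percentage")
)
print(top2)
```

For the real data, replacing 2 with 15 would give the same candidates as the article's selection, ranked by percentage (ties aside).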

Conclusion:

From the above analysis, we can conclude the following points

  1. From this analysis, we have found that Indians are well aware of the voting system and political parties. The vote percentage is very good in Kerala, West Bengal, Manipur, Maharashtra, etc.
  2. More BJP candidates participated in Lok Sabha 2019 than candidates of the other parties.
  3. It is clear that the total number of votes in Maharashtra, Uttar Pradesh, and West Bengal is much higher than in the remaining states of India, and Uttar Pradesh holds first place in the total number of votes.
  4. The number of female candidates is far lower than the number of male candidates.
  5. Unfortunately, we found that the candidates of Kerala have the most criminal cases, and West Bengal, Uttar Pradesh, and Telangana are not far behind. The maximum number of criminal cases against a single candidate is 240, and he is from Kerala.
  6. We have noticed that the number of criminal cases among candidates is highest at the ages of 49, 37, and 51.

GitHub link for code:

Future Work:

I am now working on predicting the winners of Lok Sabha 2019, both state-wise and constituency-wise, using machine learning concepts.
