How Climate Change is Causing Farmers in Rural India to Take Their Own Lives.

Hayden Poore
Published in The Startup
7 min read, Dec 16, 2019

Since the 1990s a national catastrophe has been underway in India: farmers have been committing suicide at increasingly alarming rates. India is an agrarian country, with around 70% of its population depending directly or indirectly on agriculture. Farm-sector suicides in India actually decreased last year, but they remain at epidemic levels in comparison to the rest of the world and have placed immense pressure on legislators. The epidemic has not gone unnoticed. Many news sources have reported on it and speculated about its possible causes.

I decided to take a data-driven approach to determining the factors that are causing these farmers to take their own lives.

To begin my analysis I wanted to see how bad this epidemic truly was by looking at the numbers myself. I found a data set on data.world with yearly suicide statistics by profession. Visualizing this data shows that the overall numbers are increasing, but when I isolated the profession to farming and agricultural activity, the numbers showed a slight decrease in recent years. I was then able to find a more complete data set on suicides in the agriculture sector. It contained the same figures as my first data set but dated back to 1995 rather than 2001. Using web scraping and the pandas merge function, I created a more robust data frame of suicides in the farming/agriculture industry. The decline in suicides among those in farming and agriculture is a positive sign, but the overall numbers are still cause for alarm.
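A minimal sketch of how two overlapping yearly tables might be combined with the pandas merge function. The column names and numbers are placeholders invented for illustration; they are not the real suicide counts or the actual columns of either data set.

```python
import pandas as pd

# Placeholder values only; these are NOT real suicide counts.
# "short" stands in for the data set with fewer years,
# "long" for the one that reaches further back.
short_df = pd.DataFrame({"Year": [2001, 2002, 2003], "Farmer_Suicides": [30, 40, 50]})
long_df = pd.DataFrame({"Year": [1999, 2000, 2001, 2002], "Farmer_Suicides": [10, 20, 30, 40]})

# An outer merge keeps every year that appears in either table.
merged = pd.merge(long_df, short_df, on="Year", how="outer", suffixes=("_long", "_short"))

# Prefer the longer series and fall back to the shorter one where it is missing.
merged["Farmer_Suicides"] = merged["Farmer_Suicides_long"].combine_first(
    merged["Farmer_Suicides_short"]
)

combined = merged[["Year", "Farmer_Suicides"]].sort_values("Year").reset_index(drop=True)
print(combined)
```

An outer join is used here so that no year is dropped; an inner join would keep only the years both tables share.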

Total Suicides in India by Year
Total Suicides of those in the Farming/Agriculture Industry by Year

Knowing that India is an agrarian country and that a large portion of the population relies on agriculture, I was curious to see how many people are actually employed in the agriculture industry.

The World Bank's collection of World Development Indicators presents the most current and accurate global development data available and includes national, regional, and global estimates. I had previously used this data set, along with multivariate feature imputation, to predict HIV rates in third world countries and published my results on Medium. There I covered the steps I took to clean this massive data set. After isolating the cleaned data set to only India, I found some interesting statistics.

From the World Development Indicators data set I found relevant data on cereal yield by year. Cereal yield covers wheat, rice, maize, barley, oats, rye, millet, sorghum, buckwheat, and mixed grains, and is measured in kilograms per hectare of harvested land. A quick visualization of this data shows that yields have been steadily increasing since 1960.

Given this dramatic increase in cereal yields, I figured it would be appropriate to look at employment rates in agriculture.

We can see that as cereal yields continue to increase, overall employment in agriculture continues to decrease. One possible hypothesis is that agriculture is becoming increasingly commercialized, with commercially owned farms replacing those owned by individual farmers.

In order to better illustrate the correlation I normalized the numbers from each data set so that they could be visualized together.
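The article does not say which normalization was applied. The sketch below assumes min-max scaling, which maps each series onto the 0 to 1 range so that two indicators with very different units can share one axis. The column names and values are placeholders, not World Bank figures.

```python
import pandas as pd

def min_max_normalize(series: pd.Series) -> pd.Series:
    """Rescale a numeric series to the 0-1 range (min-max normalization)."""
    low, high = series.min(), series.max()
    if high == low:
        # A constant series has no spread; map it to zeros to avoid dividing by zero.
        return pd.Series(0.0, index=series.index)
    return (series - low) / (high - low)

# Placeholder values for illustration only, not World Bank figures.
indicators = pd.DataFrame(
    {"cereal_yield": [1000.0, 1500.0, 3000.0], "agri_employment_pct": [70.0, 60.0, 45.0]},
    index=[1991, 2001, 2011],
)

# Apply the scaling column by column so each indicator spans 0 to 1.
normalized = indicators.apply(min_max_normalize)
print(normalized)
```

Other choices, such as z-score standardization, would work equally well for plotting both series together.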

This graph shows an inverse relationship between cereal yields and the percentage of the population employed in the agriculture industry.

But what if we wanted to see where these numbers would be, let's say, 10 years from now? That's where Facebook's Prophet comes into play. Prophet is a procedure for forecasting time series data based on an additive model in which nonlinear trends are fit with yearly, weekly, and daily seasonality. My yield and employment data are recorded yearly, so it was relatively easy to generate a model and build a 10-year forecast for each variable.
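A hedged sketch of what a 10-year Prophet forecast on yearly data could look like. The helper names `to_prophet_frame` and `forecast_ten_years` are hypothetical, the input values are placeholders, and switching off all seasonality is an assumption made here because annual observations carry no within-year pattern; the article does not state which settings were used.

```python
import pandas as pd

def to_prophet_frame(years, values) -> pd.DataFrame:
    """Build the two-column frame Prophet expects: 'ds' (dates) and 'y' (values)."""
    return pd.DataFrame({
        "ds": pd.to_datetime([f"{int(y)}-01-01" for y in years]),
        "y": [float(v) for v in values],
    })

def forecast_ten_years(history: pd.DataFrame) -> pd.DataFrame:
    """Fit Prophet on yearly history and return a forecast 10 years ahead."""
    # Imported here so the helper above works without Prophet installed.
    from prophet import Prophet  # older releases: `from fbprophet import Prophet`

    # Assumption: seasonality is disabled because the data has one point per year.
    model = Prophet(
        yearly_seasonality=False,
        weekly_seasonality=False,
        daily_seasonality=False,
    )
    model.fit(history)
    # "YS" asks for one future row per year, dated at the start of the year.
    future = model.make_future_dataframe(periods=10, freq="YS")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]

# Placeholder values for illustration only.
history = to_prophet_frame([2010, 2011, 2012], [2500, 2600, 2700])
print(history)
```

Calling `forecast_ten_years(history)` on a real series returns the point forecast (`yhat`) together with its uncertainty interval.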

10 Year Forecast of Cereal Yield per Hectare (kg)
10 Year Forecast of Percentage of Population Employed in the Farming/Agriculture Industry

The conclusion that can be derived from the Prophet predictions is that cereal yields will continue to grow despite a decrease in the percentage of those employed in the agriculture industry.

Using BeautifulSoup4 I was able to scrape data from a Wikipedia article outlining the reasons behind these farmers taking their own lives. These statistics are from 2002, but they provide a good foundation for a deeper analysis. With crop failure being the number one reason, I decided to look at factors that would cause these farmers' crops to fail.

Reasons given by close relatives and friends

In order to understand the reasons behind the failure of crops for some farmers, I decided to look at some environmental statistics that could contribute to the failure of a harvest.

From my World Development Indicators data set I found statistics on CO2 emissions per capita by year. In order to understand the trends in this data I created a simple time series graph.

India's CO2 emissions have seen an almost exponential increase since 1960. This may be due in part to increasing cereal yields.

While CO2 emissions may not be the biggest factor in whether a harvest succeeds, climate plays a significant role.

Although CO2 emissions are clearly trending upward, I was still curious to see the forecast that Prophet would create.

10 Year Forecast of CO2 Emissions per Capita in India

I found a data set with historical average temperatures in India from India’s open government data platform. This data was already very nicely cleaned upon downloading so visualizing it was easy.

We can see that the average annual temperature in India is increasing. Because the temperature data is not very linear, the Prophet predictions are difficult to interpret.

Another factor that plays a role in determining the success or failure of a harvest is rainfall. The Indian open government data platform provided me with a robust rainfall data set.

It is difficult to draw any conclusions from the rainfall visualization, and in fact at this point no conclusions can be drawn from any of my visualizations. I was still wondering what was causing crop failures and, in turn, causing farmers to take their own lives. I realized that an appropriate way to show the relationships between these variables would be to calculate Pearson correlation coefficients.

The suicide data that I collected only spans 1995–2012, so in order to calculate correlation coefficients I had to trim the rest of my data down to those years.

Combined data into one data frame
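Trimming each series to the 1995–2012 window and lining the series up by year could look roughly like this. The helper name `trim_years`, the column names, and the values are all assumptions made for illustration.

```python
import pandas as pd

def trim_years(df: pd.DataFrame, start: int = 1995, end: int = 2012,
               year_col: str = "Year") -> pd.DataFrame:
    """Keep only rows whose year lies in [start, end] and index them by year."""
    mask = df[year_col].between(start, end)  # inclusive on both ends
    return df.loc[mask].set_index(year_col)

# Placeholder series with different year ranges; the values are dummies.
rainfall = pd.DataFrame({"Year": range(1990, 2016), "Annual_Rainfall": range(26)})
suicides = pd.DataFrame({"Year": range(1995, 2013), "Farmer_Suicides": range(18)})

# Align on the shared year index; the inner join drops years missing from either frame.
combined = pd.concat([trim_years(rainfall), trim_years(suicides)], axis=1, join="inner")
print(combined.head())
```

Indexing by year before concatenating means rows are matched on the year itself rather than on their position in each table.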

My next step was to call the pandas function .corr(), which computes the pairwise correlation of columns, excluding NA/null values.
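As a small illustration of that call, the sketch below runs `.corr()` on a toy frame containing a missing value. The column names and numbers are placeholders chosen so the result is easy to check by eye; they are not the article's data.

```python
import pandas as pd

# Toy data: rainfall rises while suicides fall in lockstep, and one
# rainfall value is missing to show how NA values are handled.
df = pd.DataFrame({
    "Annual_Rainfall": [1.0, 2.0, 3.0, 4.0, None],
    "Farmer_Suicides": [8.0, 6.0, 4.0, 2.0, 1.0],
    "CO2_Per_Capita": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Pearson is the default method; it is spelled out here for clarity.
# For each pair of columns, rows with a missing value in either column are skipped.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Because the missing rainfall row is excluded pairwise, the rainfall columns are compared over four rows while the other pair uses all five.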

Pearson Correlation data frame
Correlation Visualization

I used seaborn to create a heatmap of the correlation data frame in order to better illustrate the results.

The first point that can be made is that the correlation between farmer suicides and annual rainfall is -0.38. This is a negative correlation: in years with less rainfall, when harvests are more likely to fail for lack of water, farmer suicides tend to be higher.

The second strongest correlation is between annual rainfall and CO2 emissions. As CO2 emissions rise, average annual rainfall tends to decrease, and with less rainfall the chance of a harvest failing is higher.

One reason I believe the correlations are not as high as expected is that my farmer suicide statistics only date back to 1995, while the rest of my data dates back to 1900.

The conclusion that I am drawing from these two points is that increased CO2 emissions cause a decrease in rainfall and also affect temperature. These two factors have a relationship with farmer suicides.

Tying this back to my first analysis of employment in agriculture: when these factors result in inconsistent harvests, small farmers cannot afford a bad season, while large commercial farms can absorb one or two. This forces small farmers out of the agriculture industry.

To be sure, though, a more complete and in-depth analysis would require more historical statistics on suicides in the farming and agriculture industry to better illustrate the relationship between the factors analyzed and suicides. My correlation visualization does not show a significant relationship between the change in climate and farmer suicides. A data set limited strictly to small-farm cereal yields would also help my analysis. I decided to look at CO2 emissions because the term “climate change” is usually used in regard to human activities, and CO2 emissions represent human activity.

In conclusion, the increase in human activity in the form of CO2 emissions has effects on rainfall and climate that in turn cause crops to fail. These crop failures result in an increase in farmer suicides.

Hayden Poore: Information and Data Science student with a minor in Business Analytics at the University of Colorado Boulder. Currently seeking a Summer 19 internship.