Data Science Applications in Agriculture

Sandeep Kumar Kotha
Published in Indo Data Week
Dec 20, 2020

These applications were developed by Team FoodforGood as part of Hackathon FOR Good, a hackathon organized by Indo Data Week.

Agriculture plays a critical role in the global economy. Here, we present two use cases that can help tackle agricultural issues.

The FoodforGood team members are Kotha Sandeep Kumar and Prem Kumar Kommu.

Use case 1:

Crop Yield Prediction

The Challenge:

Crop yield prediction is the practice of predicting the yield of crops from parameters such as rainfall, temperature, fertilizers, pesticides, and atmospheric conditions.
With the continuing expansion of the human population, understanding worldwide crop yield is central to addressing food security challenges and reducing the impacts of climate change.

Crop yield prediction is an important agricultural problem. Agricultural yield depends primarily on weather conditions (rain, temperature, etc.) and pesticides.
Accurate information about historical crop yield is important for decisions about agricultural risk management and for future predictions.

Team's Challenge:

The objective of this use case is to predict crop yield based on weather conditions (rain, temperature, etc.) and pesticides.

Steps involved in the Methodology:

Data Collection:
This analysis is based on test data created only for demonstration purposes. The sample considered in this analysis contains 1000 records with no missing values.
The parameters considered are average temperature, tonnes of pesticides, and average rainfall.
The target variable is the yield produced from the crop “paddy”.

Analysis Approach:

The objective is to predict the yield of the crop “paddy” from the parameters listed above.

We used the “MinMax” scaler, a standard normalization technique, to normalize the data.
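As a rough illustration of what min-max scaling does, the sketch below rescales each column to the [0, 1] range using x' = (x - min) / (max - min). The function name `min_max_scale` and the sample rows are hypothetical and are not taken from the team's dataset; scikit-learn's `MinMaxScaler` applies the same formula with its default settings.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column of X to the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    # Guard against constant columns to avoid division by zero.
    span[span == 0] = 1.0
    return (X - col_min) / span

# Hypothetical rows: [avg temperature, tonnes of pesticides, avg rainfall]
features = np.array([
    [20.0, 100.0, 500.0],
    [25.0, 150.0, 1000.0],
    [30.0, 200.0, 1500.0],
])

scaled = min_max_scale(features)
print(scaled)
```

After scaling, all three parameters share the same range, so a feature measured in large units (such as rainfall) does not dominate distance-based models like SVM.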

The whole dataset is split into 80% for training the model and 20% for testing it.
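A minimal sketch of such an 80/20 split is shown here, assuming the records are shuffled before splitting (the article does not say whether they were). The helper `split_80_20` and the placeholder arrays are hypothetical; scikit-learn's `train_test_split` with `test_size=0.2` does the same job.

```python
import numpy as np

def split_80_20(X, y, seed=0):
    """Shuffle the rows, then keep 80% for training and 20% for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    train_idx, test_idx = idx[:cut], idx[cut:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data with the same size as the demo dataset (1000 records, 3 parameters).
X = np.arange(3000, dtype=float).reshape(1000, 3)
y = np.arange(1000, dtype=float)

X_train, X_test, y_train, y_test = split_80_20(X, y)
print(X_train.shape, X_test.shape)
```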

We used the 80% training portion to train models with different machine learning algorithms: “GradientBoostingRegressor”, “SVM”, “RandomForestRegressor”, and “DecisionTreeRegressor”.

The R-Square metric was used to select the best model among the candidates.
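For reference, R-Square compares a model's squared error with that of always predicting the mean: R² = 1 - SS_res / SS_tot. The sketch below computes it directly; the function name `r_squared` and the sample values are illustrative only.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])

perfect = r_squared(y_true, y_true)                       # exact predictions
baseline = r_squared(y_true, np.full(4, y_true.mean()))   # always predict the mean
print(perfect, baseline)
```

A value close to 1 means the model explains most of the variation in yield, while a value near 0 means it does no better than predicting the average yield.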

The selected model was validated on the test data (20%), and the R-Square metric was compared between the training and test datasets.

Below is the output from the analysis.

Figure: Model metrics for the training data

Figure: Metrics for the test data
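The workflow described in this section (scaling, 80/20 split, four regressors, R-Square on both sets) can be sketched end to end with scikit-learn. This is only an illustration under stated assumptions: the data is synthetic and generated on the spot, the coefficients used to build the target are made up, `SVR` is assumed to be what the article's “SVM” refers to, and the scaler is fitted on the training split only, which may differ from what the team did.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 1000-record demo dataset (not the team's data).
rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.uniform(15, 40, n),     # average temperature
    rng.uniform(0, 500, n),     # tonnes of pesticides
    rng.uniform(200, 2000, n),  # average rainfall
])
# Made-up linear relationship plus noise, used only so the models have a signal.
y = 2.0 + 0.05 * X[:, 0] + 0.004 * X[:, 1] + 0.001 * X[:, 2] + rng.normal(0, 0.2, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data, then apply it to both splits.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=42),
    "SVR": SVR(),
    "RandomForestRegressor": RandomForestRegressor(random_state=42),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    # For regressors, .score() returns R-Square.
    scores[name] = (model.score(X_train_s, y_train), model.score(X_test_s, y_test))
    print(f"{name}: train R2 = {scores[name][0]:.3f}, test R2 = {scores[name][1]:.3f}")

# Pick the model with the highest R-Square on the held-out test data.
best = max(scores, key=lambda name: scores[name][1])
print("Best model on test data:", best)
```

Comparing the two numbers per model is what reveals overfitting: a large gap between training and test R-Square suggests the model has memorized the training records rather than learned a general pattern.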

Note: For demonstration purposes, we developed only a few models. Deep learning models such as ANN and LSTM could also be used.

Conclusion:
We can develop a full-fledged AI model on the actual dataset covering different cities in Telangana. This analysis can help farmers identify crop losses and prevent them in the future, and every farmer can know how much yield to expect.

Use case 2:

Forecasting Air Quality Index

The Challenge:
Air quality has become a serious issue in some parts of India. It affects both people and the environment and can lead to significant losses for everyone.
To tackle the issue, we came up with this air quality analysis for the Indian state of Telangana.

Team's Challenge:
The objective of this use case is to forecast the air quality index on an hourly basis ahead of time and send an alert to the Government so that it can take precautions to avoid losses.
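The alerting step in this objective could look something like the sketch below, which flags the forecast hours where AQI exceeds a limit. Everything here is hypothetical: the threshold of 200, the helper `hours_needing_alert`, and the forecast values are placeholders for illustration, not figures from the team's analysis or from any official standard.

```python
from datetime import datetime, timedelta

# Hypothetical alert threshold; the article does not specify one.
ALERT_THRESHOLD = 200

def hours_needing_alert(start, hourly_aqi_forecast, threshold=ALERT_THRESHOLD):
    """Return (timestamp, aqi) pairs for forecast hours above the threshold."""
    alerts = []
    for offset, aqi in enumerate(hourly_aqi_forecast):
        if aqi > threshold:
            alerts.append((start + timedelta(hours=offset), aqi))
    return alerts

# Made-up hourly AQI forecast starting at 09:00.
start = datetime(2020, 12, 20, 9, 0)
forecast = [150.0, 210.5, 198.0, 250.0]

alerts = hours_needing_alert(start, forecast)
for when, aqi in alerts:
    print(f"ALERT {when:%Y-%m-%d %H:%M}: forecast AQI {aqi}")
```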

Steps involved in the Methodology:
Data Collection:
Data collection is the process of gathering and measuring information on variables of interest. This analysis is based on sample data, used only for demonstration purposes, covering air quality index data from five cities in Telangana state. The data was obtained from the Ambee API.
The parameters considered are CO, SO2, PM10, PM25, AQI, NO2, and ozone. The sample contains 5000+ records with no missing values, captured on an hourly basis.

Analysis Approach:
Since the objective is to forecast future values of the AQI parameters, we considered a single technique: “VAR” (Vector Autoregression).
All seven parameters listed above were used as inputs, and each was forecast for future time periods using the VAR method.

Below is the sample output based on the VAR method.

Figure: Forecasted values for two time periods
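To give a feel for how VAR produces such forecasts, here is a minimal sketch of a first-order VAR, y_t = c + A y_(t-1) + e_t, fitted by ordinary least squares with NumPy. The lag order of 1, the helper names `fit_var1` and `forecast_var1`, and the simulated series are all assumptions made for illustration; the article does not state the lag order used. In practice a library implementation such as the `VAR` class in statsmodels, which also supports lag-order selection, would typically be used.

```python
import numpy as np

COLUMNS = ["CO", "SO2", "PM10", "PM25", "AQI", "NO2", "OZONE"]

def fit_var1(Y):
    """Fit y_t = c + A @ y_(t-1) by least squares; return (c, A)."""
    Y = np.asarray(Y, dtype=float)
    # Regressors: a constant column plus the previous hour's values.
    lagged = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])
    coef, *_ = np.linalg.lstsq(lagged, Y[1:], rcond=None)
    return coef[0], coef[1:].T

def forecast_var1(c, A, last_obs, steps):
    """Roll the fitted model forward, feeding each forecast back in."""
    preds = []
    current = np.asarray(last_obs, dtype=float)
    for _ in range(steps):
        current = c + A @ current
        preds.append(current)
    return np.array(preds)

# Simulated hourly series standing in for the seven parameters (not real data).
rng = np.random.default_rng(0)
k = len(COLUMNS)
series = np.zeros((500, k))
series[0] = 2.0
for t in range(1, len(series)):
    series[t] = 1.0 + 0.5 * series[t - 1] + rng.normal(0, 0.1, k)

c_hat, A_hat = fit_var1(series)
forecast = forecast_var1(c_hat, A_hat, series[-1], steps=2)

for step, row in enumerate(forecast, start=1):
    print(f"t+{step}:", dict(zip(COLUMNS, np.round(row, 3))))
```

Because every parameter is regressed on the previous values of all seven, the forecast for AQI can draw on recent movements in CO, PM10, and the other pollutants rather than on AQI's own history alone.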

Note: For demonstration purposes, we used only one technique, but many others, such as ARIMA and SARIMA, can be used to tackle this problem.

Conclusion:
Once we develop a full-fledged forecasting model on the entire dataset for different cities in Telangana, we can send alerts to the Government so that it can take precautions and prevent losses.
