Optimal Crop Recommendation Using a Random Forest Classifier

Soil and climate parameters strongly influence cultivation, and depending on these parameters, some crops are more suitable than others

Gabriela Padilla
Insights of Nature
5 min read · May 14, 2024


Predicting the optimal crop

The goal of this project is to predict the optimal crop to cultivate based on several soil and climate parameters, so that farmers can make an informed decision before planting.

The parameters we consider for this project are: ‘N’ (nitrogen), ‘P’ (phosphorus), ‘K’ (potassium), ‘temperature’, ‘humidity’, ‘pH’, and ‘rainfall’. To make this prediction, we will use a Random Forest algorithm.

Let’s first get an overview of what Random Forests are:

Random forest is a commonly used machine learning algorithm that combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.

Imagine you’re trying to make a big decision, like choosing a vacation destination. You have a group of friends who each have different expertise and preferences. Some are foodies, some are beach lovers, some are adventurous hikers, and so on.

Now, instead of relying on just one friend’s opinion, you decide to consult your entire group. Each friend independently suggests a vacation spot based on their expertise and preferences. They might suggest places they’ve been before or heard great things about.

Once everyone has made their suggestions, you tally up the votes and choose the destination that received the most recommendations. This way, you’re benefiting from the diverse knowledge and perspectives of your friends, which helps you make a more informed decision.

In this analogy:

  • Each friend represents a decision tree in the Random Forest.
  • Their suggestions are analogous to the predictions made by each tree.
  • The vacation destination with the most recommendations is the final prediction made by the Random Forest.

By aggregating the opinions of multiple friends (decision trees), you’re likely to make a better decision than if you relied on just one friend’s recommendation. Similarly, by combining the predictions of multiple decision trees, a Random Forest can often make more accurate predictions than any single decision tree alone.
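The analogy maps directly onto scikit-learn: a fitted forest exposes its individual trees through `estimators_`, so you can watch each one cast its vote. A minimal sketch, using scikit-learn's bundled iris data as a stand-in for the crop dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Toy iris data stands in for the crop dataset
X, y = load_iris(return_X_y=True)

# Five "friends" (trees), each trained on a bootstrap sample of the data
forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

sample = X[:1]
# Each tree casts its own vote...
votes = [int(tree.predict(sample)[0]) for tree in forest.estimators_]
# ...and the forest reports the consensus (scikit-learn averages the trees'
# class probabilities, which for hard-voting trees amounts to a majority vote)
print(votes, "->", int(forest.predict(sample)[0]))
```

The consensus prediction is usually more reliable than any single tree's vote, just as the group's choice beats one friend's guess.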

Step-by-step project explanation

Dataset

For this project, the dataset contains samples for 22 crops, each labeled with the parameter values under which that crop grows best. 80% of the dataset is used for training the model and 20% for testing. The crops in the dataset are:

rice, maize, jute, cotton, coconut, papaya, orange, apple, muskmelon, watermelon, grapes, mango, banana, pomegranate, lentil, black gram, mung bean, moth beans, pigeon peas, kidney beans, chickpea, coffee
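An 80/20 split like this is typically done with scikit-learn's `train_test_split`. Below is a sketch with a tiny synthetic stand-in for the real CSV; the column names are assumptions based on the parameter list above:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# A tiny synthetic stand-in for the real crop dataset (column names assumed)
df = pd.DataFrame({
    "N":           [90, 40, 60, 85, 20, 70, 55, 30, 95, 45],
    "P":           [42, 60, 55, 58, 30, 48, 44, 35, 40, 62],
    "K":           [43, 55, 44, 41, 20, 39, 38, 25, 45, 50],
    "temperature": [20.8, 23.0, 25.5, 21.7, 26.4, 24.0, 22.6, 27.1, 20.1, 23.8],
    "humidity":    [82.0, 65.3, 71.4, 80.2, 52.0, 68.9, 75.5, 49.8, 83.1, 62.4],
    "ph":          [6.5, 7.0, 6.8, 6.2, 5.9, 6.6, 6.4, 7.2, 6.3, 6.9],
    "rainfall":    [202.9, 110.4, 150.7, 226.6, 60.2, 130.5, 180.3, 55.9, 210.0, 120.8],
    "label": ["rice", "maize", "jute", "rice", "lentil",
              "maize", "jute", "lentil", "rice", "maize"],
})

features = df[["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]]
target = df["label"]

# Hold out 20% of the rows for testing
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    features, target, test_size=0.2, random_state=0
)
print(len(Xtrain), len(Xtest))  # -> 8 2
```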

Model

The model is a Random Forest, used here for classification. After training, the model reached an accuracy of 99% on the test set.

from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

# Train a Random Forest with 20 trees
RF = RandomForestClassifier(n_estimators=20, random_state=0)
RF.fit(Xtrain, Ytrain)

# Evaluate on the held-out test set
predicted_values = RF.predict(Xtest)
accuracy = metrics.accuracy_score(Ytest, predicted_values)

We will also save the model, which will be useful later. To do this, we use pickle:

import pickle

# Dump the trained Random Forest classifier with pickle
RF_pkl_filename = 'RandomForest.pkl'
# The context manager opens the file and closes it automatically
with open(RF_pkl_filename, 'wb') as RF_Model_pkl:
    pickle.dump(RF, RF_Model_pkl)
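The saved file can later be read back with `pickle.load`, which is what makes this step useful for deployment. A self-contained round-trip sketch (it trains a stand-in model on scikit-learn's iris data so it runs on its own):

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model so the snippet runs on its own
X, y = load_iris(return_X_y=True)
RF = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Round-trip through pickle, mirroring the save step above
with open("RandomForest.pkl", "wb") as f:
    pickle.dump(RF, f)
with open("RandomForest.pkl", "rb") as f:
    RF_loaded = pickle.load(f)

# The reloaded model predicts exactly like the original
same = (RF_loaded.predict(X) == RF.predict(X)).all()
print(same)  # -> True
```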

Making predictions

import numpy as np
import ipywidgets as widgets
from IPython.display import display

def get_predictions(x1, x2, x3, x4, x5, x6, x7):
    # Collect the slider values into a single-row feature array
    data = np.array([[x1, x2, x3, x4, x5, x6, x7]])
    prediction = RF.predict(data)
    print(prediction)

N = widgets.FloatSlider(min=0.0, max=140.0, value=25.0, step=2.5, description="Nitrogen")
P = widgets.FloatSlider(min=5.0, max=145.0, value=25.0, step=2.5, description="Phosphorus")
K = widgets.FloatSlider(min=5.0, max=205.0, value=25.0, step=2.5, description="Potassium")
temp = widgets.FloatSlider(min=10.0, max=44.0, value=25.0, step=2.5, description="Temperature")
hum = widgets.FloatSlider(min=15.0, max=99.0, value=25.0, step=2.5, description="Humidity")
ph = widgets.FloatSlider(min=3.5, max=9.9, value=5.0, step=0.5, description="pH")
rain = widgets.FloatSlider(min=20.0, max=298.0, value=25.0, step=2.5, description="Rainfall (mm)")

im = widgets.interact_manual(get_predictions, x1=N, x2=P, x3=K, x4=temp, x5=hum, x6=ph, x7=rain)
# Relabel and recolor the "Run Interact" button
im.widget.children[-2].description = 'get prediction'
im.widget.children[-2].style.button_color = 'lightgreen'

display(im)

With this code, Google Colab renders a set of interactive sliders and a button that prints the predicted crop for the chosen parameter values.

Same approach, different applications

The use of the Random Forest algorithm for predicting optimum crops based on soil parameters unveils a versatile approach applicable to a multitude of domains. Beyond agriculture, the robustness and flexibility of Random Forest can be harnessed to tackle predictive challenges across diverse industries, leveraging its ability to handle complex datasets and provide accurate forecasts.

In healthcare, the Random Forest algorithm can serve as a powerful tool for assessing patient risk profiles and predicting medical outcomes. By integrating patient health records, genetic data, lifestyle factors, and diagnostic tests, healthcare providers can build predictive models to identify individuals at high risk for developing specific diseases, enabling proactive interventions and personalized treatment plans.

Environmental scientists can employ Random Forest to analyze complex environmental datasets and predict changes in ecosystems and biodiversity. By integrating satellite imagery, climate data, and species distribution records, researchers can build predictive models to assess the impact of climate change, habitat loss, and human activities on ecological systems, informing conservation strategies and policy decisions.

Businesses can utilize Random Forest to predict customer churn and optimize customer retention strategies. By analyzing customer demographic data, purchase history, and engagement metrics, companies can build predictive models to identify customers at risk of churning, tailor retention offers and marketing campaigns, and enhance overall customer satisfaction and loyalty.

Next steps

My next step is deploying the model using Streamlit. Streamlit is a popular Python library used for creating web applications with simple and intuitive interfaces. It allows us to showcase our machine-learning models and interact with them in real time through a web browser.
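As a preview, a minimal Streamlit app might look like the sketch below. The file name `RandomForest.pkl` matches the pickle step above; the slider ranges mirror the ipywidgets version, and everything else (titles, labels) is an assumption:

```python
# app.py — a minimal Streamlit sketch, run with: streamlit run app.py
import pickle

import numpy as np
import streamlit as st

st.title("Crop Recommendation")

# Load the model saved earlier with pickle
with open("RandomForest.pkl", "rb") as f:
    model = pickle.load(f)

# Sliders mirror the ipywidgets ranges used above
N = st.slider("Nitrogen", 0.0, 140.0, 25.0)
P = st.slider("Phosphorus", 5.0, 145.0, 25.0)
K = st.slider("Potassium", 5.0, 205.0, 25.0)
temp = st.slider("Temperature", 10.0, 44.0, 25.0)
hum = st.slider("Humidity", 15.0, 99.0, 25.0)
ph = st.slider("pH", 3.5, 9.9, 5.0)
rain = st.slider("Rainfall (mm)", 20.0, 298.0, 25.0)

if st.button("Get prediction"):
    sample = np.array([[N, P, K, temp, hum, ph, rain]])
    st.success(f"Recommended crop: {model.predict(sample)[0]}")
```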

You can check out the code in this Google Colab notebook. There you can also find the original project that inspired this one, as well as the dataset.

Thank you for reading this! If you want to see more of my work, connect with me on LinkedIn!
