Food Price Prediction using Regression — Model training and Predicting

Rusiri Illesinghe
4 min read · Oct 13, 2021


Photo by Anna Nekrashevich from Pexels

Table of Contents:

  1. Loading the data
  2. Feature Selection
  3. Model Training
  4. Evaluating the model

This article is the second part of a step-by-step demonstration of predicting food prices in Sri Lanka. I discussed the dataset, data cleaning and preprocessing in my previous article. (You can click here and read it!)

So far we have created a dataframe from our initial dataset, cleaned and preprocessed it, and finally saved it to Google Drive as a new CSV file.

Let’s start from there!

Loading the data

The dataframe is wider now. So many features! The following image shows part of the dataframe.

Now we can see that the features and the label are in the same dataframe! So let's put the feature set into a separate dataframe stored in a variable X, and the column vector of labels in a variable Y.
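A minimal sketch of this step with pandas is shown below. The file name and the column names (including the "price" label) are assumptions; here a small inline CSV stands in for the preprocessed file saved to Google Drive in part 1.

```python
from io import StringIO

import pandas as pd

# Stand-in for the preprocessed CSV saved to Google Drive in part 1;
# the column names ("year", "price", ...) are illustrative assumptions.
csv_data = StringIO(
    "year,longitude,latitude,price\n"
    "2019,79.86,6.92,80.0\n"
    "2020,80.64,7.29,95.0\n"
)
df = pd.read_csv(csv_data)

# Features into X, the price label into Y.
X = df.drop(columns=["price"])
Y = df["price"]

# (a, b): a = number of rows, b = number of columns (features)
print(X.shape)
```

With the real dataset, `pd.read_csv` would point at the CSV file on Google Drive instead of the inline string.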

How many features does X have? Let's see!

The result has the format (a, b), where a is the number of rows and b is the number of columns (features). From the result of the line above, you will notice that X has a large number of features! Do you think all of those features help improve the accuracy of our model? Most probably not!

This is where feature selection comes in! There are various methods for performing feature selection.

Feature Selection

Before that, let's have a look at our feature set and labels!

DataFrame ‘X’
DataFrame ‘Y’

Most of the features (all except longitude and latitude) are categorical. I'm using SelectKBest from sklearn for feature selection, and it's not the only way!
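A sketch of how SelectKBest can score features against a continuous price label is shown below, using `f_regression` as the scoring function (an assumption; other score functions also work). The toy dataframe here stands in for the wide, one-hot-encoded X from the article, and its column names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Toy stand-in for the wide one-hot-encoded feature set; in the article,
# X and Y come from the preprocessed CSV. Column names are assumptions.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "year": rng.integers(2010, 2021, n),
    "m1": rng.integers(0, 2, n),
    "longitude": rng.uniform(79, 82, n),
    "latitude": rng.uniform(5, 10, n),
})
# Synthetic price that depends mainly on "year", plus noise.
Y = X["year"] * 2 + rng.normal(0, 1, n)

# Score every feature against the label; in the article k=60 keeps the
# top 60 features (here capped by the toy data's column count).
selector = SelectKBest(score_func=f_regression, k=min(60, X.shape[1]))
selector.fit(X, Y)

# Rank features by importance score, highest first.
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores)
```

Plotting `scores.head(20)` as a bar chart gives the top-20 importance plot described below.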

As the output of the above code, we get the feature importance scores for the top 60 features in X, and I'm also displaying the 20 most important features in a plot. We can identify the features with lower importance and remove them, as long as removing a feature does not affect the core of the training process!

According to the above top 20 features, we can see that year, the months (m1, m2 … m12 depict the months) and the commodity types (initially Rice, Onions and Potatoes belonged to the commodity column) are among the most important features.

In the feature selection step I removed many unwanted features and kept a smaller set of important ones. These are the features I kept:

  1. Year column
  2. Month columns
  3. Commodity type columns
  4. Longitude
  5. Latitude

Model Training

Now I'm going to train a regression model on the dataset.

Before that, we need to split our dataset into train and test sets.

Now I train the model on the training data (X_train, y_train) using the Linear Regression algorithm in sklearn. Then I predict the price values for the test feature set (X_test).
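The split-train-predict step might look like the sketch below. The 80/20 split ratio and the synthetic stand-in data are assumptions; with the real dataset, X and y would be the selected feature set and price labels from above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the selected feature set and price label.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 5))
y = X @ np.array([10.0, 5.0, 2.0, 1.0, 0.5]) + 50  # exact linear relation

# Split into train and test sets (80/20 is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit ordinary least squares and predict prices for the test set.
model = LinearRegression()
model.fit(X_train, y_train)
y_predict = model.predict(X_test)
```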

y_predict contains the prices predicted by the model for the corresponding items in X_test.

Now all set! We need to see how well our model works, so we need to evaluate its accuracy. For that I'm writing a separate method to compute the accuracy.

Evaluating the model

Since the model predicts a price value with several decimal places, I accept a predicted value as correct if it is within ±20% of the real price of the item. For example, suppose the real item price is 80.0 LKR (Sri Lankan Rupees):

80 + (80 × 20/100) = 96 LKR

80 − (80 × 20/100) = 64 LKR

If the predicted value is in the range from 64 LKR to 96 LKR, I accept it as a correct prediction. This is the implementation of the function:
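A minimal sketch of such a tolerance-based accuracy function is shown below; the function name and signature are assumptions, but the ±20% rule matches the worked example above.

```python
def accuracy(y_true, y_pred, tolerance=0.2):
    """Percentage of predictions within ±tolerance of the true price."""
    correct = 0
    for actual, predicted in zip(y_true, y_pred):
        lower = actual - actual * tolerance   # e.g. 80 -> 64
        upper = actual + actual * tolerance   # e.g. 80 -> 96
        if lower <= predicted <= upper:
            correct += 1
    return correct / len(y_true) * 100

# A predicted 90 LKR for a true 80 LKR falls inside [64, 96],
# so it counts as a correct prediction.
print(accuracy([80.0], [90.0]))
```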

Now let’s check the accuracy :

Since the real price and the predicted price may have different numbers of decimal places, I pass the ceiling values of both the actual and predicted prices to the accuracy function.
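Taking the ceilings before the comparison might look like this; the example values stand in for the real y_test and y_predict arrays.

```python
import math

# Illustrative values; in the article these are y_test and y_predict.
y_test = [80.3, 120.7]
y_predict = [78.9, 131.2]

# Round both series up to whole rupees before computing accuracy.
y_test_ceil = [math.ceil(v) for v in y_test]
y_pred_ceil = [math.ceil(v) for v in y_predict]
print(y_test_ceil, y_pred_ceil)
```

The ceiled lists are then passed to the accuracy function defined above.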

Following image shows the obtained result :

Yes!!! It's done! The accuracy is 88.6%, which is a satisfying result!

I hope you now have a clear picture of how to use regression in real-world scenarios and how to get interesting results from your datasets!

Follow me for frequent updates on new articles!

Cheers!!!
