Step-by-Step Guide to Building an End-to-End, Industry-Level Machine Learning Project

Fatima Arshad
4 min read · Aug 24, 2022

Part 2 — Feature Engineering and Model Fitting

In Part 1, we gathered data by scraping it from a website. The next step is to analyse the data and perform feature engineering.

When we scraped our data, we had a chance to look into it. This is what it looks like:

We have the farm rate for a specific date, along with the opening and closing rates of the market. We also have a column for day-old chicks (doc).

We know that broiler chicken prices are also affected by many external factors, a.k.a. exogenous variables. If you are a domain expert, or have access to one, you can easily deduce these factors. However, part of a data scientist’s job is to become a domain expert for the problem at hand. This includes thoroughly researching and pruning solutions and ideas.

After researching, I found that the prices of soybean and maize (corn) greatly affect broiler prices. So, I decided to collect them as well.

Luckily, I found a website that posts daily maize prices. I searched by city, “Rawalpindi”, and then opened its chart. You can enter a start and end date and then save the result as a CSV.

Let’s load it into a DataFrame and print it to see what’s inside:

The columns of interest are the second and third, but we have to rename them. Of these two, the first is the date and the second is the daily price of maize.
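The original screenshots are not reproduced here, so the following is only a minimal sketch of the load-and-rename step. The CSV contents, the original header names and the new names `date` and `maize_price` are all assumptions made for illustration; the real export will differ.

```python
import io
import pandas as pd

# Stand-in for the downloaded CSV. The headers and values are made up;
# in practice you would pass the path of the saved file to pd.read_csv.
raw_csv = io.StringIO(
    "id,Date,Price,Unit\n"
    "1,2022-01-03,1850,40kg\n"
    "2,2022-01-04,1860,40kg\n"
)

maize = pd.read_csv(raw_csv)
print(maize.head())  # inspect what the file contains

# Keep the second and third columns by position, then give them clear names.
maize = maize.iloc[:, [1, 2]].copy()
maize.columns = ["date", "maize_price"]

# Parse the date so it can later be matched against the scraped data.
maize["date"] = pd.to_datetime(maize["date"])
```

Selecting by position mirrors the "second and third column" description; selecting by header name would be more robust if the export layout ever changes.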

After renaming, we have to join it with our existing DataFrame:
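One way to do the join is a merge on the date column. This is a sketch with toy values; the column names (`farm_rate`, `open`, `close`, `doc`, `maize_price`) and the choice of an inner join are assumptions, since the article does not state which join was used.

```python
import pandas as pd

# Toy stand-in for the scraped broiler data from Part 1.
chicken = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"]),
    "farm_rate": [200, 205, 210],
    "open": [195, 198, 204],
    "close": [198, 204, 209],
    "doc": [60, 62, 61],
})

# Toy stand-in for the renamed maize prices (one date is missing on purpose).
maize = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04"]),
    "maize_price": [1850, 1860],
})

# An inner join keeps only the dates present in both tables.
merged = (
    chicken.merge(maize, on="date", how="inner")
    .sort_values("date")
    .reset_index(drop=True)
)
print(merged)
```

With `how="left"` every broiler row would be kept instead, and missing maize prices would show up as NaN that need to be handled separately.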

The resulting DataFrame then looks like this:

Next, we have to cast all our variables except the date. We will cast them to float.
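A minimal sketch of the cast, assuming the date column is called `date` and every other column holds values that can be parsed as numbers (the names and values are illustrative):

```python
import pandas as pd

# Toy frame where scraped values arrived as strings or integers.
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04"]),
    "farm_rate": ["200", "205"],
    "open": ["195", "198"],
    "maize_price": [1850, 1860],
})

# Every column except the date is cast to float.
value_cols = df.columns.drop("date")
df = df.astype({col: float for col in value_cols})
print(df.dtypes)
```

If some cells contain text that cannot be parsed, `astype` raises an error; `pd.to_numeric(..., errors="coerce")` is an alternative that turns such cells into NaN.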

Now that our data is ready, we have to split it into train and test sets. We will train our model on the years 2015 to 2021 and validate it on 2022.
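The split by year can be sketched as follows, assuming a parsed `date` column (toy data, illustrative column names):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2014-12-31", "2015-01-01", "2021-12-31", "2022-01-01", "2022-06-01"]
    ),
    "farm_rate": [150.0, 152.0, 230.0, 232.0, 260.0],
})

years = df["date"].dt.year

# Train on 2015 to 2021 (both ends inclusive), validate on 2022.
train = df[years.between(2015, 2021)].copy()
test = df[years == 2022].copy()
```

Splitting on the calendar rather than at random keeps every validation row strictly later than every training row, which is what a forecasting setup needs.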

Time series forecasting typically involves a very important kind of feature: the lag feature. For instance, we will calculate a lag feature for the open variable by taking a rolling-window mean over the past three days. But we need to split our data beforehand, since we can’t allow future data to leak into the past. Here is a function which will do just that:
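The author’s original function is not reproduced here. The sketch below is one possible version of it under stated assumptions: the function name `add_rolling_lag`, the output column name and `min_periods=1` (so that only the first row is empty after the shift) are illustrative choices, not the article’s code.

```python
import pandas as pd

def add_rolling_lag(df, column, window=3):
    """Add a lagged rolling-mean feature for `column` (hypothetical helper)."""
    out = df.copy()
    # Mean over the current row and up to `window - 1` earlier rows.
    rolling_mean = out[column].rolling(window=window, min_periods=1).mean()
    # Shift one row ahead so each row only sees earlier rows,
    # then back-fill the NaN that the shift leaves in the first row.
    out[f"{column}_lag{window}"] = rolling_mean.shift(1).bfill()
    return out

train = pd.DataFrame({"open": [10.0, 20.0, 30.0, 40.0]})
train = add_rolling_lag(train, "open")
print(train)
```

The function is meant to be called on the train and test frames separately, after the split, so that no test rows enter the training windows.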

As you can see, it applies a shift operation at the end. When we predict the price for the next day, we won’t yet have that day’s values for these variables. So we shift the rolling means one row ahead, which means each row only uses values from previous rows. This leaves the very first row as NaN, so we use bfill() to fill the NaNs. We apply this method to all features in both train and test.

Now, let’s get to model fitting. We will use Facebook’s Prophet to make predictions.
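A rough sketch of what fitting Prophet on this data could look like. Prophet expects a date column named `ds` and a target column named `y`. Treating `farm_rate` as the target and passing the engineered features as extra regressors are assumptions here, and the helper names are hypothetical.

```python
import pandas as pd

def to_prophet_frame(df, date_col="date", target_col="farm_rate"):
    """Rename the date and target columns to the `ds` and `y` names Prophet expects."""
    return df.rename(columns={date_col: "ds", target_col: "y"})

def fit_and_forecast(train, test, regressors):
    """Fit Prophet on `train` and predict for the dates in `test` (hypothetical helper)."""
    # Imported lazily so the renaming helper works without Prophet installed.
    from prophet import Prophet

    model = Prophet()
    for name in regressors:
        model.add_regressor(name)  # exogenous features, e.g. lagged means
    model.fit(to_prophet_frame(train))
    # The frame used for prediction needs `ds` and every regressor column.
    forecast = model.predict(to_prophet_frame(test).drop(columns="y"))
    return model, forecast

demo = to_prophet_frame(pd.DataFrame({
    "date": pd.to_datetime(["2021-12-30", "2021-12-31"]),
    "farm_rate": [228.0, 230.0],
    "open_lag3": [224.0, 226.0],
}))
print(demo.columns.tolist())
```

A call such as `model, forecast = fit_and_forecast(train, test, ["open_lag3"])` returns the fitted model and a forecast frame whose `yhat` column holds the predictions; `model.plot(forecast)` then draws Prophet’s built-in forecast chart.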

Let’s visualise our forecast using its built-in plotting function:

The trend appears to be going upward, but we can’t deduce much from this graph alone.

Let’s see how well our model was able to fit the true signal:
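The article makes this comparison visually. As a complementary illustration, and not something taken from the original, the same comparison can be put into a number by lining up actual and predicted values by date and computing the mean absolute error. The values below are made up; `ds`, `y` and `yhat` follow Prophet’s naming.

```python
import pandas as pd

dates = pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"])

# Toy actual values and toy predictions.
actual = pd.DataFrame({"ds": dates, "y": [200.0, 210.0, 220.0]})
forecast = pd.DataFrame({"ds": dates, "yhat": [198.0, 214.0, 220.0]})

# Align both series on the date and measure the average absolute gap.
compared = actual.merge(forecast[["ds", "yhat"]], on="ds")
compared["abs_error"] = (compared["y"] - compared["yhat"]).abs()
mae = compared["abs_error"].mean()
print(f"Mean absolute error: {mae:.2f}")
```

A single error figure makes it easier to compare model variants than judging two overlaid curves by eye.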

Our model is following the trend quite closely. That’s good news!

We can now start working on putting this model into production.

The next step is deployment: coming soon!

Link to Part 1: Scraping

Full code can be found here: