Step-by-Step Guide to Building an End-to-End, Industry-Level Machine Learning Project

Fatima Arshad
4 min read · Aug 3, 2022


PART 1 — SCRAPING

Top 5 Machine Learning Projects for Beginners | HackerNoon

When I first started working in the industry, I was confused about how to put academic Machine Learning concepts into practice.

Looking back at it, I can say with certainty that every entry-level engineer or fresh graduate is just as baffled as I was.

In these blog posts I'll be taking you through a step-by-step journey of building an entire Machine Learning pipeline and putting it to use. The problem we will be tackling is price prediction for the ever-volatile broiler chicken market in Pakistan.

In the broiler chicken business, the farm rate and the market open and close rates are announced every day. If a farmer is able to predict prices beforehand, he will be able to sell his chickens at a higher rate and generate more profit.

Given all the previous prices, we will be predicting the next day's or month's price. As you can see, this is a time series problem because each data point corresponds to a point in time.

Your use case might be different. However, the generic steps for solving any Machine Learning problem remain the same. You can add or remove steps depending on your problem.

LET’S GET STARTED!!!

Before kickstarting any project, you need to make sure that you have ALL THE DATA you may or may not need. We can prune this data later. But not gathering enough data at the beginning can have catastrophic repercussions in the later stages of the project.

You’ll be given data either by client or you might have to scrape it. Even if client provides the data you might still have to scrape some to have enough to train a model and derive fruitful conclusions.

We will be scraping data from: https://www.agbro.com/

List of prices of different cities in Year 2022

We will be scraping this table for the years 2015 to 2022. The table contains the day, the Market Open and Close prices, and the Farm Rate. DOC is the price of a day-old chicken.

As we discussed earlier, gather all the data. Don't skip any variable by making an assumption beforehand. However, for now we will only be scraping information for Rawalpindi.

Import all the libraries:
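A typical stack for this kind of scraping is sketched below. The specific libraries (requests for fetching pages, BeautifulSoup for parsing HTML, pandas for the final table) are an assumption; any equivalent combination will do.

```python
# Assumed stack: requests to download pages, BeautifulSoup to parse HTML,
# pandas to hold the scraped rows as a table
import requests
from bs4 import BeautifulSoup
import pandas as pd
```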

Right-click on the first date and click Inspect Element.

This date is highlighted in the Elements tab. You can see that the table containing it has a class attribute of "tabelizer-table". Such attributes help identify elements and set them apart from one another.
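With that class name, one way to fetch the page and grab the table could look like this (the homepage URL and the "html.parser" backend are assumptions):

```python
# Fetch the page and locate the price table by its class attribute
url = "https://www.agbro.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# The price table is the one with class "tabelizer-table"
table = soup.find("table", {"class": "tabelizer-table"})
```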

Next, we will extract each row. The first three rows are skipped since they only contain headings. When a row is extracted, we get values for all cities, so we will keep only the first 6 cells since we are only interested in Rawalpindi.
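A minimal sketch of that loop, assuming the `table` object from the previous step:

```python
# Walk the table rows, skipping the first three heading rows,
# and keep only the first 6 cells (the Rawalpindi columns)
rows = []
for tr in table.find_all("tr")[3:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) >= 6:
        rows.append(cells[:6])
```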

Let's create a DataFrame.
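A minimal sketch, assuming the six columns are Date, Day, Farm Rate, Open, Close and DOC; verify the names and order against the live table:

```python
# Column names are assumed from the headings described above;
# check them against the site before relying on them
columns = ["Date", "Day", "Farm Rate", "Open", "Close", "DOC"]
df = pd.DataFrame(rows, columns=columns)
```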

Let's see what we have scraped.
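A quick sanity check might look like this:

```python
# Peek at the shape and first few scraped rows
print(df.shape)
print(df.head())
```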

Looks good. Now we need to scrape all the years. If you click on a specific year, you will observe that the URL becomes: https://www.agbro.com/broiler-market-prices-2021/

We can create a list of years and simply append each one to this URL to scrape every year's table.
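A sketch of that loop, reusing the same parsing steps and assumed column names from above (the URL pattern follows the 2021 example):

```python
# Scrape every year by appending it to the URL pattern seen above,
# then stack all the per-year tables into one DataFrame
base_url = "https://www.agbro.com/broiler-market-prices-{}/"
frames = []

for year in range(2015, 2023):           # 2015 through 2022
    response = requests.get(base_url.format(year))
    soup = BeautifulSoup(response.text, "html.parser")
    table = soup.find("table", {"class": "tabelizer-table"})

    year_rows = []
    for tr in table.find_all("tr")[3:]:   # skip the heading rows again
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 6:
            year_rows.append(cells[:6])

    frames.append(pd.DataFrame(year_rows, columns=columns))

# One combined dataset covering 2015-2022
all_prices = pd.concat(frames, ignore_index=True)
```

This combined frame is the dataset we will carry into Part 2.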

Part 2 — Feature Engineering And Model Fitting

The full code can be accessed through:
