Regression Project — Predict Food Demand (Part 1)

Emmanuel Ikogho · Published in The Deep Hub · Jul 2, 2024

Deep learning vs. machine learning for predicting demand in a food supply chain.

Patrick eating Krabby Patties

1.0 Introduction

Genpact is a food delivery company. Recently, Genpact has been losing money due to understocking food items that customers are looking for. When a customer comes to buy chicken, for instance, and it’s not available in the store, that customer will become disappointed and move to another store.

Therefore, demand forecasting, or predicting the demand for food items in Genpact stores, has become their major priority. It is crucial for them to know the exact amount of food their customers are going to buy, so they can stock the right amount — not more, not less — thereby saving money.

That’s why they have hired you, a data scientist, to not only analyze their data but also give recommendations on how much food will be in demand in the future to meet their customers’ needs.

1.1 What is Time Series Analysis?

This is a time series regression problem. First of all, what is time series analysis? Time series analysis is the study of data collected or recorded over time, typically indexed by a date or time column. A time series dataset is one collected at successive points in time, while a cross-sectional dataset is collected at a single point in time, where the order of observations doesn’t matter.

For example, if your smartwatch tracks your sleep, it should be able to tell you how much sleep you got throughout the month. The graph below shows my sleep over a fixed period of time. The y-axis shows my sleep levels, while the x-axis shows the day of the month. When time is involved, the dataset becomes a time series dataset.

Watch this video to know more about this.

1.2 Importance of a Time Series Data Project

Why is this important? Most businesses are affected by time. Time usually affects the sales of a product and the revenue of a business. For example, the sales of clothes are high during the Christmas period. Another example is in agriculture, where one can use time series analysis to determine the best time of the year to plant crops.

So, how do you know if a dataset is a time series dataset? Simple: if there is a time variable in it (usually the index), then yes, it is a time series dataset. Otherwise, it is not.

Most time series datasets have the time variable as the date. For the dataset used in this project, we only have the week, but it’s still a time variable nonetheless, making it a time series dataset.

1.3 Aim of This Project

The aim of this project is not only to analyze the demand dataset given by Genpact but also to build a complex algorithm that can predict the future demand for food items.

By the end of this project, we will have built such an algorithm and deployed it as an API using FastAPI, which can be integrated into a web application or a mobile app so that anyone in the real world can use it.

1.4 Source of This Project

This project was derived from a research paper. Here is the link to the original research: https://doi.org/10.1109/ACCESS.2023.3266275

I am going to implement my own version of the research using my code to either prove or disprove the findings of the researcher. I will follow the exact methodology described in the paper, but I will make modifications where I disagree with the original researcher. I believe that implementing a research paper is the fastest way to improve your skills as a data scientist. You will learn a lot from this project, so follow along closely. Are you ready? Let’s go.

All the code needed for this project is available here.

2.0 Dataset description

The “Food Demand Forecasting” dataset released by Genpact comprises 145 weeks’ worth of weekly orders for 50 distinct meals. Download the dataset here. The data is divided into three files with a combined total of about 450,000 entries and 15 features, which are explained as follows:

Given the following information, the task is to predict the demand for the next 10 weeks (Weeks: 146–155) in the test set:

  • Historical demand data for a product-center combination (Weeks: 1 to 145)
  • Product (meal) features such as category, sub-category, current price, and discount
  • Fulfillment center information, such as center area and city

2.1 Data Dictionary

1. Weekly demand data (train.csv): contains the historical demand data for all centers. test.csv contains the same features except the target variable (the number of orders).

2. fulfilment_center_info.csv: Contains information for each fulfilment center

3. meal_info.csv: Contains information for each meal being served

3.0 Exploratory Data Analysis (EDA)

3.1 Dataset overview

First, load the datasets.
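Here is a minimal sketch of how the three files can be loaded and joined with pandas, assuming they sit in a local data/ folder and use the dataset’s center_id and meal_id keys:

```python
import pandas as pd

# Load the three files from the Genpact dataset (paths are assumptions).
train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")
centers = pd.read_csv("data/fulfilment_center_info.csv")
meals = pd.read_csv("data/meal_info.csv")

# Attach the center and meal information to the weekly demand data.
train = train.merge(centers, on="center_id", how="left").merge(meals, on="meal_id", how="left")
test = test.merge(centers, on="center_id", how="left").merge(meals, on="meal_id", how="left")

print(train.shape, test.shape)
print(train.head())
```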

Have a look at the loaded datasets:

As mentioned, we are to predict the number of orders for the next 10 weeks, starting from week 146, in our test set.

Now let’s have a peek at the remaining datasets.

3.2 Univariate Analysis

In this section, we will analyze each feature individually. Both categorical and numerical features are analyzed thoroughly.

Does our dataset have outliers?

Let’s plot histograms to see the spread of the distribution for each feature.

As observed for “num_orders” above, most of the values for this feature are concentrated between 0 and 1,000, but the distribution extends beyond 20,000, which suggests the presence of outliers.

This can be made clear by plotting box plots for each feature, or only for the feature of concern.

As seen above, there is only one extreme outlier in the number of orders. It may be the result of a data-entry error and is therefore removed, as it could distort further analysis.
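As a hedged sketch (the exact rule below is illustrative, not taken from the original code), the outlier can be spotted with a box plot and then dropped:

```python
import matplotlib.pyplot as plt

# Box plot of the target to make the single extreme value visible.
train["num_orders"].plot(kind="box")
plt.title("num_orders")
plt.show()

# Drop the lone extreme observation (here, simply the global maximum).
train = train[train["num_orders"] < train["num_orders"].max()]
```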

Which type of center received the most orders?

Type_A centers received the highest number of orders, whereas Type_C centers received the least.

What are the top 5 centers?

Type_A centers received the highest total number of orders, but the chart above shows that center 13, which is a Type_B center, received the most orders of any individual center.

Center 13 is Type_B

The reason is that Type_A simply has the most centers, as shown below.

What region has the highest number of orders?

Region 56 received an astonishingly high number of orders when compared to other regions.

Which category of meals has the highest demand?

Beverages are the food category with the highest number of orders (highest demand), and Biryani is the food category with the least number of orders (lowest demand).

Which meal was ordered the most?

Meal ID 2290 received the most orders, as shown above. The number of orders for various meal IDs does not differ significantly in many cases.

What are the top 5 cities by demand? (number of orders)

City 590 has the most orders with 18.5M, nearly 10M more than City 526, the city with the second-highest number of orders at 8.6M.

Which weeks had the highest and lowest demand?

The highest number of orders were received in week 48 and the lowest number of orders in week 62.

3.3 Multivariate Analysis

Is there multicollinearity in our variables?

Base price and checkout price have a high correlation. So let’s visualize it with a scatter plot.

The figures above confirm that base_price and checkout_price are highly correlated, so one of the two, checkout_price, must be removed.
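A quick sketch of how this check can be done (the exact plotting choices are mine, not the article’s):

```python
import matplotlib.pyplot as plt

# Correlation between the two price columns.
print(train[["base_price", "checkout_price"]].corr())

# Scatter plot to visualize the near-linear relationship.
train.plot.scatter(x="base_price", y="checkout_price", alpha=0.3)
plt.title("base_price vs checkout_price")
plt.show()

# checkout_price is dropped later, after the discount feature has been derived from it.
```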

4.0 Feature Engineering

Here is the section for creating new features.

A new feature is created: a discount, which is the difference between the base price and the checkout price. We attempted to determine whether or not there is a relationship between the discount and the number of orders. Surprisingly, there is no strong link or correlation between them.
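A minimal sketch of the discount feature and its correlation with the target:

```python
# discount = base price minus checkout price (positive means the meal was discounted).
train["discount"] = train["base_price"] - train["checkout_price"]
test["discount"] = test["base_price"] - test["checkout_price"]

# Check the (weak) correlation with the number of orders.
print(train[["discount", "num_orders"]].corr())
```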

Lag Characteristics: Imagine you’re trying to predict something (like sales) for next week. One way to do this is by looking at what happened in previous weeks. Lag characteristics are like looking back in time to see what happened in the past, which can help us predict what might happen in the future.
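For illustration, here is a minimal sketch of lag features built per center-meal pair; lags of base_price are used here (rather than the target) to stay consistent with the point made later about target-derived features:

```python
# Sort so that shifting moves values one week back within each center-meal group.
train = train.sort_values(["center_id", "meal_id", "week"])

# Illustrative lag features: the base price one and two weeks ago.
for lag in (1, 2):
    train[f"base_price_lag_{lag}"] = (
        train.groupby(["center_id", "meal_id"])["base_price"].shift(lag)
    )
```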

Exponentially Weighted Moving Average (EWMA): When we use lag characteristics, we often give equal importance to data from all the previous weeks. However, in real life, we usually care more about recent data because it’s more relevant to what’s happening now. EWMA is a way to consider recent data more than older data when making predictions.

EWMA(w) = α ∗ x(w) + (1 − α) ∗ EWMA(w − 1)

How EWMA Works: EWMA calculates a kind of average where recent data gets more weight than older data. The formula for EWMA looks complicated, but it’s not too hard to understand. Basically, it says that the EWMA for this week is a combination of the actual data for this week and the EWMA from last week. The parameter α decides how much weight we give to the actual data for this week. Setting α to 0.5 means we give equal importance to the actual data for this week and the EWMA from last week.

In simpler terms, EWMA helps us make better predictions by focusing more on recent data, which is often more important for understanding what might happen next.

In this case, data from the previous 10 weeks is used to forecast demand for the next 10 weeks. So we will set min_periods to 10 to represent 10 weeks.

After calculating the exponentially weighted moving averages (EWMA) for the ‘base_price’ and ‘discount’ columns, the resulting new columns (‘ewma_base_price’ and ‘ewma_discount’) each have 9 NaN values. This is due to the min_periods parameter being set to 10 in the ewm() function: the first 9 rows do not yet have 10 observations to average over.

Let’s handle these missing values by interpolation.
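A sketch of how these steps can look in pandas, with α = 0.5 and min_periods = 10 as described above (whether the EWMA is computed per group or over the whole column is an implementation choice; the plain-column version shown here matches the 9 NaN values mentioned):

```python
# EWMA features for base_price and discount.
for col in ("base_price", "discount"):
    train[f"ewma_{col}"] = train[col].ewm(alpha=0.5, min_periods=10).mean()

# min_periods=10 leaves the first 9 rows as NaN; fill them by interpolation/backfill.
ewma_cols = ["ewma_base_price", "ewma_discount"]
train[ewma_cols] = train[ewma_cols].interpolate().bfill()
```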

Other features calculated include the rank of a meal based on its base price, meal count features in relation to a meal’s category, cuisine, center, city, and region, as well as the highest, lowest, and average prices of the meal in a city and a region.

A few more features are computed from the existing ones in train, listed as follows (a short sketch of how some of them can be derived appears after the list):

1) base_price_max: maximum base price for a particular meal, irrespective of the week, center, city, or region.

2) base_price_mean: the average base price for a particular meal, irrespective of the week, center, city or region.

3) base_price_min: the minimum base price for a particular meal, irrespective of the week, center, city or region.

4) center_cat_count: a count of orders belonging to a particular category in a particular center.

5) center_cat_price_rank: rank of the price of a particular category of meals in a particular center.

6) center_cat_week_count: a count of orders belonging to a particular category in a particular center in a week.

7) center_cui_count: a count of orders belonging to a particular cuisine in a particular center.

8) center_price_rank: rank of the price of a particular meal in a particular center.

9) center_week_count: a count of orders in a particular center in the given week.

10) center_week_price_rank: rank of price of a particular meal in a particular center in a particular week.

11) city_meal_week_count: a total count of a certain meal sold in a city in a particular week.

12) meal_count: a count of a certain meal sold in all the weeks across all centers and regions.

13) meal_city_price_rank: rank of price of a particular meal in a particular city.

14) meal_price_max: maximum checkout price of the meal, irrespective of the week, center, city, or region.

15) meal_price_mean: average checkout price of the meal, irrespective of the week, center, city, or region.

16) meal_price_min: minimum checkout price of the meal, irrespective of the week, center, city, or region.

17) meal_price_rank: rank of price of a particular meal.

18) meal_region_price_rank: rank of price of a particular meal in a particular region.

19) meal_week_count: a total count of a certain meal sold in the given week.

20) meal_week_price_rank: rank of price of a particular meal in a particular week.

21) region_meal_count: a region’s total sales of a particular meal.

22) region_meal_week_count: the total number of a specific meal sold in a region during a specific week.

23) type_meal_week_count: the total amount of a specific meal sold in a center over the course of a particular week.
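As promised, here is a hedged sketch of how a couple of these features can be computed with groupby/transform and rank; the column names follow the list above, but the exact definitions are my interpretation:

```python
# center_week_count: how many order rows a particular center has in a given week.
train["center_week_count"] = (
    train.groupby(["center_id", "week"])["meal_id"].transform("count")
)

# center_price_rank: rank of a meal's checkout price within its center.
train["center_price_rank"] = (
    train.groupby("center_id")["checkout_price"].rank(method="dense")
)
```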

Now, this is the first point where I disagree. If you look at the full feature-creation code for this project, you’ll see that I excluded some of these features by commenting out those lines.

In real life, or in a business scenario, since we are trying to predict the number of orders, we would not have access to this information. Therefore, any feature based on the number of orders must be excluded, as we cannot extract these features from our test set. The same goes for the EWMA features created above. In the research paper, the EWMA was created based on the number of orders, not the discount and base price like I did.

The author of this research created these features from the training set (which contains the number of orders) and then split it into test and train sets (with the test set being the last 10 weeks). Naturally, any feature created from the target variable (number of orders) will have a high correlation with the target and a high feature importance. However, as I mentioned, this is not possible in real life since the target is what we are trying to predict and will, therefore, be unavailable for preparing the test set.

The features created from the target variable were the only things that gave the models in the research paper such high performance. Initially, based on the RMSLE values, the best score I got was 0.0084 from the CatBoost Regressor, which was better than the 0.28 score from the LSTM model in the research paper (RMSLE scores are better when closer to zero).

As I expected, once I excluded the features created from the number of orders column, the performance dropped significantly to an RMSLE score of 0.54, as we will see in our results later on.
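For reference, RMSLE (root mean squared logarithmic error) can be computed as in this short sketch:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; lower is better, 0 is a perfect fit."""
    return np.sqrt(mean_squared_log_error(y_true, y_pred))
```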

4.2 Handling Skewness

The values for each feature are not on the same scale, and most of the numerical features in the dataset are heavily right-skewed. Skewness can be decreased by applying the quantile or power transformation. Both were used and compared.

It is evident that the quantile transformation reduced skewness more than the power transformation in our case. The purpose of these transformations is to make the distribution of the data more symmetric and closer to a normal distribution. Since the data already resembles a normal distribution after transformation, standardization (scaling to mean 0 and standard deviation 1) is a better option than normalization. Therefore, after using our quantile transformer, we will use the standard scaler.
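A sketch of this transformation step with scikit-learn; which columns are treated as numeric features here is an assumption for illustration:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer, StandardScaler

# Illustrative set of numeric feature columns.
num_cols = ["base_price", "discount", "ewma_base_price", "ewma_discount"]

# Quantile transform to a roughly normal distribution, then standardize.
transformer = Pipeline([
    ("quantile", QuantileTransformer(output_distribution="normal", random_state=42)),
    ("standardize", StandardScaler()),
])

train[num_cols] = transformer.fit_transform(train[num_cols])
# Apply the same fitted transformer to the test set (assuming the same columns exist there).
test[num_cols] = transformer.transform(test[num_cols])
```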

To prevent this article from becoming too long, I’ll have to continue in a part 2 next week. Hope you enjoyed this project so far. Don’t forget to give this article a clap if you found it helpful.
