How I applied linear modelling to solve a ‘real life’ problem

5 min readNov 11, 2017

Couple of weeks ago, linear modelling was a distant memory in my brain. It is one of those things you learn in school and you brush it of as something you’ll never apply in ‘real life’.

But then I had an ‘aha’ moment when I was trying to find a solution to a problem that a client had asked. Through relentless research and late nights, I found a technique to solving the problem — Linear models.

Goal of the blog post is to distill the discovery and learning journey into a practical example and implementation practices in a most intuitive way.

Although I can’t share the details of the problem because of confidentiality, I can go through an example that proxies the problem closely.

Before we jump into it, it’s important to define a structure around forming the problem statement and key aspects of linear programming:

Forming a problem statement — this is an important phase as you want to think about the problem you are solving. Is it maximization (i.e maximizing profit) or minimization (i.e. minimizing cost or time) problem? What are some of the constraints you are dealing with (i.e. capacity)
Decision Variables — as the name states, it is variables that equate to decisions that are to be made. For example, what is the maximum number of books you can read in a year given limited time you have. Here the decision variable is number of books.
Objective — Element of this is coming from the problem statement. Is it a maximization or minimization problem?
Constraints — these are essential as they define boundaries for feasible and optimal solution.
Testing — this is my own addition to the formal structure. Testing is important in any application and no exceptions for linear models. Here our goal is to make sure that constants x decision variables add up to constraints.
Intuition — another one of my own additions to the structure. Not all models will spit out results that make sense. As you do more and more, you will gain an intuition of defining constraints, making reasonable assumptions to obtain results that make more sense. You will see this in practice below.
Implementation — a data science problem such as this is solved best using programming. Python is a popular choice as it has ample mathematical libraries and frameworks. I will be using NumPy and PuLP modelling framework to find optimal solutions.

Alright, enough of stories and words — let’s jump into defining our problem

Every week, my spouse and I go to a very busy grocery store to do shopping. I am OCD about getting the shopping done in as little time as possible. We buy approximately 50–60 items that are probably in 10–15 locations. By locations I mean different aisles and different section of aisles. Usually we take around 45–60 minutes to finish shopping (checkout time is extra). I take about 32 seconds to find/pick item and about 50 seconds on average to walk to a location. My spouse takes 50 seconds to find/pick item but takes around 32 seconds on average to walk to a location. What is the optimal amount of items and locations in a grocery store that my husband and I should be allocated so that shopping time is minimized?

Now that a very ‘real life’ problem has been defined, let’s pick this apart and extract our decision variables, objective and constraints.

Decision Variable

Decision here is how many items and locations we both should have given our efficiency to find/pick and walk to a location.

Objective

Minimize shopping time given the decision variable above is our objective. Let’s keep things `apples to apples` and convert the time in seconds to minutes as it makes sense to read the results in minutes denomination.

Ok here is the kicker — you must ensure your objective function is linear in nature. That means that sum of constants x decision variables must sum up.

Above minimization function only has two individuals participating in shopping madness. However, this can work for n number of individuals. Following represents generalization of the above equation:

Constraints

I will take the constraints as picking 60 items from 15 locations as this seems to be majority of the scenario.

Constraint 1 — constraints the number of picks to 60

Constraint 2— constraints the number of location to 15

Intuition & Assumptions

It doesn’t serve any purpose (because I walk slower) for me to pick items at only one location. But it’s very possible the optimal result for a machine is me picking at one location 40 items. For results to make more sense, I will assume minimum locations for my spouse and I are 5 as after that fatigue kicks in from walking in a crowded space. Also, minimum each person picks is 20 because intuitively that seems like a fair ratio to 5 locations.

There is quite a bit to chew on here in terms of problem definition and linear model parts. In the next blog post, I will go through how to solve for this problem using Python mathematical libraries such as NumPy matrices, some historic data (that calculates efficiency) and modelling open source library called PuLP.

Optimal Results

If you are curious about what is the optimal items and locations for my husband and I, here are the results:

So looks like we’ve been doing this all wrong — if I took approximately 40 items in 5 locations and my spouse took 20 items in 10 location we could get our shopping done in a total of 25 minutes (we are shopping in parallel so the maximum of the slowest shopper is the total time — which is me!).

I will go in detail on how to get here in the next series.

Food for thought

As an exercise, formulate an objective function with perhaps more individuals doing the shopping. Can you think of adding any other scenarios to this problem? For demonstration purposes, I’ve kept it relatively straight forward but there are other opportunities to add interesting use cases to this situation.

Go to part 2 series to learn more about implementation details!

(As as aside, if you have an efficient or better way of solving this, please feel free to comment/suggest!)