What’s so hard about ordering blueberries?

Why the perfect blueberry order is more like playing chess than classifying images.

Published in

Afresh Engineering

11 min readMay 3, 2021

Afresh builds modern supply chain technology for the challenges of fresh food. Every year, one-third of food produced globally goes to waste. In the United States, 40 percent of food waste occurs at the retail and consumer level. Most of this waste occurs in fresh food departments at grocery stores. Afresh is working to eliminate this waste.

The problem

Imagine you run a very, very small grocery store. You carry two products: cereal and blueberries. Every day, your friend shows up to the door with a truck full of each item, and asks you how much you want to buy at cost. Your goal as a grocery store owner is to maximize your profit over these two items so that one day you can expand to more locations (and maybe carry more than two products). How do you do it?

Why good forecasts aren’t good enough

This sounds like a simple supervised machine learning problem. Let’s see what goes wrong when you frame it that way.

You’ll frame this as a regression problem to predict daily sales. Why? If you knew exactly what the sales would be every day, you should order exactly what will sell. That way, you’ll never run out of items, and you’ll never have extra on the shelf.

So you build a forecaster to predict the demand of each item, given:

what time of year it is
what the weather is like
what last week’s sales were
whether the Giants are playing tonight.

You train your model to minimize the absolute error between its predicted values and the true values on a train set, and validate it on a test set. You find that it’s impossible to truly predict demand, but the test error looks low enough.

Comfortable in your machine learning wizardry, you deploy the model and order exactly its recommendation.

Oops! Every other day, you run out of product and lose valuable sales. Always ordering an estimate of the median sales, which the L1 loss minimizes for, means that you’re understocked close to 50% of the time.

What’s worse, you have no idea how to handle the times when you sell less than what you order. Is a 7-day-old box of cereal the same as one delivered today? Maybe. What about a 7-day-old package of blueberries? Spoiler alert: it’s not.

We need a new model that will explicitly maximize sales and minimize your food waste over the next chunk of N days. It will calculate the best order for every combination of price, cost, future demand patterns, and perishability.

An optimal solution for ordering cereal

We’ll start by solving the ordering problem for cereal. It lasts about a year on the shelf, so we don’t have to worry about it going to waste.

Let’s step back and fully understand the problem we’re solving. Every day, we look at our shelves and count our leftover inventory. We need a function p(inventory,demand,day) → order that tells us the optimal amount to order on any given day. This function will also take into account our future profits: if we order 100 extra packages of cereal today, we’ll be stuck with them tomorrow. We’ll call this function a policy. We will also define a cost equation that will tell us how good or bad it is to take a certain action from a certain state. If we can prove that our policy has the lowest expected cost for our problem, we’ll call it an optimal control policy.

Inventory control is a type of finite horizon optimal control problem because it operates in discrete time (since we order on a daily resolution) and because our cost function is additive over time. You may have encountered similar ideas in reinforcement learning. Let’s precisely define this class of problem:

Notation follows (Bertsekas, D. P. (2018). *Dynamic programming and optimal control*)

We can easily solve this particular problem for an optimal policy by using dynamic programming. This is because our inventory problem follows the principle of optimality — an optimal policy starting at any of the later days in our horizon is a truncated optimal policy for the whole sequence.

How does this work in practice? We will solve for the optimal order and state cost at time t=N-1, and then solve for the optimal order at time t=N-2, given the already derived costs at N-1. We’ll compute all of these values recursively and end up with an optimal control policy over the horizon.

How inventory evolves given different order quanties and a set of probabilistic demands.

Let’s define all of this in Python.

First, we’ll define what happens when we place an order. In this case, it’s as simple as adding in our order and subtracting out any sales. We can’t sell more than we have in store, and so any negative values are clipped to zero.

Next, we’ll write down the cost function of ordering some amount order and ending in a state inventory.

The cost function will assign a penalty to shortages, under_order_cost , and a penalty to storing too much product, over_order_cost . We also have to pay for each package of cereal, reflected in order_cost . Clearly, this problem needs order_cost < under_order_cost , otherwise we’ll never order cereal.

def get_cost(inventory, order, order_cost,
             over_order_cost, under_order_cost):
    over_order_cost = max(inventory, 0) * over_order_cost
    under_order_cost = -min(inventory, 0) * under_order_cost
    return over_order_cost + under_order_cost + order * order_costdef act_and_get_cost(inventory, order, demand):
    # possibly negative  
    resulting_inventory = inventory + order - demand
    cost = get_cost(inventory, order)
    next_inventory = max(resulting_inventory, 0)    return next_inventory, cost

Now, we’ll define our solver. We will iterate through possible orders at each state, and find the one with the lowest expected cost by integrating over the probabilistic demand. The lowest cost is assigned as the state’s value, and the best order is saved as our policy for that state. We’ll recursively define these states until we hit t=0. The output of our solver will be:

path_array : [time, inventory_count] → optimal order if you are in this state

cost_array : [time, inventory_count] → cost of being in this state

Let’s run our solver on a random demand distribution.

# 10 days of forecasts, 15 possible values
forecasts = np.random.random((10, 15))
# Normalize to make probabilities:
forecasts = forecasts / np.expand_dims(forecasts.sum(axis=1), 1)# Solve
solver = NonPerishableDPSolver(2, 6, 0)
explain_array, path_array, cost_array = solver.dp_solve(forecasts)

We can analyze our policy’s decisions at time T=0:

What do these plots tell us? For the set of demand probabilities we’ve generated, there is an optimal amount of inventory — five units — that we should have on hand at time T=0 after our morning shipment arrives. If our morning inventory is greater than five units, we should order nothing. If we are below five units, we should order up to five units, and nothing more. Because it costs money to order cereal, the expected cost of a state with already stocked inventory is lower.

We’ve experimentally rediscovered a classic result. The optimal order quantity at time k for our problem takes the following form, given a reasonable choice of cost function:

Bertsekas, D. P. (2018). *Dynamic programming and optimal control*. Athena Scientific.

How about computational complexity?

Not too bad. We can solve very large sizes of this problem and not worry about running out of RAM or time.

We could learn a lot more about optimal cereal ordering, but let’s call it a day for now. We have another product to order.

An optimal solution for blueberries

Although we’ve made good progress on ordering cereal, the perishability of blueberries is going to greatly complicate our model. Let’s modify our earlier dynamic program to take into account that berries have a shelf life.

To do this, we keep track of the age of every pack of blueberries on our shelf, and structure our ordering policy so it makes decisions based on this age distribution. Instead of solving for an optimal order at time t for a one dimensional state x, we now have the following representation for a shelf life S:

This state vector can represent any combination of blueberry ages: for example, a total of seven packs, made up of three 5-day-old blueberries and four 3-day-old blueberries.

If we assume first-in-first-out (FIFO) sales (the oldest blueberries sell first), our action/state transition can be described by the following three functions:

def get_sales(state: np.array, total_sales: int):
    """Given total_sales, allocate them in a FIFO manner"""
    for i in range(max_age, 0):
        sale = min(state[i], total_sales)
        total_sales -= sale
        state[i] = state[i] - sale
    return total_sales # lost salesdef transition(state):
    """Every item of age k ages into k+1, oldest items go to
     waste"""
    last = 0
    tmp = 0
    for i in range(max_age):
        tmp = state[i]
        state[i] = last
        last = tmp
    return last # wastedef act(state, shipment, sales):
    "Given a shipment and sales, transform the state"""
    state[0] = shipment
    lost_sales = get_sales(state)
    waste = transition(state)

We’ll solve a slightly more complex dynamic program than the one above to take these new state/transition functions into account (I’ve left out the code since it includes a lot of less readable optimizations). Our cost function will be the same as above.

Analyzing the blueberry policy

The optimal control policy that we solve for blueberries is over a much higher dimensional state space. That means we’ll have to project over some of the problem’s dimensions to understand this policy.

Below, we plot the state cost of carrying two units of blueberries of various ages. We can see that older stock has a much higher expected state cost than younger stock. Two 2-day-old blueberries are 2.5x less valuable in this scenario than two fresh blueberries, given T=0 and the generated demand distribution.

Here, we plot the amount of inventory in stock vs. the percentage of “old items” (older than three days) vs. the expected state value. The value function is not nearly as clean as the one-dimensional cereal problem. It’s very clearly impacted by the relative age of our inventory.

This complex value function has important implications for ordering. For example, this plot shows us that having 15 units of new stock is a better state for the store than having five units of 3-day-old stock. In the cereal example, 15 units are worth 15 units, no matter if they’re two weeks old or fresh from the truck. Using a cereal policy to order blueberries is probably a bad idea. That leads us to an important question.

What if we ordered blueberries like they were cereal?

Now that we have two policies — one that doesn’t consider shelf life, and one that does — we can compare their performance in a simulator.

We set the shelf life of blueberries to 3, their unit cost to 1.75, and their price to 2.
The cereal policy plans out all of its moves without considering shelf life, as before.
The blueberry policy plans out its orders with known shelf life. It has the same order_cost and under_order_cost as the cereal policy.

We’ll run each of these precomputed policies on a large number of generated demand trajectories. Below, we print two sample simulations and the expected cost of each policy over 1000 simulations.

Over a large number of sampled demand patterns:

Our blueberry policy is wasting 30% fewer blueberries than the cereal policy.
Given the price and unit cost we’ve assigned to blueberries, the shelf-life-aware policy has a nearly 8% lower cost, with an average cost over the simulations of 45.591 vs. 49.246.

We clearly need to consider perishability when we order blueberries.

Perishable Inventory Control at Afresh

While these optimal policies are useful for understanding the shape of our problem, they’re not quite suited for production.

The above discussion glossed over some thorny realities. The state space of the perishable inventory problem grows exponentially with shelf life.

Whereas cereal required a 150-state dynamic program to find an optimal policy with T=10 and U=15, we need to solve a 1.7-billion-state-action dynamic program to find a policy for blueberries with a six day shelf life. We also ignored a lot of factors (discussed below) that make solving this problem with exact dynamic programming hard.

That does not mean we’re stuck with using a cereal policy to order blueberries! After all, autonomous driving, Chess, and Go all operate in similarly intractable environments. Novel ideas from optimization, simulation, control, and reinforcement learning apply to fresh food ordering just as they do to playing board games.

Afresh combines ideas from all of these domains to attack this difficult problem. We developed ways to estimate complex models of item perishability from data, developed state-of-the-art probabilistic forecasting techniques to estimate product demand, and created novel optimization tools to put it all together. We validate these models on large scale simulations over millions of store-items-days. We’ve deployed them to hundreds of U.S. grocery stores where they power daily fresh food replenishment.

The real world

Today’s supply chains are cereal supply chains. They are not engineered for the complexities of blueberries. This has dramatic consequences for the environment. In the United States, between 30–50% of all fresh food goes to waste. Cutting this waste by 25% would conserve 83 billion gallons of water and eliminate 10 million tons of greenhouse gas emissions per year.

Afresh was founded to solve this problem. Our technology works. We’ve already reduced food waste at major grocery chains by up to 25%, and every improvement to our models brings that closer to 100%. That’s no easy task — the real world is much more complicated than a blueberry-cereal store.

Here are some more problems to think about:

We can create more sophisticated policies that adjust prices based on how old the product we’re carrying is, or adjust shelf space on a weekly basis depending on how well products are selling. These policies have extremely complex stochastic decision structures.
Grocery chains cut up watermelons into fruit bowls and put mangos in smoothies. These derived products have different margins and perishability characteristics— how many of them do we make, and when do we make them?
We care about different metrics in the real-world and have to solve optimization problems with tighter constraints: for example, grocery stores might require that bananas have a less than one percent risk of stocking out.
Demand forecasting is its own complex and demanding system. Demand for many perishable goods is highly seasonal, highly elastic, and is substitutable between different products. Demand may be impacted by the age of products on the shelf (how does that change our DP?)
We don’t know the shelf life of all fresh items. Additionally, shelf life is not cleanly behaved — some items might decay exponentially, some might perish after a certain number of days. Many items do not follow a FIFO sales pattern. We need to learn this behavior from data.
Stores operate under backroom and truck size constraints that might constrain our ordering. Larger orders might lead to a discount from a supplier.

Want to help tackle one of today’s biggest environmental issues with modern science and engineering? Take a look at our current openings.