Recipe for Success: How to Order for Prepped Produce

Andy Chen
Published in Afresh Engineering
10 min read, Sep 16, 2022

Afresh builds modern supply chain technology for the challenges of fresh food. Every year, one-third of food produced globally goes to waste. In the United States, 40 percent of food waste occurs at the retail and consumer level. Afresh is working to eliminate this waste at grocery stores.

Cut fruit represents parental love in my household and in many other Asian households. The time and effort spent to slice around a mango pit or cube an entire watermelon allows us children to enjoy fresh and nutritious treats.

My poor re-creation of a cut fruit plate (sorry mom!).

Many grocery stores devote a sub-department to similar fruits of labor (or vegetables of labor). Early each morning, an in-store team assembles berry parfaits, slices stir-fry bell peppers, and otherwise prepares produce for customers to easily eat or cook. The store section holding these ready-made items goes by several names depending on the retailer — “Fresh Cut”, “Ready to Eat”, or “Quick and Easy”; at Afresh, we use the term prepped produce.

Example of a store’s prepped produce section.

Although the store’s relationship to its customers isn’t as intimate as a caregiver’s relationship to their children, this prepped produce department still serves its community by increasing access to fresh produce, especially to those who:

  • Lack time to prepare fresh food
  • Live alone and may prefer smaller servings
  • Lack easy kitchen access
  • Suffer from mobility or motor skill issues, which interfere with safely cutting or peeling produce at home

Meanwhile, from the store operations standpoint, prepped produce:

  • Accounts for a large proportion of produce sales and profit margin
  • Reduces food waste by allowing the ripest produce to quickly reach consumers instead of sitting on the shelf

Ordering for the prepped produce department also presents unique challenges. Compared to a bottle of salad dressing that exits the same way it’s shipped (as itself), a cantaloupe can be divided into multiple parts that are sold and need to be accounted for, creating a challenge for any produce manager or ill-suited ordering system:

  • We might want to order extra if we’re uncertain about cantaloupe bowl sales — but how do we correctly account for uncertainty across all the different cantaloupe items?
  • Do we order differently if cut cantaloupe spoils faster than whole cantaloupe?
  • How do we address demand substitutability? For example, consumers might buy fewer cantaloupe bowls if whole cantaloupe is 50% off.

A truly fresh-centric inventory management system must address these prepped produce challenges. In the remainder of this post, we focus on the core problem:

How do we aggregate our knowledge of sales demand to decide a single order? For example, based on demand for half cantaloupes, cantaloupe bowls, and fruit salads, how do we decide how much whole cantaloupe to order?

We build a toy modeling solution that addresses this fundamental question. However, we make important assumptions about the store process, which leaves plenty of room for extending and improving this solution!

Framing Recipe Ordering

Imagine we’re ordering for a single grocery store, and we follow the adventures of the cantaloupe entering the store. How might this cantaloupe be prepared, and how might it be sold?

Our grocery store maintains recipes, which tell us how to transform ingredient items, such as our cantaloupe, into recipe items. Here, the three recipe items that the consumers ultimately buy are:

  • Half cantaloupes
  • 54 oz cubed cantaloupe bowls
  • 23 oz fruit salads made of grapes, cantaloupe, and strawberries

Like its peer retailers, our store receives shipments of whole cantaloupe, and early every morning, we coordinate how much of each recipe item to make for the day. We might follow a recipe such as:

  • Cut the cantaloupe into chunks, which involves cutting out the rind and scooping out the seeds. This preparation yields 60% of the original cantaloupe weight.
  • Remove the grape stems, which might remove 10% of the original weight.
  • Remove the strawberry tops, which might remove 20% of the original weight.
  • Combine 2 parts cantaloupe, 1 part grapes, and 1 part strawberries in a tub.
  • Fill each bowl to the brim with the fruit salad, which should yield 23 oz.

Our objective is to order enough whole cantaloupe to accommodate all of the recipe item demand and to waste as little cantaloupe as possible. Assuming we don’t change the recipe, we can focus on the direct conversion between the cantaloupe ingredient item and the cantaloupe recipe items — how much whole cantaloupe do we need to produce one fruit salad? If a cantaloupe weighs about 4 lbs, then the conversion factor is 0.3 cantaloupes per salad.
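The arithmetic behind that 0.3 can be sketched as follows. This is a minimal sketch, assuming the recipe's "parts" are measured by weight after trimming and that the salad is the 23 oz recipe item listed earlier; the function and parameter names are hypothetical.

```python
OZ_PER_LB = 16

def cantaloupes_per_salad(salad_oz, cantaloupe_share, cut_yield, whole_lbs):
    """Whole cantaloupes consumed to make one fruit salad."""
    cut_oz = salad_oz * cantaloupe_share   # cut cantaloupe inside one salad
    whole_oz = cut_oz / cut_yield          # whole cantaloupe needed before trimming
    return whole_oz / (whole_lbs * OZ_PER_LB)

factor = cantaloupes_per_salad(
    salad_oz=23,              # fruit salad size from the recipe item list
    cantaloupe_share=2 / 4,   # 2 parts cantaloupe out of 2 + 1 + 1 parts (assumed by weight)
    cut_yield=0.60,           # 60% of the whole cantaloupe survives cutting
    whole_lbs=4,              # approximate weight of one cantaloupe
)
print(round(factor, 2))  # 0.3
```

Half of each salad is cut cantaloupe (11.5 oz), which takes about 19.2 oz of whole cantaloupe at a 60% yield, or roughly 0.3 of a 64 oz melon.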

If we repeat the calculation for the other recipe items, we derive a mapping between the cantaloupe recipe items and the cantaloupe ingredient item. This conversion allows us to translate consumer sales into ordering decisions.

Mapping between the cantaloupe recipe items and the cantaloupe ingredient item, including the numerical conversion factor. If we added all the other ingredient and recipe items, we’d have a weighted bipartite graph.

Designing the Cantaloupe Order

With this recipe framework, let’s design how we order for cantaloupe in the prepped produce section! In our simplified scenario, we assume that at our grocery store, we can:

  • Record cantaloupe inventory and place the cantaloupe order
  • Receive and unpack the cantaloupe from that day’s order
  • Prep the cantaloupe and restock the recipe items

… all before the store opens! Then, we want to optimize our inventory just before the store opens.

Forecasting Demand with Quantile Regression

Let’s start by understanding the demand at the store. To construct today’s cantaloupe order, we want to predict today’s cantaloupe bowl sales by combining historical sales data and covariate features such as price, day of week, and weather forecasts that might affect demand. With this information, we can approach this prediction problem as a supervised regression problem, and a number of forecasting models could work here.

One caveat is that most regression models output a single point estimate that minimizes the L1 or L2 loss. However, we also want to quantify the uncertainty of our forecast; if we’re uncertain about cantaloupe bowl demand on Independence Day, a high-selling day, we might prepare for twice as many cantaloupe bowls. Therefore, we want a regression model that outputs the entire demand distribution.

In our situation, recipe items’ demand can behave quite differently from each other; perhaps cantaloupe bowl demand is consistent and resembles a Gaussian, but fruit salad demand is sporadic and looks more Poisson! That means we should aim for a nonparametric representation of demand; one way to achieve this goal is to output a vector of quantile forecasts. (Read more about quantile regression here or here.)

Then, we train on the entries’ quantile losses (also known as pinball losses); each loss function weights over-forecasting and under-forecasting differently. For the 0.3 quantile, we penalize over-forecasting more heavily, so the loss is steeper for negative residuals (true demand minus forecast) than for positive residuals:

Quantile (pinball) loss for the qth quantile output.

If y represents the true demand, then our loss function for a single data point is the following:
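Written out in its standard form, the pinball loss looks like this. The symbols below are notation assumed for illustration: the q-th quantile forecast is written as \hat{y}_q, and Q is the set of trained quantiles (for example 0.1 through 0.9). Summing over quantiles rather than averaging is also an assumption.

```latex
\ell_q(y, \hat{y}_q) = \max\bigl( q\,(y - \hat{y}_q),\; (q - 1)\,(y - \hat{y}_q) \bigr),
\qquad
\mathcal{L}(y, \hat{y}) = \sum_{q \in Q} \ell_q(y, \hat{y}_q)
```

For q = 0.3, a positive residual y - \hat{y}_q is weighted by 0.3 and a negative residual by 0.7, which gives the asymmetric slopes described above.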

At prediction time, we only have quantile forecasts for q = 0.1, 0.2, … , 0.9. To compute the 0.75 quantile forecast, we linearly interpolate using the 0.70 and the 0.80 quantile forecasts.

Depending on how accurate we find this approximation, we can choose the number of training quantiles (i.e. the cardinality of the output) to tell us the shape of the demand distribution.

Probability distribution for cantaloupe bowl demand, with dots at quantiles 0.1, 0.2, …, 0.9.
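The interpolation between neighboring quantile forecasts can be sketched in a few lines. This is only an illustration: the function name and the forecast values are hypothetical, and the sketch assumes the forecasts are non-decreasing across quantile levels.

```python
from bisect import bisect_right

def interpolate_quantile(levels, forecasts, q):
    """Linearly interpolate the q-th quantile from forecasts at sorted quantile levels."""
    if not levels[0] <= q <= levels[-1]:
        raise ValueError("q lies outside the trained quantile range")
    if q == levels[-1]:
        return forecasts[-1]
    hi = bisect_right(levels, q)   # index of the first level strictly above q
    lo = hi - 1
    weight = (q - levels[lo]) / (levels[hi] - levels[lo])
    return forecasts[lo] + weight * (forecasts[hi] - forecasts[lo])

levels = [i / 10 for i in range(1, 10)]            # 0.1, 0.2, ..., 0.9
bowl_forecasts = [4, 5, 6, 7, 8, 9, 10, 12, 15]    # hypothetical cantaloupe bowl forecasts
q75 = interpolate_quantile(levels, bowl_forecasts, 0.75)
print(q75)  # about 11, halfway between the 0.7 and 0.8 forecasts
```

The sketch refuses to extrapolate below the lowest or above the highest trained quantile, since the forecasts say nothing about the tails beyond them.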

Alternatively, if we don’t want to linearly interpolate between quantiles, we could instead have our regression model take an extra input q, representing the desired forecast quantile. Then, during training, we would sample q uniformly from [0, 1], as in Kuleshov et al. In any case, though, we obtain a distribution for each recipe item.

Aggregating Recipe Information

Now, how do we construct the cantaloupe ingredient order? Fundamentally, we’re solving an optimization problem: we design ordering decisions that minimize an objective, such as food waste. While there are many ways we could frame this optimization, let’s approach our tutorial scenario as a newsvendor problem.

For starters, we want to understand the distribution of total cantaloupe demand by aggregating the recipe items’ demands we computed earlier:
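One way to write this aggregation, assuming D_i denotes the demand for recipe item i and c_i its conversion factor in whole cantaloupes per unit sold (both symbols are introduced here for illustration):

```latex
D = \sum_{i \,\in\, \text{recipe items}} c_i \, D_i
```

For the fruit salad, for instance, c_i would be the 0.3 cantaloupes per salad computed earlier.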

If the recipe item demands were all Gaussian distributions, then we’d construct D as another Gaussian distribution with known parameters. In our case, though, the sum of the recipe item demands is less easily expressed in closed form. Instead, we approximate D with a Monte Carlo method; we can repeatedly sample from the recipe distributions. In each trial, we’re picking a random quantile from each distribution. One set of samples might have a mix of low and high quantiles, and another set of samples might have all low quantiles.

Two sets of samples from the demand distributions. Visually, we’re picking a random point in the probability mass of each distribution (denoted by the large dots in the diagrams). We’re assuming the recipe items’ demands are independent, so we can sample each distribution one at a time.

Then, we can combine each set of recipe samples. Based on our previously established conversions:

Mapping between the cantaloupe recipe items and the cantaloupe ingredient item, including the numerical conversion factor.

We can compute the total amount of prepared cantaloupe in each recipe sample set. For example, our first two sample sets would yield 22.4 and 25.5 cantaloupes, respectively.

After repeating this computation, say 498 more times, we have a collection of samples, which approximate the distribution of cantaloupe demand.

Monte Carlo-derived distribution of cantaloupe ingredient demand, with the two dots representing the first two sample sets’ total demand.
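The sampling loop described above might look like the following sketch. Everything numeric here is hypothetical except the 0.3 salad factor from earlier: the forecast values are made up, and the bowl conversion factor is a placeholder. As a simplification, each draw stays within the trained 0.1 to 0.9 quantile range, which truncates the tails of each distribution.

```python
import random
from bisect import bisect_right

LEVELS = [i / 10 for i in range(1, 10)]  # trained quantiles 0.1 ... 0.9

# Hypothetical quantile forecasts (units sold) and conversion factors
# (whole cantaloupes per unit). Only the 0.3 salad factor comes from the text.
RECIPE_ITEMS = {
    "half cantaloupe": {"factor": 0.5, "forecasts": [6, 8, 9, 10, 11, 12, 13, 15, 18]},
    "cantaloupe bowl": {"factor": 1.4, "forecasts": [4, 5, 6, 7, 8, 9, 10, 12, 15]},
    "fruit salad": {"factor": 0.3, "forecasts": [0, 1, 2, 3, 5, 8, 12, 18, 30]},
}

def sample_demand(forecasts, rng):
    """Draw one demand value by picking a random quantile and interpolating."""
    u = rng.uniform(LEVELS[0], LEVELS[-1])
    hi = min(bisect_right(LEVELS, u), len(LEVELS) - 1)
    lo = hi - 1
    weight = (u - LEVELS[lo]) / (LEVELS[hi] - LEVELS[lo])
    return forecasts[lo] + weight * (forecasts[hi] - forecasts[lo])

def simulate_cantaloupe_demand(items, n_trials, seed=0):
    """Return n_trials samples of total whole-cantaloupe demand."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        # Independent draw per recipe item, converted to whole cantaloupes.
        total = sum(
            item["factor"] * sample_demand(item["forecasts"], rng)
            for item in items.values()
        )
        totals.append(total)
    return totals

samples = simulate_cantaloupe_demand(RECIPE_ITEMS, n_trials=500)
print(min(samples), max(samples))
```

Drawing each item separately encodes the independence assumption; modeling substitution between items would require sampling them jointly instead.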

Now with this single demand distribution, we can compute our target inventory and decide what to order! We follow the newsvendor problem framing.
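In the standard newsvendor result, the optimal stocking level is a quantile of the demand distribution set by the two costs. Here c_u is the per-unit cost of being short and c_o is the per-unit cost of having extra; both symbols, and F_D for the cumulative distribution of D, are notation assumed for illustration.

```latex
q^* = \frac{c_u}{c_u + c_o},
\qquad
\text{target inventory} = F_D^{-1}(q^*)
```

Whenever c_u exceeds c_o, the ratio q^* lands above 0.5.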

In our situation, the cost of being short is larger than the cost of having extra, as it’s worse to deprive a consumer of cantaloupe than to risk the cantaloupe spoiling. (The shelf life of cantaloupe is longer than a single day.) That puts the optimal quantile above 0.5, so we err towards overordering cantaloupe.

Optimal quantile shown on the cantaloupe ingredient demand distribution.

Thus, to decide how much stock we should have before store opening (i.e. our target inventory), we compute the q*th quantile for cantaloupe demand. We then subtract the existing inventory to arrive at our produce manager’s final order!

Final computation for the store order! The current store inventory includes any cantaloupe that is on the store floor and in the back storage room.
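Putting the last two steps together, a minimal sketch of the order computation might look like this. The function names, costs, demand samples, and inventory count are all hypothetical; rounding up to whole cantaloupes and never ordering a negative amount are assumptions of this sketch, and the quantile uses the standard newsvendor critical ratio.

```python
import math

def empirical_quantile(samples, q):
    """q-th quantile of samples, interpolating linearly between order statistics."""
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def cantaloupe_order(demand_samples, cost_short, cost_extra, current_inventory):
    """Whole cantaloupes to order: target inventory minus what the store already has."""
    q_star = cost_short / (cost_short + cost_extra)   # newsvendor critical ratio
    target = empirical_quantile(demand_samples, q_star)
    # Assumption: round up to whole cantaloupes and never place a negative order.
    return max(0, math.ceil(target - current_inventory))

demand_samples = [18.0, 20.5, 22.4, 23.1, 25.5, 27.0]  # hypothetical Monte Carlo totals
order = cantaloupe_order(
    demand_samples,
    cost_short=3.0,        # hypothetical cost per cantaloupe of being short
    cost_extra=1.0,        # hypothetical cost per cantaloupe of having extra
    current_inventory=6,   # floor plus back room
)
print(order)  # 19
```

With these costs the optimal quantile is 0.75, the target inventory is about 24.9 cantaloupes, and subtracting the 6 on hand rounds up to an order of 19.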

As a recap, the main steps in our journey were:

  1. Forecast each recipe item’s demand with a probability distribution, so we capture our forecaster’s uncertainty
  2. Construct sets of demand samples, one value for every recipe item
  3. Combine each set of recipe demands as per the recipe conversion, yielding a set of cantaloupe demand samples (and thus an approximate distribution of cantaloupe demand)
  4. Take the optimal q*th quantile of the resulting distribution
  5. Subtract out the existing inventory to derive the final order

Visual summary of our modeling process.

Extending Our Solution

Our cantaloupe solution gives a preview of how we might approach ingredient ordering across fresh department recipes! Here are just some possible extensions to ponder:

  • We want to be certain our recipes are accurate, but different prepped produce leads might cut cantaloupe differently, with different amounts of yield. How might we detect if our recipes don’t correctly represent reality?
  • Recipe items can witness substitutability effects; what happens to cantaloupe bowl demand if fruit salads are all 50% off? How might we handle this case to more accurately forecast recipe item demand?
  • Cut cantaloupe has a store shelf life of around 3 days, and a single overordering cost doesn’t capture the effects of spoilage several days into the future. What if we design a decision-making policy that better models this shelf life?
  • Sometimes the store receives less cantaloupe than expected; perhaps we can help the store decide how to allocate its limited cantaloupe inventory across different recipe items to maximize sales. How many cantaloupe bowls should we prepare each morning, given the current inventory?

Taking a Step Back

Prepped produce ordering represents both an important function in our society and an interesting technical problem. Of course, fresh inventory management offers even more opportunities to problem solve! We’ve posted about other challenges in tracking perishable blueberry inventory and accounting for censored demand.

The physical world presents a wealth of challenges; we witness perishable inventory, upstream supply chain shortages, and drastic responses to current events. At the same time, we sit on a trove of quantitative techniques and paradigms that can apply to these challenges; we might frame one situation with model predictive control and another with convex optimization. On Afresh’s Prediction, Optimization, and Planning (POP) team, we get to experiment with these possible solutions and decide which one suits the problem at hand, knowing it will directly address food waste and food access!

If this combination of problem solving and social impact appeals to you, please consider applying to our open roles!

Special thanks to Philip Cerles, Rachel Chen, Aaron Stern, Joan Creus Costa, and Volodymyr Kuleshov.

Andy Chen
Afresh Engineering

Machine Learning Engineer @ Afresh || Stanford Math + CS || He/Him