Estimating Cost of Delay

Today we’re going to talk about cost of delay. This is a big one. There’s a lot to say. You can’t apply concepts you don’t have, and this is definitely one you should want in your toolbelt. The concept comes from Don Reinertsen, who argues, “If you only quantify one thing, quantify the cost of delay” (Reinertsen, 2009). If you’ve heard of it but don’t know what it is, or if you sort of know what it is but are wondering how to estimate it, or if you’re a Product Owner looking to up her prioritization game, this post is for you.

1. Cost vs. Cost of Delay

A good way to understand cost of delay is to compare it to traditional cost. “Cost” is how much money you pay to complete something. “Cost of delay” is how much money you lose (per unit of time) due to delay. Put another way, if an option (a feature or solution or outcome etc.) will create value, then a delay in achieving it necessarily creates a cost. Hence, cost of delay. Conversely, if an option has negative ROI, then delaying it (ideally forever) creates value. (In such cases there is a benefit of delay.)

Cost of delay typically exceeds traditional cost by many factors. (Often by more than a factor of 10.) Because of this, basing decisions on cost alone ends up leaving a lot of value behind. We largely minimize cost of delay and maximize value by making smarter bets, by capturing and challenging the assumptions being made and leveraging quick research to derisk them. (See my previous post.) If we don’t do this because we’re laser focused on traditional cost, we’re failing to minimize cost of delay and leaving more money on the table than need be.

Here’s an example adapted from Allen Holub. Say you have an Agile team of seven. The average employee burden rate is $185k per year, and there are 260 working days a year. A pretty standard “loading factor” for the space they occupy, the machines they use, HR support, etc. is x2. Your Agile team of seven then “costs” about $185k ÷ 260 x 2 x 7 = $9962 a day. Let’s say there are three options or possible work items on the table. You think this team could do any one of them in four weeks (20 days). The traditional “cost” of any one of the options then is the same: four weeks of the team’s time, which is about $200k.
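If you want to plug in your own team’s numbers, the arithmetic above can be sketched in a few lines of Python. The function name is just an illustration; the inputs are the figures from the example.

```python
# Figures come from the example above; the function name is illustrative.
def team_cost_per_day(burden_rate_per_year, working_days, loading_factor, team_size):
    """Fully loaded cost of the whole team for one working day."""
    return burden_rate_per_year / working_days * loading_factor * team_size

daily = team_cost_per_day(185_000, 260, 2, 7)
four_weeks = daily * 20  # four weeks = 20 working days

print(f"Per day: ${daily:,.0f}")
print(f"Four weeks: ${four_weeks:,.0f}")
```

The four-week figure comes out just under $200k, which is where the rounded number in the text comes from.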

The costs of delay, however, are shown below. What happens if you prioritize option X first? In terms of traditional cost, it’s the same as prioritizing option Y first: it’s still a matter of four weeks of the team’s time ($200k). But this ignores that by prioritizing X, you’re delaying Y and Z. That delay has an additional cost. Delaying Y by four weeks costs $8 million ($2m a week). Delaying Z by four weeks costs $2m ($500k a week). By prioritizing X over Y and Z, even though their “costs” are the same, you’re now out $10 million. When the value being lost from delay is exposed, traditional cost typically becomes chump change fast.

Three ideas and their respective costs of delay

OK. What if these three options (X, Y, and Z) are possible solutions to a problem and you’re only going to go with one of them? What if you estimate that option X can be completed in two weeks, while Y and Z would both take about three months? Focusing on traditional cost, you might reason that solution X only “costs” $100k, whereas solutions Y and Z would each cost about $600k, so going with solution X is a $500k “cost savings.” If the above figures show the actual costs of delay, you’ve just prioritized a one-time $500k cost savings while now losing an additional $1.99 million in value every week. Your boss should not be impressed.
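Here is a rough sketch of both scenarios in Python. The weekly figures for Y and Z follow from the four-week numbers in the text. The $10k a week for X is my inference from the $1.99 million gap, so treat it as an assumption. The names are made up for illustration.

```python
# Weekly cost-of-delay figures. Y and Z follow from the text ($8m and $2m over
# four weeks). X's $10k is an assumption inferred from the $1.99m weekly gap.
COST_OF_DELAY_PER_WEEK = {"X": 10_000, "Y": 2_000_000, "Z": 500_000}
TEAM_COST_PER_WEEK = 50_000  # rounded from about $9962 a day

def delay_cost(delayed_options, weeks):
    """Value lost by pushing the given options back by `weeks`."""
    return sum(COST_OF_DELAY_PER_WEEK[name] * weeks for name in delayed_options)

# Scenario 1: X goes first for four weeks, so Y and Z each wait four weeks.
lost_by_doing_x_first = delay_cost(["Y", "Z"], weeks=4)

# Scenario 2: pick X (2 weeks) over Y (about 12 weeks) as the only solution.
one_time_savings = (12 - 2) * TEAM_COST_PER_WEEK
weekly_value_gap = COST_OF_DELAY_PER_WEEK["Y"] - COST_OF_DELAY_PER_WEEK["X"]
weeks_until_savings_are_gone = one_time_savings / weekly_value_gap

print(f"Lost by doing X first: ${lost_by_doing_x_first:,}")
print(f"One-time savings: ${one_time_savings:,}")
print(f"Value gap per week: ${weekly_value_gap:,}")
print(f"Savings wiped out after {weeks_until_savings_are_gone:.2f} weeks")
```

Under these assumptions the one-time savings are gone in roughly a quarter of a week. The exact figure matters less than the order of magnitude.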

The mistake you’re making is to focus on a one-time cost while ignoring the ongoing value lost. You may have heard the objection, however, that “the traditional cost is real. The cost of delay is just an opportunity cost.” Don’t let this fool you. The cost of delay is also real. It’s just typically hidden and therefore ignored. You don’t know what the actual cost of delay is beforehand (and maybe even after the fact), and you can’t calculate it, but it outweighs “cost” by so many factors that you’d be remiss not to focus on it.

The good news is that you don’t need to be able to precisely predict cost of delay in order to make better tradeoff decisions by using it. (We’ll take a look at why in the next section.) As cost of delay expert Joshua Arnold (see Arnold, 2015 for a nice intro) points out, the practice of just estimating cost of delay is a catalyst for important conversations that wouldn’t otherwise happen. This helps to surface people’s assumptions so that they can be challenged. It also shifts the conversation so the focus is less on cost and deadlines and more on value and speed.

2. Confidence Intervals and Calibration

Typically, the actual values for cost of delay for a set of options will be very far apart. If it seems like they aren’t, then a) you don’t need to prioritize by value in the first place; or b) you should ignore these options and search for one with a much higher cost of delay. When it seems they are, then estimating ranges — even wide ranges — is all you need to do. The goal is to estimate a range for each cost of delay that’s good enough to provide a rank ordering of compared options. You can then gauge your confidence and tweak your ranges with some calibration exercises. In this section we’ll take a look at how to do this.

Take a look at the questions below. You probably don’t know the answers to them, and that’s fine. You don’t need to know the exact answer to be able to state your guess as a 90% confidence interval (CI). This is your range of values you feel 90% confident will contain the right answer. The low end is the CI’s “lower limit” and the high end is the “upper limit.” Go ahead and state your limits for the questions below. No Googling. Just jot down a low and high value such that you are 90% confident they contain the answer to the question.

90% confidence interval exercise

Let’s look at the question about St. Ignatius. You wrote down two years, the lower and upper limit. These form your CI. Now let’s make a deal. You have a chance to win $1000, but you can only make one of the following bets. Would you rather bet your CI contains the right answer, or would you rather spin the arrow below? With the spinner, you have a 90% chance of winning $1000 and a 10% chance of winning nothing. Think about which option you would really prefer.

Would you rather bet on your 90% CI, or spin the spinner?

Done? OK. If you’re feeling you’d rather spin the arrow, that’s letting you know you are less than 90% confident in your CI. In other words, you were overconfident. If you would rather bet on your CI, then you are more than 90% confident, which means you were underconfident. If the lower and upper limits you wrote down were really your 90% CI, you would feel indifferent between betting on your CI and spinning the arrow. If we start changing the odds on the wheel and you find you’re indifferent between a 70/30 wheel and your CI, then the range you wrote down is your 70% CI. This exercise, by the way, is from Douglas Hubbard’s (2014) excellent book, How to Measure Anything. (You should read it.)

Let’s say you wrote down 1600 to 1750. We do the equivalence bet game above and you’d rather spin the arrow. That’s letting you know your upper and lower limits aren’t wide enough. Adjust your upper and lower limits until you’re indifferent between betting on your CI and spinning the arrow. Let’s say you settle on 1400 to 1800. That’s your 90% CI. What this is doing is helping to calibrate you. Someone is “calibrated” when she’s right 90% of the time she says she’s 90% sure. Improving your calibration makes you a better estimator. It helps you “overcome your confidence.” (And the good news is that doing this for pretend still improves your calibration. It doesn’t matter if real money is involved.) Most of us, by the way, are not well-calibrated. We’re overconfident. When we give our 90% CI, the actual value only falls within it about half the time.
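One simple way to track your own calibration is to keep your 90% CIs, look up the answers afterwards, and count how often the answer fell inside. The sketch below does that. The intervals and answers are made-up numbers, not answers to the questions above, and the function name is hypothetical.

```python
def hit_rate(intervals, answers):
    """Share of (low, high) intervals that contain the matching true answer."""
    hits = sum(low <= answer <= high
               for (low, high), answer in zip(intervals, answers))
    return hits / len(answers)

# Hypothetical practice round: ten 90% CIs and the answers looked up afterwards.
my_intervals = [(1600, 1750), (10, 40), (100, 300), (5, 9), (1900, 1950),
                (20, 60), (1000, 5000), (2, 4), (50, 80), (1400, 1800)]
true_answers = [1500, 25, 450, 7, 1960, 45, 3000, 6, 65, 1650]

rate = hit_rate(my_intervals, true_answers)
print(f"Hit rate: {rate:.0%} (a calibrated estimator lands near 90%)")
```

A hit rate well below 90% over a few dozen questions suggests your ranges are too narrow. Ten questions is a small sample, so read a single round loosely.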

This is partially due to the bias of anchoring and adjustment. We guess the actual value and then fail to veer very far from it in generating our upper and lower limits. As a result, our CI is too narrow. A helpful technique to avoid this is the “absurdity test,” a sort of reductio ad absurdum. The idea is to start with a CI that’s absurdly wide. If the question is, “When did Steinbeck write East of Eden?” an absurdly wide CI would be 1018 to 2018. (Example from Doyle, 2009.) Now, what are two pros and two cons for this range? (Willis, 2013) List them, tweak the range, and do another equivalence bet test. You should also “interrogate” each CI limit. With a 90% CI, notice you are claiming there is a 5% chance the right answer is below your lower limit and a 5% chance it is above your upper limit.

90% CI: There’s a 10% chance the right answer is outside your limits

What do you feel the odds are Steinbeck wrote East of Eden before 1018? If you say “0%,” you need to move up your lower limit. Maybe you think of The Grapes of Wrath and realize he lived through the Great Depression, so you bring the lower limit up to 1880. Conversely, you can’t really think there’s a 5% chance he wrote it after 2018, so the upper limit clearly needs to be brought down. Maybe you then remember that the James Dean movie based on the book came out in the 1950s, so an upper limit of 1960 would be fine.

Now with cost of delay you’re not answering trivia questions about the lifespan of dolphins or guessing the year Cleopatra died, but you’re still estimating ranges for things you don’t know the actual value of. Estimating cost of delay is mostly about facilitating conversations about choices, framed in a variety of ways, aimed at surfacing the assumptions being made. When we pretend to know the right things to build instead of focusing on options and their possible value, we don’t get better at making smart bets. And that’s an expensive problem to have.

Incidentally, business people are typically not well-calibrated. This can be challenging with HiPPO prioritization (prioritization based on the “highest-paid person’s opinion”), as when you have a higher-up who likes to tell you what to build and doesn’t let you iteratively learn your way forward. This calibration idea is from Larry Maccherone. Even if they don’t accept your “deal,” regularly getting people to think through such questions can help improve their calibration.

Let’s put some money on this

3. Facilitated Conversations

Gather a group of people. Five people should be fine. As John Cutler puts it, the aim is to help surface assumptions about value and risk, build shared understanding, and stay focused on what you want to achieve. It can help to remind the group of two of Reinertsen’s main points:

· Intuition is not a good substitute for quantitative estimates.

· Quantitative estimates do not need to be all that accurate to be useful.

Take a look at the table below. There are three options, X1, X2, and X3. They each have a respective cost of delay, Y1, Y2, and Y3. You don’t know what the values for Y are, but as Douglas Hubbard would point out, just because you don’t know anything about Y does not mean you know nothing about X. After all, if you really knew nothing about the Xs, then you should not be considering them for prioritization in the first place.

The values for Y are unknown

The process here is basically to quickly gather what data you can, gather some people together with some knowledge about the option(s) being considered, and run the group through a facilitated conversation around choices. The options could be outcomes, solutions, etc. Hubbard’s (2014) three questions below are a good starting point.

· What do you mean by X?

· Why does X matter to you?

· What are you observing when you observe improved X?

You can then use people’s answers as your first inputs and go from there. To get people to provide better estimates requires, like many things, better facilitation. Whatever quick data gathering you can do to shore up your estimates, do it. But don’t go overboard. Hubbard (see also Muelhauser, 2013) makes the following points:

· Whatever you’re trying to estimate, it’s probably not as unique as you think.

· You probably have more data than you initially think.

· You actually don’t need as much data as you think.

· An adequate amount of data is far more accessible than you assume.

It probably requires less information than you think to create a useful CI range. In gathering additional data in an attempt to inch your way toward “certainty” you will quickly run up against diminishing returns. The idea is “just enough” research. Interview some stakeholders. Interview some users. Talk to Finance. Make sure you do this before using the techniques below. (If projects are extant, ask Finance what they would pay to remove a month of cycle time from the project. That’s CoD per month. Such microeconomics, Reinertsen says, are far more important than go/no-go decisions.) Gather some people, either your team, or stakeholders, or SMEs, any individuals with some knowledge of the option(s) you’re considering, and start utilizing some of the techniques and exercises below.

Clean Language: This is a technique from psychotherapy, developed by David Grove. It is now widely used in coaching and research interviewing. Here we’ll just touch on the concept briefly to offer the idea. We will cover Clean Language in detail in a future post. Below is an adapted set of “clean questions.” You can think of them as questions that explore another person’s model and assumptions without adding your own model or assumptions into the mix.

Some “clean questions”

We largely communicate, and even make sense of the world, through metaphors. Listen for the metaphors people use in discussing ideas. Get in the habit of using cleanish questions to unpack and explore what people mean by the metaphors they use. This will surface both assumptions and desired outcomes. The assumptions should be explored and the outcomes should be further unpacked and refined. Probe with questions like, “And what kind of X is that X?” and “Is there anything else about X?” The responses will typically contain quite a few outcomes. Capture them. You can then shift to discuss the possible costs of delay for these different outcomes, asking things like, “And if X was achieved, what would people see or hear?” “That’s like what?” “Then what happens?” “And what else?” This encourages people to explore how various options might change the environment in value-adding ways, which plays straight into the concept of cost of delay.

Value Buckets: As the group discusses options, get them to focus on which value bucket(s) they would primarily fall in. In other words, how does X create value? If you can’t say, why would you prioritize it in the first place? If you feel that for much of your work you can’t gauge how much value it will provide, the bigger question should be why such work is getting prioritized over items that clearly have a much higher cost of delay. (Image adapted from Joshua Arnold.)

What “value bucket” would the idea fall in?

The more you learn about this stuff the more you’ll realize there is no one “right” way to do it. Improve your facilitation skills, learn about design games and other structured activities, and start leveraging them in creative ways to improve your group discussions.

Ritual Dissent: Have everyone write down their 90% CI. Now have them write down three reasons why they chose the limits they did. Do some ritual dissent. A volunteer explains her reasons for one minute. The rest of the group cannot speak. At the end of the minute the volunteer then cannot speak as the group criticizes (dissent) her assumptions or suggests improvements (assent) for two minutes. It can help if the person whose turn it is turns around, or even puts on a mask(!). The idea is to do something physical that separates the person from her ideas, so she won’t take the criticism personally. Go through some calibration exercises. Does the person wish to change her CI limits? Do this as a round robin until everyone has gone.

Divide the Dollar: List Xs (whether you’re focusing on outcomes, solutions, etc.) in the left column shown below. Give everyone the same amount of Monopoly money and have them use it to “buy” the Xs. As they do so, have them jot down their reasons, one per sticky note. Tally how much the group spent on each X in the middle column. What are the winning items? The real value here comes in having people flesh out the Why column. After people freelist their reasons into this column, see if there are any clusters. What are the main reasons? What assumptions are being made? What happens when you challenge these assumptions?

Draw it on a whiteboard, populate with stickies

Time Travel: Use the questions below to facilitate the group through thinking about their assumptions about outcomes. Make sure to focus on “Why?” (Most of these questions are from Joshua Arnold.)

· If you could have this X today, what would that be worth to you (in actual dollars)? Why wouldn’t you pay more? (If you said “$0,” why are you considering X at all?) How would this create value? What else? And how might that create value?

· If you could have had this X a month ago, what would that be worth to you today? Why? How might that have created value? (Which value bucket[s] does it fall in?) What are some reasons why that figure might be higher than you think?

· If you couldn’t get X until a month from now, how much might that cost the business? In other words, how much value would be lost by delaying X by four weeks? What are some reasons why that figure might be higher?

· If you never get X, will you regret it? Why? Will you regret it six months from now? Why? Will you regret it in a year? Two years? Why? Why not? If you find you wouldn’t regret it, why are you considering this option?

· If the organization didn’t have X for another year, what would be the consequences? How much money might that lose the organization? (Notice dividing by 52 would give you a cost of delay per week.)
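The Time Travel questions give you answers on different time scales (a month, a year). To compare them, it helps to convert everything to the same unit. Here’s a minimal sketch that normalizes to dollars per week. The dollar amounts are hypothetical, and using 52 ÷ 12 weeks per month is my assumption (the questions above treat a month as roughly four weeks).

```python
WEEKS_PER_YEAR = 52
WEEKS_PER_MONTH = WEEKS_PER_YEAR / 12  # about 4.33; an assumption, not from the source

def per_week_from_year(loss_per_year):
    """Cost of delay per week, given an estimated loss over a year."""
    return loss_per_year / WEEKS_PER_YEAR

def per_week_from_month(loss_per_month):
    """Cost of delay per week, given an estimated loss over a month."""
    return loss_per_month / WEEKS_PER_MONTH

# Hypothetical answers from the same group to two different questions.
from_yearly_answer = per_week_from_year(2_600_000)
from_monthly_answer = per_week_from_month(200_000)

print(f"From the one-year question: ${from_yearly_answer:,.0f} per week")
print(f"From the one-month question: ${from_monthly_answer:,.0f} per week")
```

If the two answers land far apart, that gap is worth a conversation. It usually means people are making different assumptions when they think about a month versus a year.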

Drawing a Distribution: In discussing an outcome run through the following questions, keeping in mind that their precise wording is important. Use people’s answers to help them draw a distribution for Y (cost of delay). (This exercise is from the field of Decision Quality.)

· If you were to draw a distribution of guesses for this X’s cost of delay, what do you think the distribution’s median would be? (This is the value for which half the distribution would be above it and half below it.)

· For this X’s cost of delay, what do you think a surprisingly low estimate would be, an estimate for which there is only a 5% chance the real cost of delay would be lower?

· What do you think a surprisingly high estimate would be, the estimate for which there is only a 5% chance the real cost of delay is higher?

· Do you still like your original estimate for the median? What are some reasons why it might be higher? Write down three reasons it might be higher.

· Draw your distribution. What do you think its shape would be? Is it positively or negatively skewed? Why?

· Do you think the real value is above or below your distribution’s median? Why?

Median with positively skewed distribution

As we saw in the discussion on calibration above, most people’s distributions will at first be too narrow. Explicitly getting them to think about “surprisingly low” and “surprisingly high” values helps to broaden the distribution to a more appropriate range.
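If you want to turn a “surprisingly low” and “surprisingly high” answer into an actual curve, one option is to fit a lognormal distribution to those two values. To be clear, the lognormal is my choice for illustration, not something the exercise prescribes. It is a common pick for positively skewed quantities that can’t go below zero. The inputs below are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_from_90ci(low, high):
    """Return (mu, sigma) of a lognormal whose 5th and 95th percentiles
    are `low` and `high`. Assumes 0 < low < high."""
    z95 = NormalDist().inv_cdf(0.95)  # about 1.645
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z95)
    return mu, sigma

# Hypothetical answers, in dollars of cost of delay per week.
low, high = 100_000, 1_600_000
mu, sigma = lognormal_from_90ci(low, high)

median = math.exp(mu)               # geometric midpoint of low and high
mean = math.exp(mu + sigma**2 / 2)  # pulled above the median by the right tail

print(f"Implied median: ${median:,.0f}")
print(f"Implied mean:   ${mean:,.0f}")
```

Compare the implied median with the median the group wrote down. If they differ a lot, the group’s picture of the distribution is not lognormal, or one of the three answers deserves another look.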

CI Straw Poll: Capture people’s 90% CIs in a table. (The example below is from John Cutler.) As you walk the group through some of these (or other) exercises, you’ll typically find that people’s answers are very different. Reinertsen has found that a group’s initial cost of delay estimates (for the same option) typically differ by a factor of 50 to 1. In one workshop, the group’s initial estimates differed by a factor of more than 1000 to 1 (see Barrett, 2015).

Average individual estimates for more robust values

Focus the conversation on this range, leveraging it to elicit the different assumptions being made. Challenge assumptions, discuss differences of opinion, and then do another round of estimates. Try to get the difference for their limits down to a factor of 3 to 1 or less. You can then average the lower limit and upper limit columns for a more robust CI. If you’d rather work with a single value, you can take the midway point between the resulting lower and upper limit.
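The bookkeeping for a straw poll is simple enough to do in a spreadsheet, but here is a sketch in Python. The estimates are hypothetical. I’m reading “a factor of N to 1” as the ratio between the highest and lowest value in a column, which is an assumption on my part.

```python
def summarize_poll(estimates):
    """Summarize a list of (lower_limit, upper_limit) 90% CIs."""
    lows = [low for low, _ in estimates]
    highs = [high for _, high in estimates]
    return {
        "low_spread": max(lows) / min(lows),     # e.g. 3.0 means "3 to 1"
        "high_spread": max(highs) / min(highs),
        "avg_low": sum(lows) / len(lows),
        "avg_high": sum(highs) / len(highs),
    }

# Hypothetical second-round estimates (dollars per week) from four people.
second_round = [(200_000, 600_000), (300_000, 900_000),
                (400_000, 1_200_000), (300_000, 900_000)]

summary = summarize_poll(second_round)
midpoint = (summary["avg_low"] + summary["avg_high"]) / 2

print(f"Spread of lower limits: {summary['low_spread']:.1f} to 1")
print(f"Spread of upper limits: {summary['high_spread']:.1f} to 1")
print(f"Group CI: ${summary['avg_low']:,.0f} to ${summary['avg_high']:,.0f}")
print(f"Single value: ${midpoint:,.0f}")
```

If either spread is still well above 3 to 1, that’s a cue to go back to the assumptions before averaging anything.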

After doing this for your different options (Xs), compare their respective CIs for Y (cost of delay). If an option’s 90% CI is, say, $20k to $50k, you can keep looking (Barrett, 2015). There’s a better option out there with a much higher cost of delay. The challenge is finding it. You won’t prioritize such an option above something with a 90% CI of $500k to $700k or $1m to $10m. Another way to say this is that if you’re fighting over a $10,000 difference, you’re not casting a wide enough net.

4. Duration and Slicing

Reinertsen (2009) uses cost of delay as the numerator of his WSJF (weighted shortest job first) equation. The denominator is duration — how long you estimate it will take to do something. In general, the higher the cost of delay and the shorter the duration, the more of a priority the item should be. (If the estimated costs of delay are equivalent, prioritize shorter durations first. If estimated durations are equivalent, prioritize higher costs of delay first.) (Note that SAFe has its own version of these concepts, which Reinertsen [2017] says doesn’t really gel with his approach. SAFe, he says, is like dipping your toe in the water. Hopefully you’ll keep learning and move on to some of these more advanced techniques.)

Here is an illustration of how ignoring cost of delay and duration (WSJF) wastes money. Take our example Agile team of seven from above. We had calculated that they “cost” $9962 per day, or about $50k for a five-day workweek. You’re looking at how three features in your backlog should be prioritized (see below). The total cost for building all three is $700k (14 weeks x $50k). The “cost” of building feature X is $150k, Y $300k, and Z $250k. If they were entered into the backlog in that order and you went with a FIFO (first in first out) prioritization, you would generate a total cost of delay of $32m.

First in first out prioritization vs. weighted shortest job first

If instead you went with a WSJF (weighted shortest job first) or CD3 (cost of delay ÷ duration) prioritization, the total cost of delay would be $17.6m. Thus, going with a FIFO over a WSJF prioritization would mean you’re out $14.4 million ($32m − $17.6m). Notice that the total traditional “cost” of all three ($700k) is chump change in comparison. You may have noticed that an HVF (highest value first) prioritization would have produced the same results here as the WSJF prioritization. While this is true in the above example, this is not always the case, which is why WSJF is to be preferred. In the example below, an HVF prioritization would generate a total cost of delay of $26.1m whereas WSJF would result in $22.1m (a difference of $4 million).

Highest value first prioritization vs. weighted shortest job first
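To see how the three orderings can diverge, here is a small sketch you can play with. The features and their numbers are hypothetical and do not reproduce the dollar figures above. The sketch also assumes the team builds one feature at a time and that each feature’s cost of delay accrues from week 0 until that feature ships.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    name: str
    weeks: int         # estimated duration
    cod_per_week: int  # estimated cost of delay, dollars per week

def total_cost_of_delay(sequence):
    """Sum of each feature's cost of delay accrued until it ships.
    Assumes one feature at a time, with cost accruing from week 0."""
    elapsed = 0
    total = 0
    for feature in sequence:
        elapsed += feature.weeks
        total += feature.cod_per_week * elapsed
    return total

def cd3(feature):
    """Cost of delay divided by duration."""
    return feature.cod_per_week / feature.weeks

# Hypothetical backlog, listed in the order the items were entered.
backlog = [Feature("A", 6, 100_000), Feature("B", 2, 80_000), Feature("C", 4, 120_000)]

orderings = {
    "FIFO": backlog,
    "HVF": sorted(backlog, key=lambda f: f.cod_per_week, reverse=True),
    "WSJF": sorted(backlog, key=cd3, reverse=True),
}
for label, sequence in orderings.items():
    names = " -> ".join(f.name for f in sequence)
    print(f"{label}: {names}, total cost of delay ${total_cost_of_delay(sequence):,}")
```

With these made-up numbers, WSJF pulls the short, fairly valuable item B to the front, which HVF leaves for last. That is the kind of case where the two orderings part ways.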

OK, so throughout this post we’ve been talking about the cost of delay of options in a generic way, whether they be outcomes, solutions, what have you. This raises the question: is the implication that we should be looking at the cost of delay for all of these things? Arnold (2018) makes the following recommendations.

· Start with quantifying the cost of delay of outcomes. At this level you’re not dividing by duration. You’re not focusing on how the outcomes will be achieved, rather which outcomes are the most value-adding to go after. Ignoring this would take you into solution space prematurely.

· Allow teams to figure out how they contribute to one or more prioritized outcomes. Contributions may include experiments, feature sets, or even just small changes. (Don’t ignore “tiny wins,” those high cost of delay small changes that are typically only discovered in a bottom-up fashion.) POs should estimate what percent of the target outcome they think their contribution will achieve. This percent x the outcome’s cost of delay gives the team their numerator for their WSJF calculation (cost of delay ÷ duration).

· Don’t get overly fussy in estimating duration. For their outcome contributions, have teams use a rough T-shirt size of how long each option will likely block the pipeline. Use a number for Small, another for Medium, and one for Large. Take the cost of delay percent contribution and divide by numeric T-shirt size for a WSJF prioritization. This makes sense at the local level while still being clearly connected to higher-level outcomes. If something has a very high cost of delay and it’s clearly a Small, just do it! If stuff is floating around the Medium or Large categories, you can experiment and apply slicing to discover the minimal path to value. A faster, cheaper thing that achieves the same outcome creates more value.
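The team-level calculation described in these recommendations can be sketched as follows. The size-to-number mapping, the outcome’s cost of delay, and the candidate items are all hypothetical. Pick whatever numbers for Small, Medium, and Large fit your context.

```python
# Hypothetical mapping from T-shirt size to a rough duration number.
TSHIRT_SIZE = {"S": 1, "M": 3, "L": 8}

def wsjf_score(outcome_cod_per_week, percent_of_outcome, tshirt_size):
    """(percent contribution x outcome's cost of delay) / numeric T-shirt size."""
    contribution = outcome_cod_per_week * percent_of_outcome / 100
    return contribution / TSHIRT_SIZE[tshirt_size]

outcome_cod = 500_000  # hypothetical cost of delay of the target outcome, per week

# Hypothetical contributions: (estimated percent of the outcome, T-shirt size).
candidates = {
    "tiny win": (10, "S"),
    "feature set": (60, "L"),
    "experiment": (24, "M"),
}

ranked = sorted(candidates,
                key=lambda name: wsjf_score(outcome_cod, *candidates[name]),
                reverse=True)

for name in ranked:
    percent, size = candidates[name]
    print(f"{name}: {percent}% of outcome, size {size}, "
          f"score {wsjf_score(outcome_cod, percent, size):,.0f}")
```

Notice that the “tiny win” comes out on top here even though it contributes the smallest share of the outcome, because it blocks the pipeline for so little time.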

Once teams know what outcomes they will contribute to, they can think of their work in terms of “missions” and start running experiments to discover the quickest/cheapest way to achieve the outcome. This helps avoid diving straight into solution space. This is a mistake made by teams that primarily “take requests.” They forget their Design Thinking 101: Just because something is being requested does not mean users will use it if it’s built.

The underlying problem that needs solving is not the lack of what’s being requested. Requests need to be unpacked. You’re trying to achieve outcomes for users and the business. This is not best done by acting like you’re running a drive-thru window. Whatever your first, best guess is about what to build or do, park it, unpack it, challenge assumptions, and explore several other ways of solving the underlying problem.

What are other possible ways to achieve the desired outcome?

Leverage quick design research to enrich your model of the problem space, opening up additional value-adding options for contributing to outcomes. As stated above, the most value-adding options will be those that contribute the greatest percent to the target outcome in the shortest time (smallest duration). This is also helped by slicing existing options into something quicker/cheaper that still achieves the same percent of the outcome.

As we discussed last time, you’re looking for “the minimal path to value.” Agile coach Neil Killick recommends that teams explore three different kinds of slicing: capability, functional, and implementation. A capability allows users to do something they can’t currently do. Capability slicing then can be thought of as trimming down the user needs you’re focusing on. Functional slicing is about trimming down to the cheapest/quickest way to test your idea. Implementation slicing is what teams usually focus on, which is trimming down what they do to implement the functional slice.

Estimating duration and/or applying slicing at the level of experiments, solutions, and features helps achieve outcomes faster, which minimizes total cost of delay. While duration is estimated at the lower level of experiments and features, note that it is ultimately tied to the higher-level outcome. After all, the value is delivered when (and thus delayed until) the outcome is achieved, and not before. Expressing the cost of delay of features as the percent of the outcome they achieve represents this fact nicely. To take an example from John Cutler, say you have an outcome with a cost of delay of $10m a month. You run 10 experiments that month and the 10th experiment achieves the outcome. The experiments only have a cost of delay in relation to the outcome that is not getting achieved. Their value comes in discovering faster ways to achieve it.

As with many things, you need to map it out. Move the group’s thinking from their heads out into the world. Capture it in a way you can all see. Below are a couple templates to help paint the big picture. The first is from John Cutler (2016). The second is adapted from Barrett (2015).

Users → Outcomes → Impacts
CD3 estimation canvas

To close, as Arnold (2018) emphasizes, the real benefit is that this starts to shift the overall focus:

Evolve the focus

References

Arnold, J. (2018). Single prioritised backlog — chat with John Cutler. Black Swan Farming. Retrieved on July 9, 2018 from: http://blackswanfarming.com/single-prioritised-backlog-chat-with-john-cutler/.

Arnold, J. (2015). How to quantify Cost of Delay. Leankit. Retrieved on February 15, 2017 from: https://leankit.com/blog/2015/11/how-to-quantify-cost-of-delay/.

Barrett, S. (2015). Quantifying the Cost of Delay. Presented at Agile New England: Waltham, MA.

Cutler, J. (2016). 5 simple questions to drive validated learning. Medium. Retrieved on June 12, 2018 from: https://medium.com/@johnpcutler/4-simple-questions-to-drive-validated-learning-548a51a70ee5.

Doyle, J. (2009). Confidence levels and calibration. NetworkWorld. Retrieved on June 15, 2018 from: https://www.networkworld.com/article/2235191/cisco-subnet/confidence-levels-and-calibration.html.

Hubbard, D. W. (2014). How to measure anything: Finding the value of intangibles in business (3rd ed.). Hoboken, NJ: John Wiley & Sons, Inc.

Muelhauser, L. (2013). How to measure anything. LESSWRONG. Retrieved on June 15, 2018 from: https://www.lesswrong.com/posts/ybYBCK9D7MZCcdArB/how-to-measure-anything.

Reinertsen, D. (2017). Don Reinertsen interview. Scaled Agile, Inc. Retrieved on May 7, 2018 from: https://vimeo.com/247341782/09d528322e.

Reinertsen, D. G. (2009). The principles of product development flow: Second generation Lean product development. Redondo Beach, CA: Celeritas Publishing.

Willis, J. (2013). Estimation — Part I: How do I do it? 80,000 Hours. Retrieved on June 18, 2018 from: https://80000hours.org/2013/05/estimation-part-i-how-to-do-it/.